At the end of 2020, we also witnessed a couple of firsts in the detection of biosignatures and technosignatures; it is quite unusual to have both in the news within such a short period of time. First, a paper published in Nature Astronomy in September reported the detection of phosphine in the atmosphere of Venus and attributed it to either unknown chemistry or potential life. Second, a science “leak” showed up in the media, before the paper itself was published, about an anomalous radio signal potentially coming from the Proxima Centauri system (paper upcoming this month), a signal named BLC1 (Breakthrough Listen Candidate 1). Both stories have sparked ongoing debates in science that are unlikely to be resolved any time soon.
While neither is a definitive detection of biosignatures or technosignatures (basically, life), what is very interesting to me is that the debate, in both cases, lies in the data: both in how it was collected and in how it was analyzed. In both cases, archival data was used for the analysis. For the potential phosphine biosignature, the data was collected by the James Clerk Maxwell Telescope (JCMT) in Hawaii in 2017 and by the Atacama Large Millimeter/submillimeter Array (ALMA) in Chile in 2019. For BLC1, the data was collected by the Parkes radio telescope in April 2019. The Venus dataset is quite small (a few GB) and can be downloaded and analyzed on a personal computer. The BLC1 dataset is currently unavailable, but, extrapolating from typical radio observations, it is probably around 1 TB per 5-minute observation. In the case of Venus, the debate is not only about what is causing the phosphine, which is a chemistry problem, but also about how much phosphine has actually been detected, which is a data science problem. In the case of BLC1, the current analysis centers on isolating the anomaly in the data, checking for similar anomalies in other archived datasets, and making sure the signal does not originate on Earth. Jason Wright has an excellent post on how to understand data transformation and analysis in radio astronomy here.
In both cases, the data analysis requires quite complex transformations and algorithms to separate the “signal” from the noise. Both cases, regardless of the debates specific to each field, are extremely interesting from a data science perspective. You can actually download the data and the analysis for the Venus case from the original publication, and the same will be true for BLC1 once the paper comes out.
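To give a flavor of what separating a “signal” from the noise can look like, here is a minimal toy sketch in Python. It is not the method used in either paper: the data is synthetic, and the function names, the tone frequency, and the threshold of 10 times the median power are all assumptions chosen only for illustration. It hides a steady tone in Gaussian noise, transforms the samples to the frequency domain, and flags the strongest frequency bin if it stands well above the noise floor.

```python
import cmath
import math
import random
import statistics


def power_spectrum(samples):
    """Naive O(N^2) discrete Fourier transform; returns power per frequency bin."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):  # keep only the non-redundant positive frequencies
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def find_narrowband_candidate(samples, threshold=10.0):
    """Return (bin, ratio) for the strongest bin if its power is at least
    `threshold` times the median power (a crude noise-floor estimate),
    otherwise None. The threshold value is an illustrative assumption."""
    spectrum = power_spectrum(samples)
    noise_floor = statistics.median(spectrum)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    ratio = spectrum[peak_bin] / noise_floor
    return (peak_bin, ratio) if ratio >= threshold else None


# Synthetic data (hypothetical): unit-variance Gaussian noise plus a tone
# whose amplitude equals the noise standard deviation, centered on bin 40.
random.seed(0)
N = 256
TONE_BIN = 40
samples = [
    random.gauss(0.0, 1.0) + 1.0 * math.sin(2 * math.pi * TONE_BIN * t / N)
    for t in range(N)
]

candidate = find_narrowband_candidate(samples)
print(candidate)
```

In the raw samples the tone is hard to see by eye, but in the frequency domain all of its energy lands in a single bin while the noise spreads across every bin, which is why the peak stands out. Real pipelines deal with far larger data volumes, drifting frequencies, and terrestrial interference, so this only illustrates the basic idea.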
ADDENDUM – Tribute to Another Star That Rose to The Stars
I would also like to take this opportunity to pay tribute to someone who was a real inspiration for me, both professionally and personally, and who unfortunately passed away on the 1st of January due to Covid-related complications. She was my high school teacher of languages and literature, Stela Dumitroaia. Her first name, Stela, means “star” in Romanian. I kept in touch with her all this time, ever since those beautiful teenage years. She taught me how powerful language and words can be, and partly spurred my current affinity for natural language processing. She was more than a teacher. She used to take her students on hikes in the Carpathian Mountains or on culturally inspired road trips around the country. She had a great grasp of the meaning of life, and she taught us to live life fully, to go on adventures, to explore the world, the cultures, and the arts of all countries, and to create beautiful memories before time steals them from us. I will dearly, dearly miss her. She was a spiritual mother to me and I am truly heartbroken. As we say in Romanian: Drum lin catre stele, doamna profesoara! (“Have a smooth journey to the stars, Madam Teacher!”)