The Big Data Program: Arecibo Observatory Data Archive
By admin, 19 July 2021 #AOScienceNow

Through the Big Data Program at the Arecibo Observatory (AO), we are developing the Arecibo Archives Data Catalog to facilitate access to AO's projects, observations, datasets, and attributes. Approximately half of the AO database is currently available in the catalog: https://www.naic.edu/datacatalog/
The purpose of the Data Catalog is to provide a user-friendly portal where users can browse, query, and explore the projects observed at Arecibo over more than 55 years. The catalog consolidates multiple data sources that were built up throughout AO's operation. Its main component is the Projects Catalog, which provides all of the technical information about a proposal or project. This is essentially what scientists submitted as a proposal to receive Arecibo observing time. The Projects Catalog is complemented by an Observations Log, a Files Catalog, and an Attributes Catalog. The Observations Log provides a detailed log recorded by the observing scientists for each project. The Files and Attributes catalogs cover all of the raw data files that were captured in the observations, as well as key metadata about those files.
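To make the relationship between the four catalogs concrete, here is a minimal sketch of how they could be laid out as relational tables, using Python's built-in sqlite3 module. Every table name, column name, and the helper functions are illustrative assumptions; the article does not describe the actual schema or database engine behind the Data Catalog.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions made for
# illustration, not the real layout of the Arecibo catalog database.
SCHEMA = """
CREATE TABLE projects (
    project_id TEXT PRIMARY KEY,
    title      TEXT,
    abstract   TEXT
);
CREATE TABLE observations (
    observation_id INTEGER PRIMARY KEY,
    project_id     TEXT NOT NULL REFERENCES projects(project_id),
    observed_on    TEXT,
    log_entry      TEXT
);
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    project_id TEXT NOT NULL REFERENCES projects(project_id),
    location   TEXT NOT NULL,
    size_bytes INTEGER
);
CREATE TABLE attributes (
    file_id INTEGER NOT NULL REFERENCES files(file_id),
    name    TEXT NOT NULL,
    value   TEXT
);
"""


def create_catalog(conn):
    """Create the four illustrative catalog tables on an open connection."""
    conn.executescript(SCHEMA)


def files_for_project(conn, project_id):
    """Return (location, size_bytes) for every file linked to one project."""
    rows = conn.execute(
        "SELECT location, size_bytes FROM files "
        "WHERE project_id = ? ORDER BY location",
        (project_id,),
    )
    return rows.fetchall()
```

In this layout every observation, file, and attribute points back to a project, which mirrors the article's description of the Projects Catalog as the main component.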
To build this catalog, the Big Data team first worked to identify and catalog all of the projects that have been carried out at Arecibo. This was no easy task, since the data had been stored in many formats over the years. For each format, the team created scripts that scraped or extracted all technical information from the documents and saved it into a database. This first step is the foundation of the Data Catalog.
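The per-format approach described above could look roughly like the following sketch: one small parser per source format, all producing the same normalized record, which is then written to a database. The two formats shown (plain "Key: value" text and JSON), the field names, and the function names are assumptions chosen for illustration; the article does not list the formats the team actually handled.

```python
import json
import sqlite3
from pathlib import PurePosixPath


def parse_key_value(text):
    """Parse 'Key: value' lines, one field per line (assumed format)."""
    record = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            record[key.strip().lower()] = value.strip()
    return record


def parse_json(text):
    """Parse a flat JSON object into lower-cased string fields."""
    return {str(k).lower(): str(v) for k, v in json.loads(text).items()}


# One parser per (hypothetical) source format, selected by file suffix.
PARSERS = {".txt": parse_key_value, ".json": parse_json}


def extract_project(filename, text):
    """Normalize one proposal document into a common project record."""
    suffix = PurePosixPath(filename).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"no parser registered for {suffix!r}")
    fields = parser(text)
    return {"project_id": fields.get("id", ""), "title": fields.get("title", "")}


def save_project(conn, record):
    """Insert or update one normalized record in the projects table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects "
        "(project_id TEXT PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO projects VALUES (:project_id, :title)", record
    )
```

Keeping the parsers in a lookup table means that supporting another legacy format only requires adding one function and one entry, without touching the code that writes to the database.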
In a similar way, the team extracted and compiled the Observations Log using log information that existed in different locations. Most of the observations were already saved in a database, which made them easier to integrate into the catalog. The Files Catalog is being built as the datasets are copied to the Texas Advanced Computing Center (TACC). Once a dataset is copied, the team catalogs it and creates a record for it within the Catalog Database, keeping track of the file location, corresponding project, and size. Finally, the Attributes Catalog is being actively populated by extracting headers, metadata, and attributes from the raw files. This is done using scripts that navigate through the server's paths and extract the attributes from each file. These attributes are catalogued and saved into a database that keeps a record of all scientific attributes, including the related file name and project.
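As a rough illustration of the last two steps, the sketch below walks a directory tree, records each file's location, project, and size, and pulls attributes from a file header. The header layout (leading "KEY = value" lines closed by an "END" line) is an assumption invented for this example, as are the function names; real Arecibo raw files come in their own formats, which the article does not specify.

```python
import os


def read_header(path, max_lines=50):
    """Read leading 'KEY = value' lines up to an 'END' line (assumed format)."""
    attrs = {}
    with open(path, "r", errors="replace") as handle:
        # Look at no more than max_lines lines so large files stay cheap.
        for _, line in zip(range(max_lines), handle):
            line = line.strip()
            if line == "END":
                break
            key, sep, value = line.partition("=")
            if sep:
                attrs[key.strip()] = value.strip()
    return attrs


def catalog_tree(root, project_id):
    """Build one record per file found under root, in a stable order."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # makes the traversal order deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            records.append({
                "project_id": project_id,
                "location": path,
                "size_bytes": os.path.getsize(path),
                "attributes": read_header(path),
            })
    return records
```

Each returned record carries the file location, project, and size alongside the extracted attributes, so the same pass over the tree could feed both a files table and an attributes table.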
This catalog's importance is incalculable. It is the stepping stone to making Arecibo's datasets accessible to the community and to curious minds. The Data Catalog project is a computing strategy that will make the necessary data and resources widely available to the scientific community, continuing the Arecibo Observatory's legacy of enabling groundbreaking new results about our atmosphere, our Solar System, and our universe.
Article written by Eng. Julio Alvarado Negrón, Big Data Manager
Keywords: observatory, arecibo, big data, catalog, texas, TACC, advanced computing