The ADS team was pleased to welcome our Users Group (ADSUG) to our home base here at the CfA for their second visit on 2-3 November. The ADSUG meets roughly once per year to hear about the status of ongoing and upcoming projects and to advise the ADS team on our operations and potential system enhancements. The group is comprised of members from the greater astronomy community, including scientists and librarians. Several of our members are also involved in digital archives or other computing-intensive projects. Their advice and insights are quite valuable in determining the direction of ADS development.
One of the ADSUG members is Erick Peirson, lead system architect at arXiv. Erick spent an extra day with us before the official ADSUG meeting to discuss our ongoing projects. ADS and arXiv have a long history of collaboration, including a special daily ingest into ADS of arXiv papers, ADS’s use of arXiv read statistics, and arXiv’s use of the ADS linkage of the arXiv and published versions of a given paper. Similarly to ADS, arXiv is also in the process of upgrading their infrastructure to meet their 21st century needs; the two teams have independently selected similar technology stacks and architectures, which opens the door to closer future collaboration.
Presentations and discussions at the main ADSUG meeting covered a lot of ground quickly over the two-day meeting. A full report is expected from the committee in the next month or so, but in the meantime, an overview of the major topics covered is given below, or check out the online program.
Removing the Beta from ADS Beta (a.k.a. Bumblebee)
The ADS development team has been hard at work on ADS Beta, codenamed Bumblebee, since 2015, an outgrowth of earlier efforts (ADS Labs, ADS 2.0) to modernize the interface, infrastructure, and search capabilities of ADS Classic. These development efforts are paying off and Bumblebee is now rapidly approaching feature and content parity with ADS Classic, a nearly 25-year-old system. In addition, Bumblebee provides a much richer search experience, including full-text searching for most recent publications, on-the-fly search filtering, and integration with systems such as ORCID. Full parity with Classic is expected to be reached in early 2018, and ADS Classic will be slowly phased out over the 6-12 months following that milestone.
System architecture improvements
System architecture development in Bumblebee is multi-pronged. The development team reported on recent architecture development in several areas. First, Bumblebee is cloud-based, unlike ADS Classic, which is hosted in locally maintained servers and mirror servers. Scaling up the system to ensure a snappy response time thus requires different solutions, including testing Kubernetes as a scalable deployment technology and other speed improvements. Second, ingesting data into the ADS databases relies on a series of pipelines to transform raw data from publishers and other sources into appropriately formatted and fielded metadata, to calculate citation data and other metrics, and to append other related information to a publication record. These pipelines are in the process of being re-engineered to improve speed and reliability. Third, the front-end interface of Bumblebee has been improved in a number of ways under-the-hood, including improved error handling. Finally, the dev team outlined future areas for improvements to the system architecture, all targeted at improving speed and reliability for users.
After the previous ADSUG meeting, the committee recommended that ADS investigate the effort involved in more thoroughly covering the planetary sciences literature. While much of the planetary sciences corpus is currently included in ADS, it’s technically out of the scope of ADS’s holdings and little effort has been expended to ensure its completeness. ADS reported on a preliminary study to explore what would be required to move the planetary sciences literature into our core holdings. While the study indicated that the move would be welcomed by the planetary sciences community, many of whom have astronomy backgrounds and are already familiar with ADS, it would require a good deal of effort on the part of ADS. The ADSUG is expected to make a recommendation on how ADS should follow up on this preliminary study.
Software citations (Project Asclepias)
ADS, in collaboration with AAS and Zenodo, is working to improve citations to software products. Project Asclepias allows developers of astronomy-related software to upload their versioned code to an online repository such as Github, publish a released version in Zenodo, and receive a DOI for their code. The Zenodo DOI enables researchers to cite the appropriate version of the code directly, without resorting to hacks such as citing a published article describing the software. ADS will soon track the citations to these Zenodo DOIs, allowing researchers with a focus on software development to properly receive credit for their work. Asclepias will be reported on more thoroughly in an upcoming blog post.
Affiliations and collaborations
Author affiliations have long been included in ADS but successfully searching for an author by including their affiliation as a search term is non-trivial. This is in large part due to lack of consistency in how affiliations are recorded. To take ADS’s home institution as an example, an author could record their affiliation as “Center for Astrophysics,” “Harvard-Smithsonian Center for Astrophysics,” “CfA,” or one of many other variants. Mapping these variants to each other is a complex problem that we are solving by incorporating machine learning techniques. The implementation of this new feature is still underway, but will eventually allow users to more confidently search using author affiliations as a search term.
A related issue is that of large collaborations, such as LIGO. For arXiv e-prints and conference proceedings, few authors explicitly list all members of large collaborations, instead generally listing the name of the collaboration itself and perhaps a small number of authors. (Published journal articles do generally include the full author lists.) Searching for a member of a given collaboration will thus fail to find any publications that are “authored” by the collaboration. In addition, similarly to author affiliation, there are many published variants of a given collaboration’s name (e.g. LIGO Scientific Collaboration vs. LIGO Collaboration). With the rise of more and larger collaborations and their prominence, ADS is exploring how best to resolve this issue.
ORCID is a non-profit organization that issues ORCID IDs, a method of disambiguating authors, especially those with common names. ADS has incorporated ORCID functionality into Bumblebee, allowing authors to claim their papers. Papers with approved claims are then discoverable when searching by an author’s ORCID ID, allowing authors to maintain a personal bibliography. ADS is exploring methods for improving and expanding ORCID functionality within the system.
We wrapped up the ADSUG meeting by discussing future directions with the committee, including some long-term possibilities for ADS.
Based on our presentations and their own discussions, the ADSUG is currently preparing a report with their suggestions for current and future development; that report will be available on the ADSUG page when ready. We also welcome community input—if you’re interested in contributing to or serving on the ADSUG, contact the ADSUG chair or your nearest member!