RISIS/SMS Team wins the 1st OpenAIRE Datathon
The OpenAIRE datathon aimed to encourage developers and data scientists to analyse the OpenAIRE Information Space, with the goal of improving how users and third-party services consume it. The OpenAIRE Information Space is a scholarly communication graph interlinking publications, datasets, software, research organizations, funders, and projects. The graph is built by harvesting metadata from around 3,000 data providers, harmonizing that metadata, and keeping or inferring links between the graph objects it describes. Links are inferred by text-mining a pool of around 6 million Open Access full-text articles. The graph contains around 60M objects, is openly accessible via APIs and a web portal, and is currently used to offer research impact statistics (e.g. the number of products linked to given funders), Open Access trends (e.g. the Open Access ratio of products published by given funders), and discovery of interlinked scholarly products (e.g. articles linked to datasets, or software linked to articles, for research communities).
For the data challenge, the OpenAIRE data was provided as a set of RDF triples queryable through a SPARQL endpoint, together with the high-level schema and the RDF schema. The challenge was open-ended: participants could enrich the data by interlinking it with other Linked Open Data (LOD) datasets or by mining and analysis, identify interesting patterns or research networks in the graph, build mashups, and so on.
The SMS team combined the OpenAIRE data with existing SMS datasets to show how OpenAIRE data can help users navigate across Open Data sources. First, a quick data quality analysis revealed the need to harmonize and enrich one important property that OpenAIRE uses to describe its organisations: country. This step made it possible to move on to data interlinking. For this purpose, we selected three external datasets: GRID (Global Research Identifier Database), OrgRef (open data about academic and research organizations), and the Cordis EU H2020 Organizations dataset. To stay on course with our goal, we performed full-mesh interlinking across all datasets, since only links between every pair of datasets allow navigation from any one dataset to any other. The interlinking was done using the Lenticular Lens approach, and the results were then embedded in the SMS (Semantically Mapping Science) platform so that end-users can browse and visualize the interlinked data to address research questions of interest to them.
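To illustrate the two steps described above (harmonizing the country property, then linking every pair of datasets), here is a minimal Python sketch. It is not the Lenticular Lens approach or the SMS team's actual pipeline. It assumes that each dataset has been reduced to a list of records with hypothetical `id`, `name` and `country` fields, and it uses a deliberately simple rule: two organisations are linked when their normalized names are identical and their countries harmonize to the same ISO 3166-1 alpha-2 code. The alias table, the matching rule and the sample records are all illustrative assumptions.

```python
import re
import unicodedata
from itertools import combinations

# Hypothetical alias table mapping free-text country values to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "netherlands": "NL", "the netherlands": "NL", "holland": "NL", "nl": "NL",
    "france": "FR", "fr": "FR",
    "united kingdom": "GB", "uk": "GB", "gb": "GB",
}

def normalize_text(value):
    """Lowercase, strip accents, and collapse punctuation/whitespace to single spaces."""
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()

def harmonize_country(raw):
    """Return an ISO code for a raw country value, or None if it cannot be harmonized."""
    if raw is None:
        return None
    return COUNTRY_ALIASES.get(normalize_text(raw))

def link_key(org):
    """Matching key: normalized name plus harmonized country (assumed rule)."""
    return (normalize_text(org["name"]), harmonize_country(org.get("country")))

def interlink(left, right):
    """Link records of two datasets whose keys are identical."""
    index = {}
    for org in right:
        index.setdefault(link_key(org), []).append(org["id"])
    links = set()
    for org in left:
        key = link_key(org)
        if key[1] is None:  # skip records whose country could not be harmonized
            continue
        for other_id in index.get(key, []):
            links.add((org["id"], other_id))
    return links

def full_mesh(datasets):
    """Interlink every unordered pair of datasets: n datasets give n*(n-1)/2 link sets."""
    return {(a, b): interlink(datasets[a], datasets[b])
            for a, b in combinations(sorted(datasets), 2)}

# Illustrative sample records (invented for this sketch).
datasets = {
    "openaire": [{"id": "oa:1", "name": "Vrije Universiteit Amsterdam", "country": "Netherlands"}],
    "grid": [{"id": "grid:1", "name": "Vrije Universiteit Amsterdam", "country": "NL"}],
    "orgref": [{"id": "orgref:1", "name": "Vrije Universiteit  Amsterdam", "country": "The Netherlands"}],
    "cordis": [{"id": "cordis:1", "name": "VRIJE UNIVERSITEIT AMSTERDAM", "country": "nl"}],
}

mesh = full_mesh(datasets)
for pair, links in mesh.items():
    print(pair, sorted(links))
```

With four datasets (OpenAIRE plus the three external ones), the full mesh yields six pairwise link sets. This is why harmonizing the country first matters: without a shared country code, the records above would not share a matching key even though their names agree. A real pipeline would replace the exact-match rule with approximate string matching and would validate the resulting links.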
Read the full report at http://sms.risis.eu/assets/pdf/openaire-datathon-report.pdf
Acknowledgement. We would like to thank the SURFsara cloud infrastructure for providing us with the storage and computational resources to host and query the OpenAIRE data.