SciLake: Democratising and making sense out of heterogeneous scholarly content
SciLake's mission is to (a) facilitate and empower the creation, interlinking and maintenance of Scientific Knowledge Graphs (SKGs) and the execution of data science and graph mining queries on top of them, (b) contribute to the democratisation of scholarly content and of the related value-added services by implementing a community-driven management approach, and (c) offer advanced, AI-assisted services that exploit customised perspectives of scientific merit to assist the navigation of the vast scientific knowledge space.
SciLake will develop, support, and offer customisable services to the research community following a two-tier service architecture. First, it will offer a comprehensive, open, transparent, and customisable scientific data-lake-as-a-service (service tier 1), empowering and facilitating the creation, interlinking, and maintenance of SKGs both across and within different scientific disciplines. On top of that, it will build and offer a tier of customisable, AI-assisted services that facilitate the navigation of scholarly content following a scientific merit-driven approach (service tier 2), focusing on two merit aspects that are crucial for the research community at large: impact and reproducibility. The services in both tiers will leverage advanced AI techniques (text and graph mining) that exploit and extend existing technologies provided by SciLake's technology partners. Finally, to showcase the value of the provided services and their capability to address current and anticipated needs of different research communities, four scientific domains (neuroscience, cancer research, transportation, and energy) have been selected to serve as pilots. For each, the developed services will be customised to accommodate differences in research procedures, practices, impact measures and types of research objects, and will be validated and evaluated through real-world use cases.
Opix will:
- test SciTagIT, a suite of domain-specific classifiers that automatically assign publications to categories by leveraging domain ontologies and classification schemes (e.g. ICD-11, the Glossary for Transport statistics, UN SDGs);
- conduct a quantitative analysis of the factors that contribute to a lack of reproducibility, and recommend guidelines to follow and conditions to meet in order to reduce the likelihood of a partial or total loss of reproducibility;
- design and develop an open and self-documented API, which will provide access to all developed functionalities of SciLake.