
Unveiling the Microsoft Academic Services Architecture
An overview of how Microsoft Academic Services works: the overall architecture, information extraction, conflation and disambiguation, and knowledge refinement and learning. The slides cover how the service identifies entities, extracts information, refines knowledge, and handles problems such as detecting fake papers and journals.
Presentation Transcript
Microsoft Academic Services: Behind the Scene (May 2021)
Overall Architecture
  Entity Knowledge
  Publication Data
  Information Extraction
  Conflation & Disambiguation
  Knowledge Refinement & Learning
  Index & Serving (MAKES)
  Microsoft Academic Graph
  REST API
Information Extraction
  Identify lexical expressions of entities
    Article titles
    Authors + Affiliations
    Topic keywords
    Citations and Citation Contexts
  Lookup and embrace alt expressions
  Left undone
    Funding acknowledgment
    Figure/table understanding
  Publication/Citation Data
    Made easier by open access
    Crawling entire web becoming overkill
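
The "identify lexical expressions" and "alt expressions" points can be illustrated with a small alias lookup. This is only a sketch under stated assumptions: the alias table, the `normalize` and `find_entities` helpers, and the greedy longest-match strategy are all hypothetical and are not taken from the slides or from the actual Microsoft Academic pipeline.

```python
import re

# Hypothetical alias table: normalized surface form -> canonical entity name.
ALIASES = {
    "mit": "Massachusetts Institute of Technology",
    "massachusetts institute of technology": "Massachusetts Institute of Technology",
    "nlp": "Natural language processing",
    "natural language processing": "Natural language processing",
}


def normalize(text):
    """Lowercase the text and keep only alphanumeric tokens."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_entities(text, aliases=ALIASES, max_len=5):
    """Greedy longest-match scan: return (surface form, canonical name) pairs."""
    tokens = normalize(text).split()
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest span first so full names win over abbreviations.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in aliases:
                found.append((span, aliases[span]))
                i += n
                break
        else:
            i += 1  # no alias starts at this token
    return found
```

Calling `find_entities("Advances in NLP at the Massachusetts Institute of Technology")` maps both the abbreviation and the full institution name to their canonical entries. A production system would presumably learn or mine such aliases rather than hard-code them.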
Conflation & Disambiguation
  Paper & Paper family
    LSH on article title
    Author sequence
    DOI
  Author
    Name key clustering
    Author profile (cf. Ext Knowledge)
    Affiliation and topic expertise
    Co-authorship and venue
  External Knowledge
    Author CVs
    Institution homepages
    Journal/Conference homepages
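
To make "LSH on article title" concrete, here is a minimal sketch of one common form of locality-sensitive hashing: MinHash signatures over character trigrams, split into bands. The slides do not say which LSH scheme is used, so the choice of MinHash, the parameters `NUM_HASHES` and `BANDS`, and all function names are assumptions made for illustration.

```python
import hashlib
import re
from collections import defaultdict

NUM_HASHES = 32              # signature length (assumed value)
BANDS = 8                    # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS   # signature rows per band


def shingles(title, k=3):
    """Return the set of character k-grams of a normalized title."""
    norm = re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()
    return {norm[i:i + k] for i in range(max(len(norm) - k + 1, 1))}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens):
    """MinHash signature: the minimum hash value per seed."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(NUM_HASHES))


def candidate_groups(titles):
    """Group title indices that collide in at least one LSH band."""
    buckets = defaultdict(set)
    for idx, title in enumerate(titles):
        sig = minhash(shingles(title))
        for b in range(BANDS):
            band = sig[b * ROWS:(b + 1) * ROWS]
            buckets[(b, band)].add(idx)
    return {frozenset(ids) for ids in buckets.values() if len(ids) > 1}
```

Titles that differ only in case or punctuation normalize to the same shingle set and therefore always land in the same buckets. Near-duplicates collide with a probability that rises with their Jaccard similarity. The resulting groups are only candidates; other signals from the slide, such as author sequence and DOI, would then confirm or reject a match.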
Knowledge Refinement & Learning
  Entity creation and attrition
    New authors, journals, institutions
    Topic taxonomy adjustment
  Language/network similarity models
    Link strength/missing link prediction
    Related entity recommendation
  Saliency assessment
    Temporal eigenvector centrality
  Malicious content (e.g., fake paper/journal) detection
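
The slide names "temporal eigenvector centrality" without giving a formula. The sketch below shows one way such a score could be computed, assuming that each citation's weight decays exponentially with the age of the citing paper and that a small uniform term is mixed in so that power iteration converges on an acyclic citation graph. The decay model, the `half_life` and `alpha` parameters, and the function name are assumptions, not the actual Microsoft Academic formulation.

```python
def temporal_centrality(citations, years, now, half_life=5.0, alpha=0.85,
                        iters=200, tol=1e-12):
    """Power iteration on a time-decayed citation matrix.

    citations: list of (citing, cited) pairs
    years:     dict mapping every paper id to its publication year
    Returns a dict of scores that sum to 1.
    """
    nodes = sorted(years)
    n = len(nodes)

    # Assumed decay: a citation loses half its weight every `half_life` years,
    # measured from the citing paper's publication year.
    edges = []
    for citing, cited in citations:
        age = max(now - years[citing], 0)
        edges.append((citing, cited, 0.5 ** (age / half_life)))

    score = {node: 1.0 / n for node in nodes}
    for _ in range(iters):
        # Uniform smoothing term keeps the iteration matrix strictly positive.
        base = (1.0 - alpha) * sum(score.values()) / n
        nxt = {node: base for node in nodes}
        for citing, cited, weight in edges:
            nxt[cited] += alpha * weight * score[citing]
        total = sum(nxt.values())
        nxt = {node: value / total for node, value in nxt.items()}
        delta = sum(abs(nxt[node] - score[node]) for node in nodes)
        score = nxt
        if delta < tol:
            break
    return score
```

With this weighting, a paper cited by a recent article scores higher than an otherwise identical paper cited only by an old one, which is the intuition behind adding a temporal component to a saliency measure.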
Q/A
Related articles:
  https://doi.org/10.3389/FDATA.2019.00045
  https://doi.org/10.1162/QSS_A_00021
  [www-2021] MATCH: Metadata-Aware Text Classification in A Large Hierarchy (arxiv.org)