Learning Numerical Representations of Biomedical Concepts from Abstracts


This study explores using natural language processing and a Word2Vec model to learn numerical representations of biomedical concepts from a large text corpus. After preparing the data with entity recognition, the researchers evaluate the resulting word embeddings to test whether word vectors capture ontology relationships and mirror semantic connections, particularly in the Gene Ontology and Disease Ontology. High similarity scores between related disease terms and the results of gene-function predictions show the potential of this computational approach for understanding complex biological relationships.

  • Biomedical Concepts
  • Word2Vec Model
  • Natural Language Processing
  • Text Corpus
  • Ontology Relationships




Presentation Transcript


  1. Learning Numerical Representations of Biomedical Concepts from 28 Million Abstracts Jesus E. Vazquez, Anna Yannakopoulos, Kayla Johnson, Christopher Mancuso, Arjun Krishnan Wednesday, July 25, 2018

  2. Painting the Big Picture: Making Gene-Gene Claims. Steps: 1. Get a PhD. 2. Read papers. 3. Spend a lot of time identifying specific characteristics to determine whether two genes are related. Can we do this computationally? Yes.

  3. What is Natural Language Processing (NLP)? Analytical techniques for discerning meaning from large text corpora. Word2Vec is a neural network model that learns numerical vector representations of words.
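Word2Vec learns its vectors from the contexts in which words appear. In the skip-gram variant, the network is trained to predict the words near each "center" word. The slides do not say which variant or window size was used, so the sketch below is an assumption: it shows only how skip-gram (center, context) training pairs are generated from a tokenized sentence, not the neural network itself. The function name `skipgram_pairs` and the example sentence are hypothetical.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions.

    This is the training data a skip-gram Word2Vec model learns from; the
    window size here is an illustrative assumption.
    """
    pairs = []
    for i, center in enumerate(tokens):
        # Look at up to `window` neighbours on each side, clipped at the sentence edges.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "brca1 regulates dna repair".split()
for pair in skipgram_pairs(sentence, window=1):
    print(pair)
```

With `window=1` each word is paired only with its immediate neighbours; larger windows give the model broader, more topical context at the cost of more training pairs.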

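  4. Preparing the Data: The data were primarily prepared using Named Entity Recognition (NER), so that different phrasings of the same concept map to a single term. For example, "Jesus got a heart attack" becomes "Jesus got a myocardial infarction". The prepared text was then fed into the Word2Vec model (200 features).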
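The effect of this preparation step is that synonymous phrases end up as one token, so Word2Vec learns a single vector per concept. The slides do not describe the NER tool used; the sketch below substitutes a much simpler technique, a dictionary-based phrase replacement, purely to illustrate the normalization. The `SYNONYMS` table and the `normalize` function are hypothetical, and the underscore-joined canonical token is an assumed convention.

```python
import re

# Hypothetical synonym table: surface form -> canonical concept token.
# A real pipeline would use an NER tool and a biomedical vocabulary instead.
SYNONYMS = {
    "heart attack": "myocardial_infarction",
    "myocardial infarction": "myocardial_infarction",
}


def normalize(text, synonyms):
    """Lowercase the text and replace each known phrase with its concept token."""
    text = text.lower()
    # Try longer phrases first so a multi-word match is not pre-empted by a shorter one.
    for phrase in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        canonical = synonyms[phrase]
        # Use a function as the replacement so the token is inserted literally.
        text = re.sub(pattern, lambda _m, c=canonical: c, text)
    return text


print(normalize("Jesus got a heart attack", SYNONYMS))
print(normalize("Jesus got a myocardial infarction", SYNONYMS))
```

Both calls print `jesus got a myocardial_infarction`, so the two sentences contribute training context to the same token.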

  5. Evaluation of Word Embeddings: the 200-feature word vectors were evaluated against the structures of the Gene Ontology (GO) and the Disease Ontology (DO).

  6. Can our word vectors capture ontology relationships?

  7. Similar GO terms are grouped together in vector space.
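The slide does not say how "grouped together" was measured. One simple check, offered here as an assumption rather than the authors' method, is to compare the average cosine similarity of terms within the same group against that of terms in different groups: a positive gap means related terms sit closer together. The term names, group labels, and 3-dimensional vectors below are made up for illustration (the study's vectors had 200 features).

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_similarity(vectors, pairs):
    """Average cosine similarity over a list of (term, term) pairs."""
    scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return sum(scores) / len(scores)


def grouping_gap(vectors, groups):
    """Mean within-group cosine minus mean between-group cosine."""
    within = [pair for terms in groups.values() for pair in combinations(terms, 2)]
    group_of = {t: g for g, terms in groups.items() for t in terms}
    between = [(a, b) for a, b in combinations(vectors, 2) if group_of[a] != group_of[b]]
    return mean_similarity(vectors, within) - mean_similarity(vectors, between)


# Hypothetical toy embeddings for four GO-like terms.
vectors = {
    "dna_repair": [0.9, 0.1, 0.0],
    "double_strand_break_repair": [0.8, 0.2, 0.1],
    "glucose_metabolism": [0.1, 0.9, 0.2],
    "glycolysis": [0.0, 0.8, 0.3],
}
groups = {
    "repair": ["dna_repair", "double_strand_break_repair"],
    "metabolism": ["glucose_metabolism", "glycolysis"],
}
print(round(grouping_gap(vectors, groups), 3))
```

In a real evaluation the groups would come from the GO hierarchy (for example, terms sharing a parent), and the gap would be compared against randomly shuffled groups.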

  8. Can word-embedding similarity scores mirror semantic relationships?

  9. High Similarity Scores Approximate Semantic Relationships. Figure: semantic similarity between Disease Ontology terms (disease-disease pairs).
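The project compares two similarity scores between word vectors: cosine similarity and the inverse of Euclidean distance (1/Euclidean distance). The sketch below computes both for plain Python lists; the function names are hypothetical. Cosine similarity depends only on the angle between two vectors, while Euclidean distance also depends on their lengths, which the example makes visible.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def inverse_euclidean(u, v):
    """1 / Euclidean distance, so that closer vectors get larger scores."""
    dist = math.dist(u, v)  # requires Python 3.8+
    # Identical vectors have distance 0; treat that as maximally similar.
    return math.inf if dist == 0 else 1.0 / dist


u, w = [1.0, 0.0], [10.0, 0.0]
print(cosine_similarity(u, w))            # same direction, so 1.0
print(round(inverse_euclidean(u, w), 3))  # far apart in length, so 1/9 = 0.111
```

Because `u` and `w` point the same way, cosine similarity rates them as identical, whereas 1/Euclidean distance rates them as fairly dissimilar.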

  10. Gene-Function Predictions: When predicting gene-function relationships, the distributions of AUCs in each quartile of the prior were better for cosine similarity than for Euclidean distance.
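A gene-function prediction of this kind can be scored by ranking every gene by its similarity to a function term's vector and computing the ROC AUC: the probability that a randomly chosen gene annotated to the function outscores a randomly chosen unannotated one. The sketch below assumes that setup, which the slides do not spell out. The slide's "prior" is also not defined; it commonly denotes the fraction of genes annotated to the function, and the sketch reports that value, but this reading is an assumption. All gene names and vectors are made up.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def inverse_euclidean(u, v):
    dist = math.dist(u, v)
    return math.inf if dist == 0 else 1.0 / dist


def auc(scores, positives):
    """ROC AUC via pairwise comparisons (Mann-Whitney U); ties count as half a win."""
    pos = [s for gene, s in scores.items() if gene in positives]
    neg = [s for gene, s in scores.items() if gene not in positives]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical vectors for one function term and four genes.
function_vec = [1.0, 0.0]
genes = {
    "geneA": [0.9, 0.1],  # annotated to the function
    "geneB": [2.0, 0.3],  # annotated, but with a long vector
    "geneC": [0.1, 0.9],
    "geneD": [0.6, 0.6],
}
positives = {"geneA", "geneB"}

cos_scores = {g: cosine_similarity(v, function_vec) for g, v in genes.items()}
euc_scores = {g: inverse_euclidean(v, function_vec) for g, v in genes.items()}
prior = len(positives) / len(genes)  # assumed meaning of "prior"

print("prior:", prior)
print("AUC (cosine):", auc(cos_scores, positives))
print("AUC (1/Euclidean):", auc(euc_scores, positives))
```

In this toy example cosine similarity ranks both annotated genes first (AUC 1.0), while 1/Euclidean distance penalizes the long vector of `geneB` (AUC 0.75). This only illustrates the mechanics of the two scores; it is not evidence for the slide's result.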

  11. From this project we learned: (1) high cosine similarity and 1/Euclidean distance scores approximate the semantic relationship between diseases; (2) word embeddings capture the structure of the Gene Ontology; (3) cosine similarity performs better than Euclidean distance when predicting gene-function relationships.

  12. Word embeddings can capture ontology relationships. Increasing the precision with which word embeddings associate terms can help us identify relationships not only between functions but also between genes, drugs, diseases, and tissues. Further work is needed to validate these results.

  13. Acknowledgements: This research was supported by the MSU ACRES REU program, which is supported by the National Science Foundation through grant ACI-1560168. I would like to thank Remy L., Nate D., Mark M., Jake C., Essenam B., Jainil S., Chinaza N., Janani R., and my family for their support this summer.
