
Framework for Capturing Scientific Claims in Biomedical Literature
Explore a framework by Catherine Blake that goes beyond genes, proteins, and abstracts to capture scientific claims in the ever-growing biomedical literature. The framework aims to synthesize and differentiate claims with varying levels of confidence, automating the process of claim extraction from textual data.
Presentation Transcript
Beyond Genes, Proteins, and Abstracts: A Framework to Capture Scientific Claims
Catherine Blake
School of Information and Library Science
University of North Carolina at Chapel Hill
http://www.ils.unc.edu/~cablake
cablake@email.unc.edu

Motivation
Relentless increase in electronically available text.
Life sciences: the NLM added the 17 millionth entry to PubMed in April 2007; 5,200 journals are indexed, and 12,000 new articles are added each week!
Chemistry: more than 110,000 articles in one year alone.
Consequences: hundreds of thousands of relevant articles; implicit connections between literatures go unnoticed.
Shift from retrieval to synthesis.

Entity Extraction
Newspaper genre: people, places, and organizations (Message Understanding Conference, MUC).
Biomedical genre: genes and proteins; diseases and treatments; chemical compounds.
Challenges: BioCreative, GENIA, JNLPBA.

Relationship Extraction
Newspaper genre: a person moving from one company to another.
Biomedical genre: relationships between genes and proteins, e.g. binds, inhibits.
ARBITER (Rindflesch, Rajan, & Hunter, 2000); GeneWays (Rzhetsky et al., 2004); RelEx (Fundel, Kuffner, & Zimmer, 2007); GENIA (www-tsujii.is.s.u-tokyo.ac.jp/GENIA).

Causal Relationships
Newspaper genre: causal relationships (Khoo, Chan, & Niu, 1998).
Biomedical genre: causes and treats (Price & Delcambre, 2005); causal knowledge (Khoo, Chan, & Niu, 2000).
Universal grammar: causatives (Comrie, 1974, 1981); action verbs (Thomson, 1987).

Claim Definition
Claim: to assert in the face of possible contradiction.
Example sentence reporting a claim: "This study showed that Tamoxifen reduces the breast cancer risk."
Example claim in the framework: [Tamoxifen]agent [reduces]change [breast cancer risk]object

Goals
Create a Framework that reflects how claims are made in the biomedical literature.
The Framework should generalize beyond biomedicine, differentiate between levels of confidence in a claim, and consider claims made in the full text.
Populate the Framework automatically.

The Claim Framework
Information facets: concepts, change, and basis of the claim.
Each information facet may have modifiers and directionality.

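One way to make the facet-and-modifier structure concrete is a small data model. The sketch below is a minimal Python illustration under stated assumptions: the `Facet` and `Claim` classes and the `render` method are hypothetical names, not part of the Claim Framework as presented. It encodes the Tamoxifen example from the Claim Definition slide and renders it in the slides' [text]label notation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Facet:
    """One information facet: its text, an optional direction, and modifiers."""
    text: str
    direction: Optional[str] = None
    modifiers: List[str] = field(default_factory=list)


@dataclass
class Claim:
    """A claim built from facets; `obj` avoids shadowing the built-in `object`."""
    agent: Optional[Facet] = None
    change: Optional[Facet] = None
    obj: Optional[Facet] = None
    basis: Optional[Facet] = None

    def render(self) -> str:
        """Render the populated facets in the slides' [text]label notation."""
        parts: List[str] = []
        for label, facet in (("agent", self.agent), ("change", self.change),
                             ("object", self.obj), ("claimBasis", self.basis)):
            if facet is None:
                continue
            parts.append(f"[{facet.text}]{label}")
            if facet.direction:
                parts.append(f"[{facet.direction}]{label}Direction")
            parts.extend(f"[{m}]{label}Modifier" for m in facet.modifiers)
        return " ".join(parts)


tamoxifen = Claim(agent=Facet("Tamoxifen"), change=Facet("reduces"),
                  obj=Facet("breast cancer risk"))
print(tamoxifen.render())
```

Running it prints `[Tamoxifen]agent [reduces]change [breast cancer risk]object`. Keeping direction and modifiers on each facet mirrors the slide's statement that every facet may carry both.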
The Claim Framework
Category            Concept A   Concept B   Nature of change   Claim basis
1. Explicit claim   Agent       Object      Required           Optional
2. Implicit claim   Agent       Object      Optional           Optional
3. Correlation      Required    Required    Required           Optional
4. Comparison       Required    Required    Required           Required
5. Observation      N/A         Required    Required           Optional

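The category table can be read as a checklist for each claim. The sketch below encodes it as a Python dictionary, assuming that the "Agent" and "Object" cells for explicit and implicit claims mean the facet is required and that N/A means the facet should be absent. `REQUIREMENTS` and `violations` are hypothetical names used only for illustration.

```python
from typing import Dict, List

# Facet rules per claim category, transcribed from the category table.
# Assumption: "Agent"/"Object" cells for explicit and implicit claims mean
# "required"; "n/a" means the facet should be absent.
REQUIREMENTS: Dict[str, Dict[str, str]] = {
    "explicit":    {"concept_a": "required", "concept_b": "required",
                    "change": "required", "basis": "optional"},
    "implicit":    {"concept_a": "required", "concept_b": "required",
                    "change": "optional", "basis": "optional"},
    "correlation": {"concept_a": "required", "concept_b": "required",
                    "change": "required", "basis": "optional"},
    "comparison":  {"concept_a": "required", "concept_b": "required",
                    "change": "required", "basis": "required"},
    "observation": {"concept_a": "n/a", "concept_b": "required",
                    "change": "required", "basis": "optional"},
}


def violations(category: str, facets: Dict[str, str]) -> List[str]:
    """List rule violations for a claim; an empty list means it fits."""
    problems = []
    for facet, rule in REQUIREMENTS[category].items():
        present = bool(facets.get(facet))
        if rule == "required" and not present:
            problems.append(f"missing {facet}")
        elif rule == "n/a" and present:
            problems.append(f"unexpected {facet}")
    return problems


comparison = {"concept_a": "patients with AML", "concept_b": "normal controls",
              "change": "higher"}
print(violations("comparison", comparison))  # ['missing basis']
```

A comparison claim without its basis (here, the plasma concentration of nm23-H1) is flagged, whereas an observation with no agent passes.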
Explicit Claims
"Indeed, glycine prevented Wy-14643-stimulated superoxide production by Kupffer cells."
Claim 1: [glycine]agent [prevented]change [Wy-14643-stimulated superoxide production]object
Claim 2: [Kupffer cells]agent [produces]change [Wy-14643-stimulated superoxide]object

Implicit Claims
"In liver the number of peroxisomes increases from about 500-600/cell to >5000/cell after exposure to peroxisome proliferators."
Claim 1: [peroxisome proliferators]agent [increases]changeDirection [peroxisomes]object [in the liver]agentModifier [number]agentModifier

Correlations
"A weak but statistically significant correlation was observed between the plasma nm23-H1 level and the WBC count (Figure 1, n=102, r=0.437, P<0.0001)."
Claim 1: [plasma nm23-H1 level]agent [WBC count]object [correlation]change [statistically significant]changeModifier

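Statistical qualifiers like those in the correlation sentence are natural candidates for change modifiers. As a minimal sketch, assuming the statistics appear in the simple name=value style shown here, a regular expression can pull them out. `extract_stats` is a hypothetical helper, not the system described in the talk, and it ignores other reporting styles such as confidence intervals.

```python
import re
from typing import Dict

# Hypothetical pattern: a standalone n, r or P, an operator (=, < or >),
# then a number such as 102, 0.437 or .0001.
STAT_PATTERN = re.compile(r"\b(?P<name>n|r|P)\s*(?P<op>[=<>])\s*(?P<value>-?\d*\.?\d+)")


def extract_stats(sentence: str) -> Dict[str, str]:
    """Map each statistic name to its operator and value, e.g. {'P': '<0.0001'}."""
    return {m["name"]: m["op"] + m["value"] for m in STAT_PATTERN.finditer(sentence)}


sentence = ("A weak but statistically significant correlation was observed between "
            "the plasma nm23-H1 level and the WBC count (Figure 1, n=102, r=0.437, P<0.0001)")
print(extract_stats(sentence))  # {'n': '=102', 'r': '=0.437', 'P': '<0.0001'}
```

The same pattern also handles the spaced style used in the comparison example, "(P = .0001)".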
Comparisons
"The plasma concentration of nm23-H1 was higher in patients with AML than in normal controls (P = .0001)."
Claim 1: [plasma concentration of nm23-H1]claimBasis [patients with AML]agent [higher]changeDirection [normal controls]object

Observations
"However, the plasma nm23-H1 protein level was increased in AML-M3 patients (P=.0002)."
Claim 1: [nm23-H1 protein level]object [increased]changeDirection [AML-M3 patients]objectModifier

Working Hypothesis 1
The Claim Framework reflects how a scientist communicates her findings:
full-text documents randomly selected from the biomedical literature will report findings using constructs within the Claim Framework;
human annotators will agree on facets within the Claim Framework;
the Claim Framework will generalize to a variety of scientific literatures.

Working Hypothesis 2
Facets within the Claim Framework can be populated automatically:
the system will detect all claims identified by the human annotators (i.e., recall);
the system will identify only claims that were identified by the human annotators (i.e., precision);
the system design will generalize to new literatures by avoiding domain-specific constructs.

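The two measures named in Working Hypothesis 2 can be computed by comparing the system's claims with the annotators' claims. This sketch assumes each claim is reduced to an exact-match (agent, change, object) tuple, which is a simplification; how partial matches should count is not specified on the slide. The names and the spurious extraction in the example are hypothetical.

```python
from typing import Set, Tuple

ClaimTriple = Tuple[str, str, str]  # hypothetical (agent, change, object) form


def precision_recall(system: Set[ClaimTriple], gold: Set[ClaimTriple]) -> Tuple[float, float]:
    """Precision: share of system claims the annotators also marked.
    Recall: share of annotator claims the system found."""
    hits = len(system & gold)
    precision = hits / len(system) if system else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


gold = {
    ("glycine", "prevented", "Wy-14643-stimulated superoxide production"),
    ("Kupffer cells", "produces", "Wy-14643-stimulated superoxide"),
}
system = {
    ("glycine", "prevented", "Wy-14643-stimulated superoxide production"),
    ("glycine", "prevented", "Kupffer cells"),  # a spurious extraction
}
print(precision_recall(system, gold))  # (0.5, 0.5)
```

Perfect recall alone is easy to reach by over-extracting, which is why the hypothesis pairs it with precision.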
Validating the Claim Framework
A draft Claim Framework was given to two annotators.
Pilot study: identify every claim; include claims that don't conform to the framework; don't consider how this will be automated.

Validating the Claim Framework
Main study: 25 articles.
Verification: a random set of sentences was annotated twice; feedback was provided daily.

Results
All documents:
Total number of sentences: 5535
Sentences with >=1 claim: 1250 (22.6%)
Total number of claims: 3228
Average claims per sentence: 2.51
Claims that did not fit in the Framework: 31
Per document:
Average number of sentences: 191
Average number of sentences with >=1 claim: 43

Distribution of Claim Categories
Category      Total   Total (%)   Pilot   Pilot (%)   Main   Main (%)
Explicit      2489    77.11       332     83.42       2157   76.63
Implicit      87      2.70        3       0.75        84     2.98
Observation   298     9.23        24      6.03        274    9.73
Correlation   174     5.39        12      3.02        162    5.75
Comparison    165     5.11        27      6.85        138    4.90
Total         3228    100         398     100         2830   100

All Documents
Annotation          Claims   Claims (%)   Words   Avg words
Agent               2894     89.65        5221    1.80
Agent Direction     285      8.83         291     1.02
Agent Modifier      1246     38.60        4448    3.57
Object              3197     99.04        6849    2.14
Object Direction    271      8.40         283     1.04
Object Modifier     1561     48.36        5383    3.44
Change              1897     58.77        1953    1.03
Change Direction    1337     41.42        1358    1.02
Change Modifier     1147     35.53        1618    1.41
Claim Basis         165      5.11         394     2.39
Claim Basis Dir.    42       1.30         43      1.02
Claim Basis Mod.    86       2.66         266     3.09
Total               3228                  28107   8.70

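The percentage and average-word columns in the All Documents table follow from the raw counts. The percentage is the number of claims annotated with a facet divided by all 3,228 claims, and the average is the word count divided by the facet count. The short Python check below recomputes a few rows; `facet_stats` is an illustrative name.

```python
TOTAL_CLAIMS = 3228

# (claims annotated with the facet, total words in those annotations)
COUNTS = {
    "Agent": (2894, 5221),
    "Object": (3197, 6849),
    "Change": (1897, 1953),
    "Claim Basis": (165, 394),
}


def facet_stats(claims: int, words: int, total_claims: int = TOTAL_CLAIMS):
    """Return (% of all claims with this facet, average words per annotation)."""
    coverage = round(100 * claims / total_claims, 2)
    avg_words = round(words / claims, 2)
    return coverage, avg_words


for name, (claims, words) in COUNTS.items():
    coverage, avg_words = facet_stats(claims, words)
    print(f"{name:<12} {coverage:6.2f}%  {avg_words:.2f} words")
```

Agents are short (about 1.8 words) while modifiers run longer (3 to 3.6 words), which matters for any automatic span-extraction approach.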
Inter-Annotator Agreement
Information facet    Kappa   Agreement
Agent                0.71    substantial
Object               0.77    substantial
Change               0.57    moderate
Change+ChangeDir     0.88    almost perfect

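The slide does not state which kappa statistic was used. Assuming Cohen's kappa for two annotators, the sketch below computes it from two label sequences and maps the result to the Landis and Koch (1977) verbal bands, which appear consistent with the labels on the slide (0.41 to 0.60 moderate, 0.61 to 0.80 substantial, above 0.80 almost perfect). The functions and the toy labels are illustrative, not the study's code or data.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label rates.
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value using the Landis and Koch (1977) bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


first = ["agent", "agent", "other", "other"]
second = ["agent", "other", "other", "other"]
k = cohen_kappa(first, second)  # observed 0.75, chance 0.5, so kappa 0.5
print(k, landis_koch(k))
```

Kappa discounts the agreement two annotators would reach by chance, which is why Change alone (0.57) rates lower than raw agreement might suggest.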
Location of Claims
Section        Sentences with claim   Total sentences   % of section   % of claim sentences
Abstract       98                     309               31.72          7.84
Introduction   357                    979               36.47          28.56
Method         6                      1121              0.54           0.48
Result         293                    1829              16.02          23.44
Discussion     539                    1406              38.34          43.12
Total          1250                   5535              22.58          100.00

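The two percentage columns in the Location of Claims table use different denominators. The first divides a section's claim-bearing sentences by that section's total sentences. The second divides by all 1,250 claim-bearing sentences. This is why the Discussion has both a high claim rate (38.34%) and the largest share (43.12%), while the Abstract has a fairly high rate (31.72%) but a small share (7.84%). A minimal sketch with hypothetical names:

```python
from typing import Tuple

# (sentences with >= 1 claim, total sentences) per section, from the table.
SECTIONS = {
    "Abstract": (98, 309),
    "Introduction": (357, 979),
    "Method": (6, 1121),
    "Result": (293, 1829),
    "Discussion": (539, 1406),
}
CLAIM_SENTENCES = 1250  # all sentences with >= 1 claim


def section_percentages(with_claim: int, total: int) -> Tuple[float, float]:
    """Return (% of the section's sentences that carry a claim,
    % of all claim-bearing sentences that fall in this section)."""
    rate = round(100 * with_claim / total, 2)
    share = round(100 * with_claim / CLAIM_SENTENCES, 2)
    return rate, share


for section, (with_claim, total) in SECTIONS.items():
    rate, share = section_percentages(with_claim, total)
    print(f"{section:<12} {rate:6.2f}% of section  {share:6.2f}% of claim sentences")
```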
Findings thus far
99% of the claims made in these articles could be captured in the Claim Framework.
22% of sentences report at least one claim.
77% of the claims identified were explicit.
8% of the sentences reporting a claim are in the abstract.
Agreement: substantial for agents and objects; almost perfect for change combined with change direction.

Acknowledgements
This project was supported in part by the Renaissance Computing Institute (RENCI) Faculty Fellowship Program and the NSF Center for Environmentally Responsible Solvents and Processes (CERSP CHE-9876674).
This project used resources provided by the OSG, which is supported by the NSF and the U.S. Department of Energy's Office of Science.
The speaker thanks Nassib Nassar and Mats Rynge (RENCI), and Amol Bapat and Ryan Jones (SILS).

Questions and Comments Welcome
Catherine Blake
cablake@email.unc.edu
http://www.ils.unc.edu/~cablake
School of Information and Library Science
University of North Carolina at Chapel Hill