Challenges in Building Meaningful Models with Omics Data for Biochemical Engineering

Slide Note

This presentation explores the complexities of using publicly available omics data to enhance our understanding of biological systems. Topics include cell and gene therapy, HEK293 cells, cell line safety, industrial challenges, the lack of biological understanding, and the significance of omics data in research. The aim is to address key issues in biochemical engineering to drive innovation and overcome obstacles in the field.

klim_ar Follow

Uploaded on Mar 06, 2025 | 2 Views


Presentation Transcript

FACULTY OF ENGINEERING, DEPARTMENT OF BIOCHEMICAL ENGINEERING
Open Scholarship: The Challenges in Building Meaningful Models with Publicly Available Omics Data
UCL ARC Festival of Digital Research and Scholarship, June 11th 2024
Eva Price, University College London
Dr Duygu Dikicioglu, University College London

Cell and Gene Therapy
o Cell Therapy: Utilises tiny living components to aid in the body's healing and improvement.
o Gene Therapy: Involves sending specialised instructions to the body's genetic code to correct errors and ensure proper bodily function.
Eva Price, ARC 2024

Market projected to hit 77.5 billion by 2033

Human Embryonic Kidney Cells
o Human origin
o Viral vector producer cells
o HEK293T cells are a derivative of HEK293
o The precise mechanisms behind their efficiency as viral vector producers are not fully understood.
Image: HEK293T cells cultured by D. Farcas, UCL

Industrial Challenges
o Cell Line Safety
o Cost
o Cell Line Stability
o Effectiveness
o Scaling Challenges
o Bioreactor Adaptation

Lack of Biological Understanding
Aim: To investigate the molecular make-up of HEK293 cells to enhance our biological understanding and to inform cell line engineering to overcome or reduce industrial challenges.
Industrial Challenges

Omics Data
"Omic data" refers to large-scale biological data sets generated by various "-omics" technologies. Each technology measures many molecules of the same kind in a sample, with the aim of capturing nearly all instances of that molecule type and offering a comprehensive view of the biological system.
MULTI-OMICS

Public Repositories
Growth of SRA over the decade 2011 to September 2021. Public access contains approximately:
I. 25.6 petabase pairs
II. originating from over 14.8 million publicly available runs
III. 0.83 GB per run
IV. 9.6 million spots per run
V. 187 bp per spot
Source: Nucleic Acids Res, Volume 50, Issue D1, 7 January 2022, Pages D387-D390, https://doi.org/10.1093/nar/gkab1053
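As a rough cross-check of the figures above, the average number of bases per run can be estimated in two ways: total bases divided by total runs, and spots per run multiplied by bases per spot. The sketch below uses only the numbers quoted on the slide; the variable and function names are illustrative, and the slide values are rounded, so only approximate agreement should be expected.

```python
# Figures as quoted on the slide; all are rounded approximations.
TOTAL_BASES = 25.6e15    # 25.6 petabase pairs
TOTAL_RUNS = 14.8e6      # 14.8 million public runs
SPOTS_PER_RUN = 9.6e6    # 9.6 million spots per run
BASES_PER_SPOT = 187     # 187 bp per spot


def bases_per_run_from_totals(total_bases: float, total_runs: float) -> float:
    """Average bases per run, derived from the repository-wide totals."""
    return total_bases / total_runs


def bases_per_run_from_spots(spots_per_run: float, bases_per_spot: float) -> float:
    """Average bases per run, derived from the per-run averages."""
    return spots_per_run * bases_per_spot


from_totals = bases_per_run_from_totals(TOTAL_BASES, TOTAL_RUNS)
from_spots = bases_per_run_from_spots(SPOTS_PER_RUN, BASES_PER_SPOT)

# Relative difference between the two estimates.
relative_gap = abs(from_totals - from_spots) / from_spots

print(f"From totals: {from_totals / 1e9:.2f} Gbp per run")
print(f"From spots:  {from_spots / 1e9:.2f} Gbp per run")
print(f"Relative gap: {relative_gap:.1%}")
```

Both estimates land a little under 2 Gbp per run, which suggests the quoted averages are mutually consistent within rounding.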

The Value of Omics Data
Public repositories overcome constraints with:
o Sample Availability
o Budget Limitations
o Experimental Capacity
o Expertise and Funding

Secondary Usage
Secondary data usage involves using information that has already been collected by someone else for a different reason.
o Reanalyse the data for novel insight
o Supplement own research

My Research
+100 terabytes of data
Databases by omic layer:
o Genomic, Epigenomic, Transcriptomic: Sequence Read Archive, European Nucleotide Archive
o Metabolomic: MetaboLights, Metabolomics Workbench
o Proteomic: PRIDE, MassIVE, ProteomeXchange, iPro, jPOST

The Challenges
o Skewed Data Availability
o Variety of Minimum Information Standards
o Terminological differences between databases
o Missing Metadata
o Inconsistent Ontologies
o Inaccurate Metadata
o Difficulties with database search tools
o Difficulty downloading data

Skewed Data Availability
[Figure: bar chart of study counts (log scale, 1 to 100000) for 293 and 293T cells by omic layer: Genomics, Epigenomics, Transcriptomics, Proteomics, Metabolomics.]

Database Search Tools and Extraction
o Search parameters are superior for sequencing databases in comparison to proteomics databases
o More flexible sequence metadata download
o Bulk metadata extraction only possible for 1 out of 4 proteomic databases
o MAGE-TAB slowly being incorporated
o Proteomics raw data download is designed for individual studies and isn't suitable for bulk retrieval.

Inadequate Metadata
[Figure: pie chart. Known 55%, Unknown 45%.]

Trends in Reporting Sample Sex in SRA
[Figure: chart of samples reported as female, not female, or missing, per year from 2012 to 2023; vertical axis from 0 to 100.]
Data Availability Analysis:
o File Sizes: 3% missing from ENA
o Sample Names: 5% missing from ENA
o Centre Names: 47% missing from ENA
o Sequencing Load Dates: 100% missing from ENA
o Library Names: ENA: 59% missing, SRA: 78% missing
o Sample Sex: ENA: 85% missing, SRA: 91% missing
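Percentages of this kind can be computed directly from a downloaded metadata table. The following is a minimal sketch, not the method used in the talk: the column names, the placeholder run identifiers, and the set of strings treated as "missing" are all assumptions chosen for illustration.

```python
import csv
import io

# Assumption: strings treated as "no value". Real repositories use many more
# placeholder spellings, so this set would need tuning per database.
MISSING_TOKENS = {"", "na", "n/a", "none", "missing", "not provided",
                  "not applicable", "unknown"}


def is_missing(value):
    """Return True when a metadata cell carries no usable information."""
    return value is None or value.strip().lower() in MISSING_TOKENS


def percent_missing(rows, fields):
    """Percentage of rows in which each named field is missing."""
    rows = list(rows)
    if not rows:
        return {field: 0.0 for field in fields}
    report = {}
    for field in fields:
        missing = sum(1 for row in rows if is_missing(row.get(field)))
        report[field] = 100.0 * missing / len(rows)
    return report


# Hypothetical four-run table; identifiers and values are invented.
EXAMPLE_TSV = (
    "run_accession\tlibrary_name\tsex\n"
    "RUN1\tlibA\tfemale\n"
    "RUN2\t\tnot provided\n"
    "RUN3\tlibC\t\n"
    "RUN4\t\t\n"
)

rows = csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t")
report = percent_missing(rows, ["library_name", "sex"])

for field, pct in report.items():
    print(f"{field}: {pct:.0f}% missing")
```

Treating placeholder strings such as "not provided" as missing matters here: counting only empty cells would understate how much metadata is effectively absent.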

Inaccurate Metadata
Number of different ways HEK293T is assigned to cell line for genomic data:
o HEK293T: 11863
o HEK239T: 1121
o 293T: 1003
o HEK 293T: 472
o Human Embryonic Kidney 293 cells: 26
o kidney cell line HEK293T: 13
o HEK-293T cells: 10
o HEK293T cells: 6
o human HEK 293T: 6
o Human embryonic kidney cells: 1
[Figure: two bar charts (log scale, 1 to 100000) with sequencing library strategy on the horizontal axis, for example RNA-Seq, ChIP-Seq, ATAC-seq, WGS, WXS, Bisulfite-Seq, Hi-C, miRNA-Seq and OTHER.]
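One way to cope with label variants like these during secondary analysis is to normalise free-text cell line names before counting. The sketch below is an illustrative assumption, not the approach taken in the talk: the matching rule is deliberately simple, the function names are hypothetical, and the counts are the ones read from the slide transcript.

```python
import re
from collections import Counter

# Label counts as read from the slide transcript.
REPORTED_LABELS = {
    "HEK293T": 11863,
    "HEK239T": 1121,
    "293T": 1003,
    "HEK 293T": 472,
    "Human Embryonic Kidney 293 cells": 26,
    "kidney cell line HEK293T": 13,
    "HEK-293T cells": 10,
    "HEK293T cells": 6,
    "human HEK 293T": 6,
    "Human embryonic kidney cells": 1,
}


def canonical_cell_line(label):
    """Map a free-text label to 'HEK293T', 'HEK293', or None if unresolved.

    Assumption: stripping case, spaces and punctuation and then looking for
    '293t' or '293' is good enough for this example. It would misclassify
    other 293 derivatives, so a curated ontology lookup is preferable.
    """
    compact = re.sub(r"[^a-z0-9]", "", label.lower())
    if "293t" in compact:
        return "HEK293T"
    if "293" in compact:
        return "HEK293"
    return None


def tally(labels):
    """Sum record counts per canonical cell line."""
    totals = Counter()
    for label, count in labels.items():
        totals[canonical_cell_line(label) or "UNRESOLVED"] += count
    return totals


totals = tally(REPORTED_LABELS)
for name, count in totals.most_common():
    print(f"{name}: {count}")
```

Labels such as "HEK239T" and "Human embryonic kidney cells" stay unresolved under this rule. A simple normaliser cannot tell a transposed digit from a different cell line, so those records need manual review.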

Terminological Differences
Equivalent field names in SRA and ENA:
o Sample (SRA): secondary_sample_accession (ENA)
o SRAStudy (SRA): secondary_study_accession (ENA)
o BioProject (SRA): study_accession, study_alias (ENA)
o BioSample (SRA): sample_accession (ENA)
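When metadata from both archives is combined, a small lookup table can rename SRA fields to their ENA equivalents. This is a hedged sketch based on the pairs listed on the slide: the function name and the example record are hypothetical, the identifiers are placeholders, and study_alias is left out because its pairing with BioProject is inferred from the slide layout.

```python
# SRA field name -> ENA field name, following the pairs listed on the slide.
SRA_TO_ENA = {
    "Sample": "secondary_sample_accession",
    "SRAStudy": "secondary_study_accession",
    "BioProject": "study_accession",
    "BioSample": "sample_accession",
}


def to_ena_names(record):
    """Return a copy of an SRA-style record with ENA-style field names.

    Fields without a known equivalent are kept under their original name.
    """
    return {SRA_TO_ENA.get(key, key): value for key, value in record.items()}


# Hypothetical record; the values are placeholders, not real accessions.
example = {
    "Run": "RUN0001",
    "Sample": "SAMPLE_A",
    "SRAStudy": "STUDY_A",
    "BioProject": "PROJECT_A",
    "BioSample": "BIOSAMPLE_A",
}

converted = to_ena_names(example)
for key, value in converted.items():
    print(f"{key} = {value}")
```

Keeping the mapping in one dictionary makes the terminology differences explicit and easy to extend as further mismatches between databases are found.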

Potential Causes
o Minimum Information Standards
o Communication Issues
o Funding Limitations
o Human Error

Recommendations
o Implement data submission tools to minimise human errors
o Standardise terminology across databases
o ADDITIONAL FUNDING
o Establish minimum information standards per datatype rather than per database
o Consistently update and implement best practices

Relevance and Impact
This era of AI and machine learning emphasizes the importance of optimizing databases for efficient computational analysis, leading to ground-breaking discoveries. The current digital ecosystem around scholarly data publication restricts our ability to maximize our research investments.

Thank You
Funding for this PhD project is provided by UCL, Oxford Biomedica and BBSRC as part of the Advanced Bioscience of Viral Vector Products Collaborative Training Partnership (ABViP-CTP).
Credits: Icons by Flaticon and illustrations by Stories
