Gene Disease Validity Curation Process
Biocurators play a crucial role in determining disease entities for gene curation by reviewing literature, collecting evidence, and finalizing classifications. The process involves identifying distinct disorders, evaluating disease mechanisms, and utilizing tools like the Monarch Disease Ontology. Collaboration with resources such as OMIM and Orphanet ensures accurate curation of genes associated with various diseases.
Presentation Transcript
ClinGen Gene-Disease Validity Classification Protocol
  Erin Rooney Riggs, MS, CGC
  Associate Professor, Geisinger Autism & Developmental Medicine Institute
  eriggs@geisinger.edu
The Gene-Disease Validity Curation Process
  Biocurator receives an assigned gene from the coordinator.
  First step: Precuration
    What disease are we curating for?
    Biocurator briefly reviews the literature to determine which disease entity(ies) have been proposed for this gene.
    Group reviews and determines the most appropriate disease term.
    Opportunity to pre-review any scoring/evidence collection issues that may arise during curation.
  Next step: Curation
    Biocurator does a more detailed review of the gene-disease pair, collecting both genetic and experimental evidence.
    Evidence is recorded in the Gene Curation Interface (GCI).
    Score according to the ClinGen protocol.
    Group reviews, provides feedback, and finalizes the classification.
  Final step: Approval/Publication to Website
    Biocurator makes any final adjustments to the data in the GCI per GCEP instructions.
    Biocurator approves and publishes the curation to the ClinGen website from the GCI.
Precuration: Determining the Disease Entity
  Primary goal: Determine the disease entity to be used in the curation.
  All diseases are described using a term in the Monarch Disease Ontology (MONDO): https://www.ebi.ac.uk/ols/ontologies/mondo
  Thaxton et al. 2022 (PMID:35754516)
Has an assertion been made for 1 or more disease entities?
  What claims have been made about diseases associated with this gene?
    Check resources such as OMIM, Orphanet, and the literature.
  If more than one disease has been reported:
    Are these truly distinct disorders, different names for overlapping presentations, different ends of the same clinical spectrum, etc.?
  Even if only 1 disease has been reported:
    Is there a reason to choose a term other than the one chosen by, for example, OMIM?
Are there distinct differences in disease mechanism?
  There are many examples of genes that cause different diseases due to different disease mechanisms.
    Example: RET and Hirschsprung disease (LOF) and multiple endocrine neoplasia (GOF)
  If this has been clearly elucidated for a gene, multiple curations may be warranted.
  If this is unclear, it may be most appropriate to lump into an overarching disease entity.
    Example: Most individuals with LOF variants have Disease 1, and most individuals with missense variants have Disease 2, but there are a few reports of individuals with LOF variants and Disease 2 and vice versa.
    Example: Most individuals with Disease 3 have LOF variants, but some individuals have missense variants. The mechanism by which the missense variants are acting is unknown.
  As more evidence emerges, the curation can be reevaluated.
Is there observable phenotypic variability?
  Do individuals with the same variant have different presentations (within families or across families)?
    If yes, this may be a reason to lump.
  Are the presentations so clinically distinct that separate curations are warranted?
    If yes, this may be a reason to split.
    It is RARE for this to be the only reason to split a curation.
    Example: SCN1A and early infantile epileptic encephalopathy and familial hemiplegic migraine
Are there differences in inheritance pattern?
  Example: Monoallelic variants cause Disease 1 (autosomal dominant), biallelic variants cause Disease 2 (autosomal recessive).
    This would be a SPLIT curation.
  If both monoallelic and biallelic variants are observed in the same disease, consider a single curation using the semidominant mode of inheritance.
    Example: APOB and familial hypobetalipoproteinemia
Overall: If you were to observe a variant in this gene on a laboratory test with no accompanying phenotype information, do you feel reasonably confident that you could predict the presentation based on what you know about the variant alone? If yes, split. If no, lump. Thaxton et al. 2022 (PMID:35754516)
Evidence Collection: Literature Search
  Primary evidence: peer-reviewed literature
    Database cases (e.g. ClinVar, DECIPHER) can be used with discretion (e.g. is enough information provided to verify the claim?).
  Other sources: PubMed, Google Scholar, LitVar, GeneCards, OMIM, GeneReviews, UniProt, MGI, NCBI Gene, Mastermind, databases relevant to your disease area, etc.
  Initial search should be broad and inclusive.
    For example: Gene Symbol, or Gene Symbol AND Disease
    Be aware of gene symbol aliases and disease synonyms.
  Primary literature is ideal.
    Reviews can be used to locate relevant primary literature.
    Reviews can be cited themselves when they provide sufficient detail to evaluate evidence claims.
  Not necessarily a comprehensive review of the literature
    Once you have reached the threshold for Definitive and/or the maximum number of points within a given category, there is no need to document further evidence.
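The "broad and inclusive" search described above can be sketched as a small helper that ORs together a gene symbol with its aliases and, optionally, ANDs the result with a disease name and its synonyms. This is only an illustration: the function name `build_query` and the placeholder terms are hypothetical, and the exact boolean syntax accepted by any given search tool should be checked against that tool's documentation.

```python
def build_query(gene_terms, disease_terms=None):
    """Build a generic boolean search string.

    gene_terms: the gene symbol plus any known aliases.
    disease_terms: optional disease name plus synonyms.
    The quoting and AND/OR syntax here are generic assumptions,
    not the syntax of a specific database.
    """
    gene_part = " OR ".join(f'"{term}"' for term in gene_terms)
    if not disease_terms:
        # Broadest search: gene symbol and aliases only.
        return f"({gene_part})"
    disease_part = " OR ".join(f'"{term}"' for term in disease_terms)
    return f"({gene_part}) AND ({disease_part})"


# Placeholder terms, not real gene or disease names.
print(build_query(["GENE1", "ALIAS1"]))
print(build_query(["GENE1", "ALIAS1"], ["disease name", "disease synonym"]))
```

Starting with the gene-only query and narrowing with disease terms afterwards keeps the first pass inclusive, as the slide recommends.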
Case-Level Data
  Case-level data is divided into two parts:
    Variant evidence
    Segregation evidence
  Points may be awarded to probands IF:
    Enough phenotypic detail is provided to determine that the proband truly has the diagnosis in question
    AND
    The variant(s) identified in the proband is a plausible cause for disease:
      Frequency in the general population is consistent with what is known about the penetrance/prevalence of disease
      Variant consequence is consistent with disease mechanism (if known)
      Variant has some indication of a potential role in disease
  Each proband can be given points for both variant evidence and segregation evidence, if applicable.
Case-Level Variant Evidence
  Baseline number of points assigned per variant type:
    Predicted/proven null = 1.5 points
    Other variant type (e.g., missense) = 0.1 points
  Add points to the baseline score for common upgrades as described in the table:
    Functional data
    De novo status
  Other upgrades/downgrades may be appropriate at the discretion of the GCEP.
Case-Level Variant Evidence
  For AR diseases: assess each variant independently, then sum for the final score.
  No single proband may score more than 3 points.
Caveats
  Default scores assume the variant type is consistent with the expected disease mechanism.
    If this is not the case, downgrade or do not score unless there is compelling rationale for partial scoring. This rationale should be documented.
    For example, if the disease mechanism is known to be gain of function, consider not scoring null variants and upgrading gain of function missense variants.
  Variants may be up- or downgraded beyond the values suggested here (but within the scoring range) based on the strength of evidence.
    For example, a missense variant may score at the top of its range if robust functional assays demonstrate that the missense is acting in a manner consistent with the expected disease mechanism.
Caveats
  Variants may be up- or downgraded for other reasons beyond those listed in this chart at the discretion of the GCEP. Rationale for up- or downgrading variants should always be documented.
    For example, one may opt to upgrade missense variants if they are within a known functional domain, if they appear to be clustering in the same area of the gene, etc.
    Consider upgrading based on consistency and/or specificity of the phenotype, the likelihood that a putative null variant actually leads to loss of function, etc.
  When assigning points for de novo status, consider further upgrades if statistical evidence shows that de novo variation in a particular gene is rare.
    Use caution (and consider not upgrading or not scoring) if a gene is known to have a high rate of de novo variation (e.g. TTN).
AD/XL Inheritance Example Scenarios
  1 missense variant with no functional data = baseline 0.1 points
    Add functional data (0.1 + 0.4) = 0.5 points
    Add de novo status (0.1 + 0.4) = 0.5 points
    Add both (0.1 + 0.4 + 0.4) = 0.9, rounded to the nearest 0.5 = 1 point
  1 LOF variant = baseline 1.5 points
    Add functional data (1.5 + 0.5) = 2 points
    Add de novo status (1.5 + 0.5) = 2 points
    Add both (1.5 + 0.5 + 0.5) = 2.5 points
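The AD/XL arithmetic above can be reproduced with a short sketch. The baselines (1.5 and 0.1) and upgrade sizes (0.5 and 0.4) are taken from these example scenarios only; the function and constant names are hypothetical, and the choice to leave an un-upgraded baseline unrounded is an assumption inferred from the slide, where 0.1 is reported as-is. A GCEP may score differently within the allowed range.

```python
import math

# Values read off the example scenarios; not a complete scoring table.
BASELINE = {"null": 1.5, "other": 0.1}
UPGRADE = {"null": 0.5, "other": 0.4}
MAX_PER_PROBAND = 3.0


def round_to_half(points):
    """Round to the nearest 0.5 (ties round up)."""
    return math.floor(points * 2 + 0.5) / 2


def score_ad_xl_variant(variant_type, functional_data=False, de_novo=False):
    """Score one proband's variant for an AD/XL disease.

    variant_type is "null" (predicted/proven null) or "other" (e.g. missense).
    """
    points = BASELINE[variant_type]
    upgrades = sum(1 for applies in (functional_data, de_novo) if applies)
    if upgrades == 0:
        # Assumption: a bare baseline score is reported without rounding.
        return points
    points += upgrades * UPGRADE[variant_type]
    return min(round_to_half(points), MAX_PER_PROBAND)


for kind in ("other", "null"):
    for functional in (False, True):
        for de_novo in (False, True):
            print(kind, functional, de_novo,
                  score_ad_xl_variant(kind, functional, de_novo))
```

The sketch does not model discretionary upgrades or downgrades (for example, for variants inconsistent with the disease mechanism); those remain expert judgment calls.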
AR Inheritance Example Scenarios
  Score for Variant 1 + Score for Variant 2 = Final Score
  Round to the nearest 0.5
  Example: Variant 1 is a missense variant without functional data. Variant 2 is a de novo LOF variant.
    Variant 1 = 0.1 points
    Variant 2 = 1.5 + 0.5 = 2 points
    Total score: 0.1 + 2 = 2.1, rounded down to 2 points
AR Inheritance Example Scenarios
  Score for Variant 1 + Score for Variant 2 = Final Score
  Round to the nearest 0.5
  Example: Variant 1 is a missense variant with functional data. Variant 2 is a de novo missense variant.
    Variant 1 = 0.1 + 0.4 = 0.5 points
    Variant 2 = 0.1 + 0.4 = 0.5 points
    Total score: 0.5 + 0.5 = 1 point
AR Inheritance Example Scenarios
  Score for Variant 1 + Score for Variant 2 = Final Score
  No single proband can score more than 3 points
  Example: Variant 1 is a LOF variant with functional data. Variant 2 is a de novo LOF variant.
    Variant 1 = 1.5 + 0.5 = 2 points
    Variant 2 = 1.5 + 0.5 = 2 points
    Total score: 2 + 2 = 4 points, CAPPED AT 3 POINTS
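The three AR scenarios follow the same pattern: sum the two independently scored variants, round to the nearest 0.5, and cap the proband at 3 points. A minimal sketch of that combination step follows; the function names are hypothetical, and the per-variant scores are passed in as already-assessed numbers.

```python
import math

MAX_PER_PROBAND = 3.0  # cap stated in the slides


def round_to_half(points):
    """Round to the nearest 0.5 (ties round up)."""
    return math.floor(points * 2 + 0.5) / 2


def score_ar_proband(variant1_points, variant2_points):
    """Combine two independently scored variants for an AR proband."""
    total = variant1_points + variant2_points
    return min(round_to_half(total), MAX_PER_PROBAND)


# The three scenarios from the slides.
print(score_ar_proband(0.1, 2.0))  # missense without functional data + de novo LOF
print(score_ar_proband(0.5, 0.5))  # missense with functional data + de novo missense
print(score_ar_proband(2.0, 2.0))  # LOF with functional data + de novo LOF
```

Applying the cap after summing, rather than per variant, matches the third scenario, where each variant alone scores below 3 but the proband total does not.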
Other notes on variant scoring
  Detailed instructions on scoring are available in the ClinGen Gene-Disease Validity Standard Operating Procedure (SOP): https://clinicalgenome.org/docs/gene-disease-validity-standard-operating-procedure/
  Common questions/issues:
    If a variant is not convincing, you can score it at 0!
    Only score one individual in a family with the variant points; segregation is assessed separately.
    De novo means BOTH parents have been tested and do not have the variant. Do not apply de novo points if only one parent is available for testing.
    If one parent is found to be mosaic for the variant, the proband can be counted as de novo.
Other notes on variant scoring
  Predicted/proven null typically refers to the following:
    Nonsense
    Frameshift
    Canonical +/- 1 or 2 splice site variants
    Single or multi-exon deletions
    Whole gene deletions (single gene only; multigenic deletions do not count as evidence)
  Missense may be scored as high as a predicted/proven null variant IF there is functional evidence demonstrating LOF.
  Consider downgrading these variants if there is evidence to suggest they are NOT LOF (e.g., NMD not expected to occur).
  Downgrade/do not score these variants if LOF is not the mechanism for disease in your curation.
Other notes on variant scoring
  Other variant type typically includes the following:
    Missense
    In-frame deletions/insertions
    Variants of any type that result in gain of function or dominant-negative impact
  Some functional impact of the variant on the gene product must be demonstrated to receive upgraded points.
    In silico predictions do not count as evidence of functional impact.
    Potential exception: in-depth in silico modeling demonstrating impact on 3D structure (expert discretion).
Segregation Scoring
  Criteria must be met before a family can be considered for segregation scoring:
    AD/XL: Family must have 4 or more segregations (genotype+/phenotype+ individuals OR obligate carriers)
    AR: At least 3 affected (genotype+/phenotype+) individuals
  If the family meets these criteria, determine the LOD score:
    Use the LOD provided by the authors if available.
    If not provided, a formula is provided in the SOP to estimate the LOD based on the information provided (# segregations, etc.).
  Determine the final segregation score by:
    Adding the LOD scores of all qualifying families
    Accounting for the number of families identified via candidate gene sequencing vs. ES/GS
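The eligibility check and LOD estimate can be sketched as follows. The thresholds (4 segregations for AD/XL, 3 affected individuals for AR) come from the slide. The two formulas are the simplified estimates as I understand them from the ClinGen SOP and are an assumption here, since the slide does not reproduce them; confirm them against the current SOP version before use. All function names are hypothetical, and the conversion from summed LOD to points is not modeled.

```python
import math


def family_qualifies(mode, segregations=0, affected=0):
    """Apply the minimum family-size criteria from the slide."""
    if mode in ("AD", "XL"):
        return segregations >= 4
    if mode == "AR":
        return affected >= 3
    raise ValueError(f"unknown inheritance mode: {mode}")


def estimate_lod_dominant(segregations):
    """Assumed AD/XL estimate: log10(1 / 0.5^segregations)."""
    return math.log10(1 / 0.5 ** segregations)


def estimate_lod_recessive(affected, unaffected_at_risk):
    """Assumed AR estimate:
    log10(1 / (0.25^(affected - 1) * 0.75^unaffected_at_risk))."""
    return math.log10(
        1 / (0.25 ** (affected - 1) * 0.75 ** unaffected_at_risk)
    )


def summed_lod(family_lods):
    """Add the LOD scores of all qualifying families."""
    return sum(family_lods)


if family_qualifies("AD", segregations=4):
    print(round(estimate_lod_dominant(4), 3))
if family_qualifies("AR", affected=3):
    print(round(estimate_lod_recessive(3, 2), 3))
```

When the authors report a LOD score, that value takes precedence over any estimate, as the slide notes.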
Case-Control Data
  Range of points provided; no specified point values
  When determining how many points to award a given case-control study, consider the following:
    Variant detection methodology: Were cases and controls analyzed using the same methods?
    Power: Do the number of cases and controls analyzed provide enough statistical power to detect an association?
    Bias and confounding factors: Are there systematic differences between cases and controls?
    Statistical significance: Are the results significant? Does the CI include 1? Have appropriate corrections been made for multiple testing? Etc.
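To make the "Does the CI include 1?" question concrete, here is a minimal sketch that computes an odds ratio and a Wald 95% confidence interval from a 2x2 table. This is a standard textbook approximation, not a method prescribed by the ClinGen protocol, and the counts in the example are invented for illustration. Published studies may use other tests, and small or zero cell counts need different handling.

```python
import math


def odds_ratio_with_ci(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using the Wald method.

    z=1.96 corresponds to a 95% interval. All four counts must be > 0.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def ci_includes_one(lower, upper):
    """A CI spanning 1 means no association was shown at this level."""
    return lower <= 1.0 <= upper


# Invented counts: 20/100 cases and 10/100 controls carry a variant.
or_value, low, high = odds_ratio_with_ci(20, 80, 10, 90)
print(or_value, low, high, ci_includes_one(low, high))
```

In this invented example the odds ratio is above 2, yet the interval still spans 1, which illustrates why power and sample size appear alongside significance in the list of considerations.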
Experimental Evidence (Gene Level) Consistent with MacArthur et al. Nature. 2014 Apr 24;508(7497):469-76
Experimental Evidence Scoring Strande et al. Am J Hum Genet 2017
Contradictory Evidence
  Not quantified in the summary matrix
    Manual review and expert input are needed to accurately assess this type of information in the context of the available supporting evidence.
    No score will be generated in these situations.
    The summary matrix can still be used to organize and display both types of evidence for further review.
  Examples:
    Case-control data is not significant
    MAF is too high for disease
    Gene-disease relationship cannot be replicated
    Original probands are later reported to have alternate causes of disease or different diseases entirely
    Non-segregations
    Non-supportive functional data