Annotating large numbers of SNPs is a difficult and complex process that requires computational methods to handle such large datasets. Many tools have been developed for SNP annotation in different organisms. Some are optimized for organisms densely sampled for SNPs (such as humans), but few are currently species non-specific or support non-model organism data. The majority of SNP annotation tools provide computationally predicted putative deleterious effects of SNPs. These tools examine whether a SNP resides in functional genomic regions such as exons, splice sites, or transcription regulatory sites, and predict the potential functional effects of the SNP using a variety of machine-learning approaches. However, the tools and systems that prioritize functionally significant SNPs suffer from a few limitations. First, they examine the putative deleterious effects of SNPs with respect to a single biological function, which provides only partial information about the functional significance of SNPs. Second, current systems classify SNPs into either a deleterious or a neutral group.
Rare variants are defined as single nucleotide polymorphisms (SNPs) with a minor allele frequency (MAF) of less than 0.01. As a consequence, the training data for the corresponding prediction methods may differ, so one should take care to select the appropriate tool for a specific purpose. For the purposes of this article, "SNP" is used to mean both SNP and single nucleotide variant (SNV), but readers should bear in mind the differences.
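The MAF threshold above can be illustrated with a minimal sketch. It assumes a biallelic site and a flat list of observed alleles (two per diploid sample); the function names and input layout are hypothetical and not taken from any particular tool.

```python
from collections import Counter

RARE_MAF_THRESHOLD = 0.01  # cutoff stated in the text


def minor_allele_frequency(alleles):
    """Return the minor allele frequency at a biallelic site.

    `alleles` is a flat list of observed alleles, e.g. two entries per
    diploid sample (an assumed input layout for this sketch).
    """
    counts = Counter(alleles)
    if len(counts) > 2:
        raise ValueError("this sketch only handles biallelic sites")
    if len(counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    # With exactly two alleles, the minor allele is the less frequent one.
    return min(counts.values()) / len(alleles)


def is_rare(alleles, threshold=RARE_MAF_THRESHOLD):
    """True when a minor allele is observed and its frequency is below the threshold."""
    maf = minor_allele_frequency(alleles)
    return 0.0 < maf < threshold
```

For example, one "G" among 200 observed alleles gives a MAF of 0.005, which falls below the 0.01 cutoff.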
For SNP annotation, many kinds of genetic and genomic information are used. Based on the different features used by each annotation tool, SNP annotation methods may be split roughly into the following categories:
Genomic information from surrounding genomic elements is among the most useful information for interpreting the biological function of an observed variant. Information from a known gene is used as a reference to indicate whether the observed variant resides in or near a gene and whether it has the potential to disrupt the protein sequence and its function. Gene-based annotation relies on the fact that non-synonymous mutations can alter the protein sequence and that splice-site mutations may disrupt the transcript splicing pattern.
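As a rough illustration of gene-based annotation, the sketch below places a variant position relative to a single gene model. The `Gene` structure, the coordinates, and the 2-bp splice window are assumptions made for this example; real annotation tools use full transcript databases and strand-aware rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene model with 1-based, inclusive coordinates.

    Assumes the first exon starts at `start` and the last exon ends at `end`.
    """
    name: str
    start: int
    end: int
    exons: tuple  # tuple of (exon_start, exon_end) pairs


def locate_variant(pos, gene, splice_window=2):
    """Classify a position as intergenic, exonic, splice_site or intronic."""
    if not gene.start <= pos <= gene.end:
        return "intergenic"
    if any(s <= pos <= e for s, e in gene.exons):
        return "exonic"
    # Inside the gene but outside every exon, so the position is intronic.
    # Intronic bases directly flanking an exon are reported as splice sites.
    for s, e in gene.exons:
        if 0 < s - pos <= splice_window or 0 < pos - e <= splice_window:
            return "splice_site"
    return "intronic"


# Illustrative gene with three exons (coordinates are made up).
EXAMPLE_GENE = Gene("GENE1", 100, 500, ((100, 200), (300, 400), (450, 500)))
```

With this model, position 150 is exonic, position 201 is a splice site, and position 250 is intronic.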
Knowledge-based annotation draws on information about gene attributes, protein function and metabolism. In this type of annotation, more emphasis is given to genetic variation that disrupts protein functional domains, protein-protein interactions and biological pathways. The non-coding regions of the genome contain many important regulatory elements, including promoters, enhancers and insulators, and any change in such a regulatory region can affect the functionality of the corresponding protein. A mutation in DNA can change the RNA sequence and thereby influence RNA secondary structure, RNA-binding protein recognition and miRNA binding activity.
This method mainly identifies variant function based on whether the variant loci lie in known functional regions that harbor genomic or epigenomic signals. The functions of non-coding variants are extensive in terms of the affected genomic regions, and they are involved in almost all processes of gene regulation, from the transcriptional to the post-translational level.
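The core of this region-based approach is an overlap test between a variant position and a set of annotated functional regions. The sketch below is a minimal version of that test; the region list and its labels are invented for illustration, and a linear scan stands in for the indexed interval structures that large datasets would need.

```python
# Hypothetical functional regions: (chromosome, start, end, label),
# with 1-based inclusive coordinates.
REGIONS = [
    ("chr1", 1000, 1500, "promoter"),
    ("chr1", 1400, 2000, "enhancer"),
    ("chr2", 500, 900, "insulator"),
]


def overlapping_regions(chrom, pos, regions=REGIONS):
    """Return the sorted labels of all regions that contain the position."""
    return sorted(
        label
        for region_chrom, start, end, label in regions
        if region_chrom == chrom and start <= pos <= end
    )
```

A variant at chr1:1450 falls in both the promoter and the enhancer in this toy set, while one at chr1:3000 overlaps no annotated region.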
The process of transcriptional gene regulation depends on many spatial and temporal factors in the nucleus, such as global or local chromatin states, nucleosome positioning, transcription factor (TF) binding, and enhancer/promoter activities. Variants that alter any of these biological processes may alter gene regulation and cause phenotypic abnormality. Genetic variants located in distal regulatory regions can affect the binding motifs of TFs, chromatin regulators and other distal transcriptional factors, which disturbs the interaction between an enhancer or silencer and its target gene.
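One common way to estimate whether a variant affects a TF binding motif is to score the reference and alternate sequences against a position weight matrix (PWM) and compare the two scores. The sketch below assumes a toy four-position PWM and a uniform background of 0.25 per base; both are illustrative and do not correspond to any real transcription factor.

```python
import math

BACKGROUND = 0.25  # assumed uniform base frequency

# Toy PWM: one dict of base probabilities per motif position (made up).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]


def motif_score(seq, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2 odds of the sequence under the PWM."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(column[base] / background) for column, base in zip(pwm, seq))


def motif_score_change(ref_seq, alt_seq, pwm=PWM):
    """Alternate score minus reference score; negative values suggest weaker binding."""
    return motif_score(alt_seq, pwm) - motif_score(ref_seq, pwm)
```

Here, changing the consensus site "ACGT" to "ACTT" lowers the score, which would flag the variant as potentially disrupting the motif.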
Single nucleotide variants can also affect cis-acting regulatory elements in mRNAs to inhibit or promote translation initiation. A mutation that changes a codon into a synonymous codon may affect translation efficiency because of codon usage bias. Translation elongation can also be retarded by mutations along the ramp of ribosomal movement. At the post-translational level, genetic variants can contribute to proteostasis and amino acid modifications. However, the mechanisms of variant effects in this field are complicated, and only a few tools are available to predict a variant's effect on translation-related modifications.
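Codon usage bias can be made concrete by computing, for each codon in a coding sequence, its share among the synonymous codons observed for the same amino acid. The sketch below uses the standard genetic code and assumes an upper-case DNA coding sequence in frame; the example sequence is invented.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def codon_usage(cds):
    """Map each observed codon to its fraction among synonymous codons in `cds`."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: n / per_amino_acid[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }
```

For the toy sequence "CTGCTGCTGCTA", the leucine codon CTG accounts for 0.75 of leucine codons and CTA for 0.25, so a synonymous CTG-to-CTA change swaps a frequently used codon for a less used one.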
Non-synonymous variants are variants in exons that change the amino acid sequence encoded by the gene, including single-base changes and non-frameshift indels. The effect of non-synonymous variants on proteins has been investigated extensively, and many algorithms have been developed to predict the deleteriousness and pathogenicity of single nucleotide variants (SNVs). Classical bioinformatics tools, such as SIFT, PolyPhen and MutationTaster, successfully predict the functional consequence of non-synonymous substitutions. The PopViz webserver provides a gene-centric approach to visualizing mutation damage prediction scores (CADD, SIFT, PolyPhen-2) or population genetics data (minor allele frequency) against the amino acid positions of all coding variants of a given human gene. PopViz is also cross-linked with the UniProt database, where protein domain information can be found, making it possible to identify on the PopViz plot the predicted deleterious variants that fall into these protein domains.
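Before any deleteriousness score is computed, a coding SNV must first be classified as synonymous or non-synonymous. The sketch below does this for a single-base substitution using the standard genetic code; it assumes an in-frame, upper-case coding sequence and a 0-based position, does not handle indels, and uses hypothetical function and label names.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO_ACIDS[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}


def classify_coding_snv(cds, pos, alt):
    """Classify a substitution at 0-based `pos` of an in-frame coding sequence."""
    codon_start = pos - pos % 3
    ref_codon = cds[codon_start:codon_start + 3]
    offset = pos - codon_start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"
```

For the toy sequence "ATGGAGTGGTAA" (Met-Glu-Trp-stop), changing GAG to GAA is synonymous, GAG to AAG is missense, and TGG to TGA introduces a premature stop.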
A large number of SNP annotation tools are currently available to annotate the vast amounts of available NGS data. Some are specific to particular kinds of SNPs, while others are more general. Available SNP annotation tools include SnpEff, Ensembl Variant Effect Predictor (VEP), ANNOVAR, FATHMM, PhD-SNP, PolyPhen-2, SuSPect, F-SNP, AnnTools, SeattleSeq, SNPit, SCAN, SNAP, SNPs&GO, LS-SNP, Snat, TREAT, TRAMS, Maviant, MutationTaster, SNPdat, Snpranker, NGS-SNP, SVA, VARIANT, SIFT, LIST-S2 and FASTSNP. The functions and approaches used in SNP annotation tools are listed below.
Variant annotation tools use machine learning algorithms to predict variant annotations. Different annotation tools use different algorithms. Common algorithms include:
A large number of tools are available for variant annotation. The annotations produced by different tools do not always agree with each other, as the defined rules for data handling differ between applications. A perfect comparison of the available tools is not feasible, because not all tools have the same input and output or the same functionality. Below is a table of major annotation tools and their functional area.
Different annotations capture diverse aspects of variant function. Simultaneous use of multiple, varied functional annotations could improve the power of rare-variant association analysis in whole-exome and whole-genome sequencing studies. Some tools have been developed to enable functionally informed phenotype-genotype association analysis for common and rare variants by incorporating functional annotations in biobank-scale cohorts.
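As a deliberately simplified picture of how several annotations can feed into one analysis, the sketch below computes an annotation-weighted burden score for one sample: each variant's weight is the mean of its annotation scores, and the burden is the weighted sum of alternate allele counts. This is a generic illustration, not the method of any tool mentioned here, and the scores are assumed to lie between 0 and 1.

```python
def annotation_weights(annotation_scores):
    """Average each variant's annotation scores into a single weight.

    `annotation_scores` holds one list of scores per variant; the scores are
    assumed to be scaled to [0, 1] by whatever tools produced them.
    """
    return [sum(scores) / len(scores) for scores in annotation_scores]


def weighted_burden(genotypes, annotation_scores):
    """Weighted sum of alternate allele counts (0, 1 or 2) for one sample."""
    if len(genotypes) != len(annotation_scores):
        raise ValueError("need one list of annotation scores per variant")
    weights = annotation_weights(annotation_scores)
    return sum(count * weight for count, weight in zip(genotypes, weights))
```

Variants supported by several annotations contribute more to the score than variants with weak functional evidence, which is the intuition behind using multiple annotations jointly.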
The next generation of SNP annotation webservers can take advantage of the growing amount of data in core bioinformatics resources and use intelligent agents to fetch data from different sources as needed. From a user's point of view, it is more efficient to submit a set of SNPs and receive results in a single step, which makes meta-servers the most attractive choice. However, if SNP annotation tools deliver heterogeneous data covering sequence, structure, regulation, pathways, etc., they must also provide frameworks for integrating the data into decision algorithms, along with quantitative confidence measures so that users can assess which data are relevant and which are not.