Biological and biomedical research has come to rely on accurate and consistent annotation of genes and their products on genome assemblies. Reference annotations of genomes are available from various sources, each with their own independent goals and policies, which results in some annotation variation.
The CCDS project was established to identify a gold standard set of protein-coding gene annotations that are identically annotated on the human and mouse reference genome assemblies by the participating annotation groups. The CCDS gene sets arrived at by consensus of the partners now consist of over 18,000 human and over 20,000 mouse genes (see CCDS release history). The CCDS data set represents more alternative splicing events with each new release.
"Consensus" is defined as protein-coding regions that agree at the start codon, stop codon, and splice junctions, and for which the prediction meets quality assurance benchmarks. Manual and automated genome annotations provided by the National Center for Biotechnology Information (NCBI) and Ensembl (which incorporates manual HAVANA annotations) are compared to identify annotations with matching genomic coordinates.
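The coordinate comparison described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the `CdsAnnotation` class, its field names and the example coordinates are assumptions made for illustration only. The idea is that two annotations whose coding segments have identical genomic coordinates necessarily agree at the start codon, the stop codon and every splice junction.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CdsAnnotation:
    """Hypothetical container for one protein-coding annotation."""
    chrom: str
    strand: str  # "+" or "-"
    cds_segments: Tuple[Tuple[int, int], ...]  # (start, end) of each coding segment


def cds_key(ann: CdsAnnotation) -> tuple:
    # Sort the segments so that the order in which a source lists them is irrelevant.
    return (ann.chrom, ann.strand, tuple(sorted(ann.cds_segments)))


def coordinates_match(a: CdsAnnotation, b: CdsAnnotation) -> bool:
    """True if both annotations describe exactly the same coding coordinates."""
    return cds_key(a) == cds_key(b)


# Invented example coordinates; they do not refer to a real gene.
source_a = CdsAnnotation("chr1", "+", ((100, 180), (300, 420), (600, 650)))
source_b = CdsAnnotation("chr1", "+", ((300, 420), (100, 180), (600, 650)))
shifted = CdsAnnotation("chr1", "+", ((100, 180), (300, 423), (600, 650)))

print(coordinates_match(source_a, source_b))  # same segments, different order
print(coordinates_match(source_a, shifted))   # one splice junction differs
```

In this sketch a single shifted splice junction is enough to prevent a match, which mirrors the requirement that all coding boundaries agree.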
In order to ensure that CDSs are of high quality, multiple quality assurance (QA) tests are performed (Table 1). All tests are performed following the annotation comparison step of each CCDS build and are independent of individual annotation group QA tests performed prior to the annotation comparison.
Table 1: Examples of the types of CCDS QA tests performed prior to acceptance of CCDS candidates.
Annotations that fail QA tests undergo a round of manual checking, which may either improve the results or lead to a decision to reject the annotation match on the basis of the QA failure.
The CCDS database is unique in that the review process must be carried out by multiple collaborators, and agreement must be reached before any changes can be made. This is made possible with a collaborator coordination system that includes a work process flow and forums for analysis and discussion. The CCDS database operates an internal website that serves multiple purposes including curator communication, collaborator voting, providing special reports and tracking the status of CCDS representations. When a collaborating CCDS group member identifies a CCDS ID that may need review, a voting process is employed to decide on the final outcome.
Coordinated manual curation is supported by a restricted-access website and a discussion e-mail list. CCDS curation guidelines were established to address specific conflicts that recurred frequently. These guidelines have made the CCDS curation process more efficient by reducing the number of conflicting votes and the time spent in discussion to reach a consensus agreement. The CCDS curation guidelines are available online.
Curation policies established for the CCDS data set have been integrated into the RefSeq and HAVANA annotation guidelines, and thus new annotations provided by both groups are more likely to be concordant and result in the addition of a CCDS ID. These standards address specific problem areas, are not a comprehensive set of annotation guidelines, and do not restrict the annotation policies of any collaborating group. Examples include standardized curation guidelines for selection of the initiation codon and for interpretation of upstream ORFs and of transcripts that are predicted to be candidates for nonsense-mediated decay (NMD). Curation occurs continuously, and any of the collaborating centers can flag a CCDS ID as a potential update or withdrawal.
Conflicting opinions are addressed by consulting with scientific experts or other annotation curation groups such as the HUGO Gene Nomenclature Committee (HGNC) and Mouse Genome Informatics (MGI). If a conflict cannot be resolved, then collaborators agree to withdraw the CCDS ID until more information becomes available.
Multiple in-frame translation start sites:
Multiple factors contribute to translation initiation, such as upstream open reading frames (uORFs), secondary structure and the sequence context around the translation initiation site. A common start site is defined within the Kozak consensus sequence, (GCC)GCCACCAUGG in vertebrates. The sequence in brackets (GCC) is a motif of unknown biological impact. The Kozak consensus sequence shows variation; for example, either G or A is observed three nucleotides upstream of the AUG (at position -3). The bases between positions -3 and +4 of the Kozak sequence have the most significant impact on translational efficiency. Hence, the sequence (A/G)NNAUGG is defined as a strong Kozak signal in the CCDS project.
According to the scanning mechanism, the small ribosomal subunit can initiate translation at the first start codon it reaches. There are exceptions to the scanning model:
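The strong Kozak signal (A/G)NNAUGG can be checked with a few lines of code. The following is a minimal sketch rather than a tool used by the CCDS project; the function name `has_strong_kozak` and its arguments are assumptions. It takes the A of the AUG as position +1, so position -3 lies three nucleotides upstream of the A and position +4 is the nucleotide immediately after the G.

```python
def has_strong_kozak(mrna: str, aug_index: int) -> bool:
    """Return True if the AUG starting at aug_index lies in an (A/G)NNAUGG context.

    mrna may be given as RNA or DNA; aug_index is the 0-based index of the A of AUG.
    """
    seq = mrna.upper().replace("T", "U")
    # Both position -3 and position +4 must exist within the sequence.
    if aug_index < 3 or aug_index + 4 > len(seq):
        return False
    if seq[aug_index:aug_index + 3] != "AUG":
        return False
    purine_at_minus_3 = seq[aug_index - 3] in ("A", "G")
    g_at_plus_4 = seq[aug_index + 3] == "G"
    return purine_at_minus_3 and g_at_plus_4


print(has_strong_kozak("GCCGCCACCAUGG", 9))  # the consensus sequence from the text
print(has_strong_kozak("GCCGCCUCCAUGC", 9))  # U at -3 and C at +4
```

Only positions -3 and +4 are tested here, following the definition in the text; the remaining consensus positions are ignored.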
Upstream open reading frames:
AUG initiation codons located within transcript leaders are known as upstream AUGs (uAUGs). Sometimes, uAUGs are associated with uORFs. uORFs are found in approximately 50% of human and mouse transcripts. The existence of uORFs is another challenge for the CCDS data set. The scanning mechanism for translation initiation suggests that small ribosomal subunits (40S) bind at the 5' end of a nascent mRNA transcript and scan for the first AUG start codon. It is possible that a uAUG is recognised first, and the corresponding uORF is then translated. The translated uORF could be a candidate for nonsense-mediated decay (NMD), although studies have shown that some uORFs can avoid NMD. The average size limit for uORFs that will escape NMD is approximately 35 amino acids. It has also been suggested that uORFs inhibit translation of the downstream gene by trapping a ribosome initiation complex and causing the ribosome to dissociate from the mRNA transcript before it reaches the protein-coding regions. Currently, no studies have reported the global impact of uORFs on translational regulation.
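To make the relationship between uAUGs and uORFs concrete, the sketch below scans a transcript leader for AUG codons and follows each one in frame to the first stop codon. It is an illustration under stated assumptions, not a CCDS curation tool: the function name `find_uorfs`, the `main_start` argument and the example sequence are all hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_uorfs(mrna: str, main_start: int) -> List[Tuple[int, int, int]]:
    """List uORFs whose uAUG lies entirely within the transcript leader.

    main_start is the 0-based index of the annotated start codon.
    Each result is (start, end, length_in_codons), where end is exclusive and
    includes the stop codon, and the length counts codons before the stop.
    """
    seq = mrna.upper().replace("T", "U")
    uorfs = []
    for i in range(max(0, main_start - 2)):
        if seq[i:i + 3] != "AUG":
            continue
        # Walk codon by codon in the frame set by this uAUG.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append((i, j + 3, (j - i) // 3))
                break
    return uorfs


# Invented example: a 13-nt leader followed by a tiny main ORF (AUG GCC UAA).
transcript = "GGAUGAAAUAAGG" + "AUGGCCUAA"
print(find_uorfs(transcript, main_start=13))
```

The reported codon count could then be compared with the approximate 35 amino acid figure mentioned above, although that figure is an average and not a hard cutoff.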
Quality of reference genome sequence:
As the CCDS data set is built to represent genomic annotations of human and mouse, quality problems with the human and mouse reference genome sequences pose another challenge. Quality problems occur when the reference genome is misassembled. A misassembled genome may contain premature stop codons, frame-shift indels, or likely polymorphic pseudogenes. Once these quality problems are identified, the CCDS collaborators report the issues to the Genome Reference Consortium, which investigates and makes the necessary corrections.
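Problems such as premature stop codons and frame-shifts become visible when the coding sequence is read codon by codon from the reference. The following sketch shows the kind of check involved; it is a simplified assumption and does not reproduce the actual CCDS QA tests. The function name `cds_problems` and the problem labels are hypothetical, and the input is assumed to be the spliced coding sequence as DNA.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}


def cds_problems(cds: str) -> List[str]:
    """Return simple structural problems found in a spliced coding sequence (DNA)."""
    seq = cds.upper()
    problems = []
    # A length that is not a multiple of three is consistent with a frame-shift.
    if len(seq) % 3 != 0:
        problems.append("length_not_multiple_of_3")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("no_start_codon")
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("no_terminal_stop")
    # Any stop codon before the final codon truncates the protein.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("premature_stop")
    return problems


print(cds_problems("ATGGCCTAA"))     # intact toy CDS
print(cds_problems("ATGTAAGCCTAA"))  # internal stop codon
print(cds_problems("ATGGCCTA"))      # one base missing
```

A check of this kind cannot tell whether the fault lies in the annotation or in the reference assembly, which is why flagged cases need manual review.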
The CCDS data set has continued to grow, both through computational genome annotation updates, which integrate new data sets submitted to the International Nucleotide Sequence Database Collaboration (INSDC), and through ongoing curation activities that supplement or improve upon that annotation. Table 2 summarises the key statistics for each CCDS build, where Public CCDS IDs are all those that were not under review or pending an update or withdrawal at the time of the release date.
Table 2. Summary statistics for past CCDS releases. The complete set of release statistics can be found on the Releases & Statistics page of the official CCDS website.
Long-term goals include adding attributes that indicate where transcript annotation is also identical (including the UTRs) and that identify splice variants with different UTRs that share the same CCDS ID. It is also anticipated that as more complete and high-quality genome sequence data become available for other organisms, annotations from these organisms may be in scope for CCDS representation.
The CCDS set will become more complete as the independent curation groups agree on cases where they initially differ, as additional experimental validation of weakly supported genes occurs, and as automatic annotation methods continue to improve. Communication among the CCDS collaborating groups is ongoing and will resolve differences and identify refinements between CCDS update cycles. Human updates are expected to occur roughly every 6 months and mouse releases yearly.