Consensus CDS Project

<h2 id="motivation-and-background">Motivation and background</h2>
Biological and biomedical research has come to rely on accurate and consistent annotation of genes and their products on genome assemblies. Reference annotations of genomes are available from various sources, each with their own independent goals and policies, which results in some annotation variation.
The CCDS project was established to identify a gold standard set of protein-coding gene annotations that are identically annotated on the human and mouse <a href="/facts/Reference_genome/0N144oyY">reference genome</a> assemblies by the participating annotation groups. The CCDS gene sets, arrived at by consensus of the different partners,<a class="footnote-ref" id="fnref:3" href="#fn:3">3</a> now consist of over 18,000 human and over 20,000 mouse genes (see CCDS release history). The CCDS dataset represents more <a href="/facts/Alternative_splicing/nNRwPVNK">alternative splicing</a> events with each new release.<a class="footnote-ref" id="fnref:4" href="#fn:4">4</a>

<h2 id="contributing-groups">Contributing groups</h2>
Participating annotation groups include:<a class="footnote-ref" id="fnref:5" href="#fn:5">5</a>

<ul><li>National Center for Biotechnology Information <a href="/facts/National_Center_for_Biotechnology_Information/6Xj5wCbu">(NCBI)</a></li>
<li>European Bioinformatics Institute <a href="/facts/European_Bioinformatics_Institute/g1GsqPXd">(EBI)</a></li>
<li>Wellcome Trust Sanger Institute <a href="/facts/Wellcome_Trust_Sanger_Institute/uQchlx8p">(WTSI)</a></li>
<li>HUGO Gene Nomenclature Committee <a href="/facts/HUGO_Gene_Nomenclature_Committee/c26pY9tE">(HGNC)</a></li>
<li>Mouse Genome Informatics <a href="/facts/Mouse_Genome_Informatics/cRF3seoe">(MGI)</a></li></ul>
Manual annotation is provided by:

<ul><li>Reference Sequence (<a href="/facts/RefSeq/5kBXXR8s">RefSeq</a>) at NCBI</li>
<li>Human and Vertebrate Analysis and Annotation (HAVANA) at <a href="/facts/Wellcome_Trust_Sanger_Institute/uQchlx8p">WTSI</a></li></ul>
<h2 id="defining-the-ccds-gene-set">Defining the CCDS gene set</h2>
"Consensus" is defined as protein-coding regions that agree at the start codon, stop codon, and splice junctions, and for which the prediction meets quality assurance benchmarks.<a class="footnote-ref" id="fnref:6" href="#fn:6">6</a> A combination of manual and automated genome annotations provided by <a href="/facts/National_Center_for_Biotechnology_Information/6Xj5wCbu">(NCBI)</a>
and <a href="/facts/Ensembl/dEhjvNaA">Ensembl</a> (which incorporates manual HAVANA annotations) are compared to identify annotations with matching genomic coordinates.
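The comparison step can be pictured with a minimal Python sketch. It is not the CCDS pipeline itself: the record layout (chromosome, strand, list of coding segments) and the identifiers are hypothetical, and the QA benchmarks are left out. It only shows that identical coding coordinates imply an identical start codon, stop codon, and set of splice junctions.

```python
from typing import Dict, List, Tuple

# One coding segment of a CDS on the genome: (start, end).
Segment = Tuple[int, int]
# Hypothetical record layout: (chromosome, strand, coding segments).
Annotation = Tuple[str, str, List[Segment]]


def consensus_candidates(ncbi: Dict[str, Annotation],
                         ensembl: Dict[str, Annotation]) -> List[Tuple[str, str]]:
    """Pair annotations from the two sets whose CDS coordinates are identical."""
    index: Dict[tuple, List[str]] = {}
    for name, (chrom, strand, segments) in ensembl.items():
        key = (chrom, strand, tuple(sorted(segments)))
        index.setdefault(key, []).append(name)

    pairs = []
    for name, (chrom, strand, segments) in ncbi.items():
        key = (chrom, strand, tuple(sorted(segments)))
        for other in index.get(key, []):
            pairs.append((name, other))
    return pairs


# Made-up identifiers and coordinates, for illustration only.
ncbi_set = {
    "N1": ("chr1", "+", [(100, 200), (300, 400)]),
    "N2": ("chr1", "+", [(100, 200), (310, 400)]),  # different splice junction
}
ensembl_set = {"E1": ("chr1", "+", [(300, 400), (100, 200)])}
print(consensus_candidates(ncbi_set, ensembl_set))
```

Only N1 and E1 are paired here, because N2 differs at one splice junction.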

<h2 id="quality-assurance-testing">Quality assurance testing</h2>
In order to ensure that CDSs are of high quality, multiple quality assurance (QA) tests are performed (Table 1). All tests are performed following the annotation comparison step of each CCDS build and are independent of individual annotation group QA tests performed prior to the annotation comparison.<a class="footnote-ref" id="fnref:7" href="#fn:7">7</a>

Table 1: Examples of the types of CCDS QA tests performed prior to acceptance of CCDS candidates <a class="footnote-ref" id="fnref:8" href="#fn:8">8</a>
<table><tbody>
<tr><th scope="col">QA test</th><th scope="col">Purpose of the test</th></tr>
<tr><td>Subject to NMD</td><td>Checks for transcripts that may be subject to nonsense-mediated decay (NMD)</td></tr>
<tr><td>Low quality</td><td>Checks for low coding propensity</td></tr>
<tr><td>Non-consensus splice sites</td><td>Checks for non-canonical splice sites</td></tr>
<tr><td>Predicted pseudogene</td><td>Checks for genes that are predicted to be pseudogenes by UCSC</td></tr>
<tr><td>Too short</td><td>Checks for transcripts or proteins that are unusually short, typically &lt;100 amino acids</td></tr>
<tr><td>Ortholog not found/not conserved</td><td>Checks for genes that are not conserved and/or are not in a HomoloGene cluster</td></tr>
<tr><td>CDS start or stop not in alignment</td><td>Checks for a start or stop codon in the reference genome sequence</td></tr>
<tr><td>Internal stop</td><td>Checks for the presence of an internal stop codon in the genomic sequence</td></tr>
<tr><td>NCBI:Ensembl protein length different</td><td>Checks if the protein encoded by the NCBI RefSeq is the same length as the EBI/WTSI protein</td></tr>
<tr><td>NCBI:Ensembl low percent identity</td><td>Checks for &gt;99% overall identity between the NCBI and EBI/WTSI proteins</td></tr>
<tr><td>Gene discontinued</td><td>Checks if the GeneID is no longer valid</td></tr>
</tbody></table>
Annotations that fail QA tests undergo a round of manual checking, which may either resolve the failure or lead to a decision to reject the annotation match.
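Three of the tests in Table 1 are simple enough to sketch in Python. This is an illustration only, not the project's QA code: the function names are hypothetical, the input is assumed to be the spliced coding sequence from each annotation group, and the 100-amino-acid cutoff is the "typically" value quoted in the table.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str) -> list:
    """Split a coding sequence into complete codons, ignoring a partial tail."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def has_internal_stop(cds: str) -> bool:
    """True if any codon other than the last one is a stop codon."""
    return any(c in STOP_CODONS for c in codons(cds)[:-1])


def protein_length(cds: str) -> int:
    """Number of encoded amino acids, not counting a terminal stop codon."""
    cs = codons(cds)
    return len(cs) - 1 if cs and cs[-1] in STOP_CODONS else len(cs)


def qa_flags(ncbi_cds: str, ensembl_cds: str, min_aa: int = 100) -> list:
    """Return the names of the sketched QA tests that a candidate pair fails."""
    flags = []
    if has_internal_stop(ncbi_cds) or has_internal_stop(ensembl_cds):
        flags.append("Internal stop")
    if min(protein_length(ncbi_cds), protein_length(ensembl_cds)) < min_aa:
        flags.append("Too short")
    if protein_length(ncbi_cds) != protein_length(ensembl_cds):
        flags.append("NCBI:Ensembl protein length different")
    return flags


# Synthetic sequences, for illustration only.
good = "ATG" + "GCT" * 120 + "TAA"
bad = "ATG" + "TAG" + "GCT" * 5 + "TAA"
print(qa_flags(good, good))
print(qa_flags(bad, good))
```

A candidate that returns any flag would go to the manual checking round rather than being accepted automatically.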

<h2 id="review-process">Review process</h2>
The CCDS database is unique in that the review process must be carried out by multiple collaborators, and agreement must be reached before any changes can be made. This is made possible with a collaborator coordination system that includes a work process flow and forums for analysis and discussion. The CCDS database operates an internal website that serves multiple purposes including curator communication, collaborator voting, providing special reports and tracking the status of CCDS representations. When a collaborating CCDS group member identifies a CCDS ID that may need review, a voting process is employed to decide on the final outcome.

<h2 id="manual-curation">Manual curation</h2>
Coordinated manual curation is supported by a restricted-access website and a discussion e-mail list. CCDS curation guidelines were established to address specific conflicts that were observed at a higher frequency. Establishment of CCDS curation guidelines has helped to make the CCDS curation process more efficient by reducing the number of conflicting votes and the time spent in discussion to reach a consensus agreement. The CCDS curation guidelines are available <a href="https://www.ncbi.nlm.nih.gov/CCDS/docs/CCDS_curation_guidelines.pdf">here</a>.
Curation policies established for the CCDS data set have been integrated into the <a href="/facts/RefSeq/5kBXXR8s">RefSeq</a> and HAVANA annotation guidelines; thus, new annotations provided by both groups are more likely to be concordant and result in the addition of a CCDS ID. These standards address specific problem areas, are not a comprehensive set of annotation guidelines, and do not restrict the annotation policies of any collaborating group.<a class="footnote-ref" id="fnref:9" href="#fn:9">9</a> Examples include standardized curation guidelines for selection of the initiation codon and for interpretation of upstream <a href="/facts/Open_reading_frame/Saams83e">ORFs</a> and transcripts that are predicted to be candidates for <a href="/facts/Nonsense-mediated_decay/HMESQdIF">nonsense-mediated decay</a>. Curation occurs continuously, and any of the collaborating centers can flag a CCDS ID as a potential update or withdrawal.
Conflicting opinions are addressed by consulting with scientific experts or other annotation curation groups such as the HUGO Gene Nomenclature Committee <a href="/facts/HUGO_Gene_Nomenclature_Committee/c26pY9tE">(HGNC)</a> and Mouse Genome Informatics <a href="/facts/Mouse_Genome_Informatics/cRF3seoe">(MGI)</a>. If a conflict cannot be resolved, then collaborators agree to withdraw the CCDS ID until more information becomes available.

<h2 id="curation-challenges-and-annotation-guidelines">Curation challenges and annotation guidelines</h2>
Nonsense-mediated decay (NMD):
<a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> is the most powerful <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> surveillance process. <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> eliminates defective <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> before it can be translated into protein.<a class="footnote-ref" id="fnref:10" href="#fn:10">10</a> This is important because if the defective <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> is translated, the truncated protein may cause disease. Different mechanisms have been proposed to explain <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a>; one is the <a href="/facts/Exon_junction_complex/GRcTSQVh">exon junction complex</a> (EJC) model. In this model, if the stop codon is more than 50 nt upstream of the last exon-exon junction, the transcript is assumed to be an <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidate.<a class="footnote-ref" id="fnref:11" href="#fn:11">11</a> The CCDS collaborators use a conservative method, based on the EJC model, to screen mRNA transcripts. Any transcripts determined to be <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidates are excluded from the CCDS data set except in the following situations:<a class="footnote-ref" id="fnref:12" href="#fn:12">12</a>

<ol><li>all transcripts at one particular locus are assessed to be <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidates, but the locus is already known to be a protein-coding region;</li>
<li>there is experimental evidence suggesting that a functional protein is produced from the <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidate transcript.</li></ol>
Previously, <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidate transcripts were considered to be protein coding transcripts by both <a href="/facts/RefSeq/5kBXXR8s">RefSeq</a> and HAVANA, and thereby, these <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidate transcripts were represented in the CCDS data set. The <a href="/facts/RefSeq/5kBXXR8s">RefSeq</a> group and the HAVANA project have subsequently revised their annotation policies.
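The 50-nt rule of the EJC model described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the collaborators' screening method: positions are in spliced transcript coordinates, the function name and parameters are hypothetical, and the two exceptions listed above (loci known to be protein coding, and experimental evidence of a functional protein) are not modelled.

```python
def is_nmd_candidate(exon_lengths, cds_end, threshold=50):
    """Apply the 50-nt rule of the EJC model.

    exon_lengths: lengths of the transcript's exons, in 5' to 3' order.
    cds_end: transcript position (0-based, exclusive) just after the stop codon.
    Returns True when the stop codon ends more than `threshold` nt upstream
    of the last exon-exon junction.
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no exon-exon junction
    last_junction = sum(exon_lengths[:-1])  # start of the final exon
    return last_junction - cds_end > threshold


# Synthetic transcript with exons of 200, 150 and 300 nt: last junction at 350.
print(is_nmd_candidate([200, 150, 300], cds_end=280))  # stop 70 nt upstream
print(is_nmd_candidate([200, 150, 300], cds_end=320))  # stop 30 nt upstream
print(is_nmd_candidate([200, 150, 300], cds_end=500))  # stop in the last exon
```

Only the first call reports an NMD candidate; a stop codon in the last exon, or within 50 nt of the last junction, does not trigger the rule.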
Multiple in-frame translation start sites:
Multiple factors contribute to translation initiation, such as upstream <a href="/facts/Open_reading_frame/Saams83e">open reading frames</a> (uORFs), secondary structure and the sequence context around the translation initiation site. A common start site is defined within the Kozak consensus sequence, (GCC)GCCACCAUGG in vertebrates. The bracketed motif (GCC) has unknown biological impact.<a class="footnote-ref" id="fnref:13" href="#fn:13">13</a> Variations of the Kozak consensus sequence exist; for example, either G or A is observed three nucleotides upstream of the AUG (at position -3). Bases between positions -3 and +4 of the Kozak sequence have the most significant impact on translational efficiency. Hence, the sequence (A/G)NNAUGG is defined as a strong Kozak signal in the CCDS project.
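The (A/G)NNAUGG pattern translates directly into a regular expression. The following Python sketch assumes the caller already knows the index of the A of the AUG in the mRNA sequence; the function name is hypothetical, and DNA input (T instead of U) is accepted as a convenience.

```python
import re

# Position -3 must be A or G, positions -2 and -1 are unconstrained,
# then AUG, then G at position +4.
STRONG_KOZAK = re.compile(r"[AG][ACGU]{2}AUGG")


def has_strong_kozak(mrna: str, aug_index: int) -> bool:
    """True if the AUG starting at aug_index sits in an (A/G)NNAUGG context."""
    mrna = mrna.upper().replace("T", "U")
    if aug_index < 3:
        return False  # not enough upstream sequence to judge position -3
    window = mrna[aug_index - 3:aug_index + 4]
    return STRONG_KOZAK.fullmatch(window) is not None


print(has_strong_kozak("GCCACCAUGGCG", 6))  # consensus context
print(has_strong_kozak("GCCUCCAUGUCG", 6))  # U at -3 and U at +4
```

The first sequence is the vertebrate consensus and matches; the second fails at both position -3 and position +4.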
According to the scanning mechanism, the small ribosomal subunit can initiate translation at the first start codon it reaches. There are exceptions to the scanning model:

<ol><li>when the initiation site is not surrounded by a strong Kozak signal, which results in leaky scanning: the <a href="/facts/Ribosome/CF2u2gtg">ribosome</a> skips this AUG and initiates translation from a downstream start site;</li>
<li>when a shorter <a href="/facts/Open_reading_frame/Saams83e">ORF</a> can allow the <a href="/facts/Ribosome/CF2u2gtg">ribosome</a> to re-initiate translation at a downstream <a href="/facts/Open_reading_frame/Saams83e">ORF</a>.<a class="footnote-ref" id="fnref:14" href="#fn:14">14</a></li></ol>
According to the CCDS annotation guidelines, the longest <a href="/facts/Open_reading_frame/Saams83e">ORF</a> must be annotated except when there is experimental evidence that an internal start site is used to initiate translation. Additionally, other types of new data, such as ribosome profiling data,<a class="footnote-ref" id="fnref:15" href="#fn:15">15</a> can be used to identify start codons. The CCDS data set records one translation initiation site per CCDS ID. Alternative start sites that may also be used for translation are stated in a CCDS public note.
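The default choice of the longest ORF can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the input is a single spliced mRNA sequence, only AUG starts are considered, and experimental evidence for an internal start site, which would override the default, is outside its scope. Function names are hypothetical.

```python
STOPS = {"UAA", "UAG", "UGA"}


def find_orfs(mrna: str) -> list:
    """Return (start, end) for every AUG that has an in-frame stop codon.

    `end` is exclusive and includes the stop codon.
    """
    mrna = mrna.upper().replace("T", "U")
    orfs = []
    for start in range(len(mrna) - 2):
        if mrna[start:start + 3] != "AUG":
            continue
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOPS:
                orfs.append((start, pos + 3))
                break
    return orfs


def longest_orf(mrna: str):
    """Pick the longest ORF, or None if the sequence contains no ORF."""
    orfs = find_orfs(mrna)
    return max(orfs, key=lambda o: o[1] - o[0]) if orfs else None


# Synthetic mRNA with two in-frame AUGs sharing one stop codon.
mrna = "CC" + "AUG" + "GCU" + "AUG" + "GCU" + "UAA" + "CC"
print(find_orfs(mrna))
print(longest_orf(mrna))
```

With two in-frame start codons sharing a stop, the upstream AUG gives the longer ORF and is the one selected.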
Upstream open reading frames:
AUG initiation codons located within transcript leaders are known as upstream AUGs (uAUGs). Sometimes, uAUGs are associated with u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a>. u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> are found in approximately 50% of human and mouse transcripts.<a class="footnote-ref" id="fnref:16" href="#fn:16">16</a> The existence of u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> is another challenge for the CCDS data set. The scanning mechanism for translation initiation suggests that small ribosomal subunits (40S) bind at the 5’ end of a nascent <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> transcript and scan for the first AUG start codon.<a class="footnote-ref" id="fnref:17" href="#fn:17">17</a> It is possible that a uAUG is recognised first, and the corresponding uORF is then translated. The translated u<a href="/facts/Open_reading_frame/Saams83e">ORF</a> could be an <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> candidate, although studies have shown that some u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> can avoid <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a>.
The average size limit for u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> that will escape <a href="/facts/Nonsense-mediated_decay/HMESQdIF">NMD</a> is approximately 35 <a href="/facts/Amino_acid/WDxYwBny">amino acids</a>.<a class="footnote-ref" id="fnref:18" href="#fn:18">18</a><a class="footnote-ref" id="fnref:19" href="#fn:19">19</a> It has also been suggested that u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> inhibit translation of the downstream gene by trapping a <a href="/facts/Ribosome/CF2u2gtg">ribosome</a> initiation complex and causing the <a href="/facts/Ribosome/CF2u2gtg">ribosome</a> to dissociate from the <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> transcript before it reaches the protein-coding regions.<a class="footnote-ref" id="fnref:20" href="#fn:20">20</a><a class="footnote-ref" id="fnref:21" href="#fn:21">21</a> Currently, no studies have reported the global impact of u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> on translational regulation.
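As a rough illustration, uORFs can be enumerated by scanning the transcript leader for uAUGs and following each one to its first in-frame stop codon. The Python sketch below reports the length of each uORF in amino acids, which can then be compared with the approximate 35-amino-acid figure above. The function name and the returned field names are hypothetical, and a uAUG that is in frame with the primary start and has no intervening stop is treated as an N-terminal extension rather than a uORF, which is an assumption of this sketch.

```python
STOPS = {"UAA", "UAG", "UGA"}


def upstream_orfs(mrna: str, primary_start: int) -> list:
    """List uORFs whose uAUG lies in the leader, 5' of the primary start codon.

    primary_start: 0-based index of the A of the annotated start codon.
    """
    mrna = mrna.upper().replace("T", "U")
    found = []
    for start in range(primary_start):
        if mrna[start:start + 3] != "AUG":
            continue
        in_frame = (primary_start - start) % 3 == 0
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOPS:
                if in_frame and pos >= primary_start:
                    break  # N-terminal extension of the primary ORF, not a uORF
                found.append({
                    "uaug": start,
                    "length_aa": (pos - start) // 3,  # includes the initiator Met
                    "overlaps_primary": pos + 3 > primary_start,
                })
                break
    return found


# Synthetic transcript: a 2-codon uORF, a 2-nt spacer, then the primary ORF.
transcript = "AUGGCUUAA" + "CC" + "AUGGCUGCUUAG"
print(upstream_orfs(transcript, primary_start=11))
```

The example transcript contains one short uORF that ends before the primary start codon.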
The current CCDS annotation guidelines allow the inclusion of <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> transcripts containing u<a href="/facts/Open_reading_frame/Saams83e">ORFs</a> if they meet the following two biological requirements:<a class="footnote-ref" id="fnref:22" href="#fn:22">22</a>

<ol><li>the <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> transcript has a strong Kozak signal;</li>
<li>the <a href="/facts/Messenger_RNA/oUL6qxQn">mRNA</a> transcript is either ≥ 35 <a href="/facts/Amino_acid/WDxYwBny">amino acids</a> or overlaps with the primary <a href="/facts/Open_reading_frame/Saams83e">open reading frame</a>.</li></ol>
Read-through transcripts:
Read-through transcripts are also known as <a href="/facts/Conjoined_gene/yWYlt3L1">conjoined genes</a> or co-transcribed genes. Read-through transcripts are defined as transcripts combining at least part of one exon from each of two or more distinct known (partner) genes which lie on the same chromosome in the same orientation.<a class="footnote-ref" id="fnref:23" href="#fn:23">23</a> The biological function of read-through transcripts and their corresponding protein molecules remains unknown. However, the definition of a read-through gene in the CCDS data set is that the individual partner genes must be distinct, and the read-through transcripts must share ≥ 1 exon (or ≥ 2 splice sites except in the case of a shared terminal exon) with each of the distinct shorter loci.<a class="footnote-ref" id="fnref:24" href="#fn:24">24</a> Transcripts are not considered to be read-through transcripts in the following circumstances:

<ol><li>when transcripts are produced from <a href="/facts/Overlapping_genes/nXljRsrE">overlapping genes</a> but do not share the same splice sites;</li>
<li>when transcripts are transcribed from genes that have nested structures relative to each other. In this instance, the CCDS collaborators and the <a href="/facts/HUGO_Gene_Nomenclature_Committee/c26pY9tE">HGNC</a> have agreed that the read-through transcript be represented as a separate locus.</li></ol>
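The shared-exon part of this definition can be sketched in Python. The sketch is deliberately partial: it assumes all exons are on the same chromosome and strand, it tests only the "≥ 1 shared exon with each partner" criterion, and it does not implement the alternative splice-site criterion or the exclusion for nested genes. Gene names and coordinates are made up.

```python
def is_read_through(transcript_exons, partner_genes, min_shared=1):
    """True if the transcript shares exons with at least two distinct genes.

    transcript_exons: list of (start, end) exon coordinates.
    partner_genes: dict mapping a gene name to its list of (start, end) exons.
    """
    transcript = set(transcript_exons)
    partners = [
        gene for gene, exons in partner_genes.items()
        if len(transcript & set(exons)) >= min_shared
    ]
    return len(partners) >= 2


# Hypothetical partner genes lying in the same orientation on one chromosome.
genes = {
    "GENE_A": [(100, 200), (300, 400)],
    "GENE_B": [(900, 1000), (1200, 1300)],
}
print(is_read_through([(100, 200), (300, 400), (900, 1000)], genes))
print(is_read_through([(100, 200), (300, 400)], genes))
```

The first transcript joins exons of both genes and qualifies; the second uses exons of GENE_A only and does not.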
Quality of reference genome sequence:
As the CCDS data set is built to represent genomic annotations of human and mouse, quality problems with the human and mouse <a href="/facts/Reference_genome/0N144oyY">reference genome</a> sequences become another challenge. Quality problems occur when the reference genome is misassembled: the misassembled genome may contain premature <a href="/facts/Stop_codon/1yuAkq9X">stop codons</a>, <a href="/facts/Frameshift_mutation/ncWrMHqm">frame-shift indels</a>, or likely polymorphic <a href="/facts/Pseudogene/VfbL4L07">pseudogenes</a>. Once these quality problems are identified, the CCDS collaborators report the issues to the Genome Reference Consortium, which investigates and makes the necessary corrections.

<h2 id="access-to-ccds-data">Access to CCDS data</h2>
CCDS data are available from the NCBI CCDS data set page <a href="https://www.ncbi.nlm.nih.gov/CCDS/">(here)</a>, which provides FTP download links and a query interface for retrieving information about CCDS sequences and locations. CCDS reports can be obtained by using the query interface, which is located at the top of the CCDS data set page. Users can select various types of identifiers such as CCDS ID, gene ID, gene symbol, nucleotide ID and protein ID to search for specific CCDS information.<a class="footnote-ref" id="fnref:25" href="#fn:25">25</a> The CCDS reports (Figure 1) are presented in a table format, providing links to specific resources, such as a history report and <a href="/facts/Entrez/1b3UvdGY">Entrez Gene</a>,<a class="footnote-ref" id="fnref:26" href="#fn:26">26</a> or to re-query the CCDS data set. The sequence identifiers table presents transcript information in <a href="/facts/Vertebrate_and_Genome_Annotation_Project/GyNg7QAk">VEGA</a>, <a href="/facts/Ensembl/dEhjvNaA">Ensembl</a> and <a href="https://www.ncbi.nlm.nih.gov/sutils/blink.cgi?mode=query">Blink</a>. The chromosome location table includes the genomic coordinates for each individual exon of the specific coding sequence. This table also provides links to several different genome browsers, which allow users to visualise the structure of the coding region.<a class="footnote-ref" id="fnref:27" href="#fn:27">27</a> The exact nucleotide and protein sequences of the specific coding sequence are also displayed in the CCDS sequence data section.
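Files downloaded from the FTP site can be processed with a few lines of Python. The sketch below assumes a tab-delimited layout with the eleven columns named in COLUMNS and an exon list written as "[start-end, start-end]"; this layout is an assumption here, so check it against the header line and README of the file actually downloaded. The sample row is invented for illustration and does not describe a real CCDS entry.

```python
# Assumed column order of the tab-delimited CCDS download (verify before use).
COLUMNS = [
    "chromosome", "nc_accession", "gene", "gene_id", "ccds_id", "ccds_status",
    "cds_strand", "cds_from", "cds_to", "cds_locations", "match_type",
]


def parse_locations(field: str) -> list:
    """Turn '[100-200, 300-400]' into [(100, 200), (300, 400)]."""
    field = field.strip().strip("[]")
    if field in ("", "-"):
        return []  # no coordinates recorded for this row
    return [tuple(int(x) for x in part.split("-")) for part in field.split(",")]


def read_ccds(text: str) -> list:
    """Parse the file content into one dict per CCDS row."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header
        record = dict(zip(COLUMNS, line.split("\t")))
        record["exons"] = parse_locations(record["cds_locations"])
        rows.append(record)
    return rows


# Invented example content; identifiers and coordinates are placeholders.
sample = (
    "#" + "\t".join(COLUMNS) + "\n"
    + "\t".join(["1", "NC_000000.0", "GENE1", "0", "CCDS0.1", "Public",
                 "+", "100", "400", "[100-200, 300-400]", "Identical"]) + "\n"
)
rows = read_ccds(sample)
print(rows[0]["ccds_id"], rows[0]["exons"])
```

Keeping the exon list as integer pairs makes it straightforward to compute CDS length or to intersect the coordinates with other interval data.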

<h2 id="current-applications">Current applications</h2>
The CCDS dataset is an integral part of the <a href="/facts/GENCODE/feuygMtY">GENCODE</a> gene annotation project<a class="footnote-ref" id="fnref:28" href="#fn:28">28</a> and it is used as a standard for high-quality coding exon definition in various research fields, including clinical studies, large-scale <a href="/facts/Epigenomics/JqRrC3e4">epigenomic</a> studies, <a href="/facts/Exome/qYLzGDpz">exome</a> projects and exon array design.<a class="footnote-ref" id="fnref:29" href="#fn:29">29</a> Due to the consensus annotation of CCDS exons by the independent annotation groups, <a href="/facts/Exome/qYLzGDpz">exome</a> projects in particular have regarded CCDS coding exons as reliable targets for downstream studies (e.g., for <a href="/facts/Single-nucleotide_polymorphism/Qy5Y2wom">single nucleotide variant</a> detection), and these exons have been used as <a href="/facts/Coding_region/eSR9aIKr">coding region</a> targets in commercially available <a href="/facts/Exome/qYLzGDpz">exome</a> kits.<a class="footnote-ref" id="fnref:30" href="#fn:30">30</a>

<h2 id="ccds-release-history">CCDS release history</h2>
The CCDS data set has continued to grow through both computational genome annotation updates, which integrate new data sets submitted to the International Nucleotide Sequence Database Collaboration (<a href="http://www.insdc.org/">INSDC</a>), and ongoing curation activities that supplement or improve upon that annotation. Table 2 summarises the key statistics for each CCDS build, where Public CCDS IDs are all those that were not under review or pending an update or withdrawal at the time of the current release date.

Table 2. Summary statistics for past CCDS releases.
<table><tbody>
<tr><th scope="col">Release</th><th scope="col">Species</th><th scope="col">Assembly name</th><th scope="col">Public CCDS ID count</th><th scope="col">Gene ID count</th><th scope="col">Current release date</th></tr>
<tr><td>1</td><td>Homo sapiens</td><td>NCBI35</td><td>13,740</td><td>12,950</td><td>Mar 14, 2007</td></tr>
<tr><td>2</td><td>Mus musculus</td><td>MGSCv36</td><td>13,218</td><td>13,012</td><td>Nov 28, 2007</td></tr>
<tr><td>3</td><td>Homo sapiens</td><td>NCBI36</td><td>17,494</td><td>15,805</td><td>May 1, 2008</td></tr>
<tr><td>4</td><td>Mus musculus</td><td>MGSCv37</td><td>17,082</td><td>16,888</td><td>Jan 24, 2011</td></tr>
<tr><td>5</td><td>Homo sapiens</td><td>NCBI36</td><td>19,393</td><td>17,053</td><td>Sep 2, 2009</td></tr>
<tr><td>6</td><td>Homo sapiens</td><td>GRCh37</td><td>22,912</td><td>18,174</td><td>Apr 20, 2011</td></tr>
<tr><td>7</td><td>Mus musculus</td><td>MGSCv37</td><td>21,874</td><td>19,507</td><td>Aug 14, 2012</td></tr>
<tr><td>8</td><td>Homo sapiens</td><td>GRCh37.p2</td><td>25,354</td><td>18,407</td><td>Sep 6, 2011</td></tr>
<tr><td>9</td><td>Homo sapiens</td><td>GRCh37.p5</td><td>26,254</td><td>18,474</td><td>Oct 25, 2012</td></tr>
<tr><td>10</td><td>Mus musculus</td><td>GRCm38</td><td>22,934</td><td>19,945</td><td>Aug 5, 2013</td></tr>
<tr><td>11</td><td>Homo sapiens</td><td>GRCh37.p9</td><td>27,377</td><td>18,535</td><td>Apr 29, 2013</td></tr>
<tr><td>12</td><td>Homo sapiens</td><td>GRCh37.p10</td><td>27,655</td><td>18,607</td><td>Oct 24, 2013</td></tr>
<tr><td>13</td><td>Mus musculus</td><td>GRCm38.p1</td><td>23,010</td><td>19,990</td><td>Apr 7, 2014</td></tr>
<tr><td>14</td><td>Homo sapiens</td><td>GRCh37.p13</td><td>28,649</td><td>18,673</td><td>Nov 29, 2013</td></tr>
<tr><td>15</td><td>Homo sapiens</td><td>GRCh37.p13</td><td>28,897</td><td>18,681</td><td>Aug 7, 2014</td></tr>
<tr><td>16</td><td>Mus musculus</td><td>GRCm38.p2</td><td>23,835</td><td>20,079</td><td>Sep 10, 2014</td></tr>
<tr><td>17</td><td>Homo sapiens</td><td>GRCh38</td><td>30,461</td><td>18,800</td><td>Sep 10, 2014</td></tr>
<tr><td>18</td><td>Homo sapiens</td><td>GRCh38.p2</td><td>31,371</td><td>18,826</td><td>May 12, 2015</td></tr>
<tr><td>19</td><td>Mus musculus</td><td>GRCm38.p3</td><td>24,834</td><td>20,215</td><td>Jul 30, 2015</td></tr>
<tr><td>20</td><td>Homo sapiens</td><td>GRCh38.p7</td><td>32,524</td><td>18,892</td><td>Sep 8, 2016</td></tr>
<tr><td>21</td><td>Mus musculus</td><td>GRCm38.p4</td><td>25,757</td><td>20,354</td><td>Dec 8, 2016</td></tr>
<tr><td>22</td><td>Homo sapiens</td><td>GRCh38.p12</td><td>33,397</td><td>19,033</td><td>Jun 14, 2018</td></tr>
<tr><td>23</td><td>Mus musculus</td><td>GRCm38.p6</td><td>27,219</td><td>20,486</td><td>Oct 24, 2019</td></tr>
<tr><td>24</td><td>Homo sapiens</td><td>GRCh38.p14</td><td>35,608</td><td>19,107</td><td>Oct 26, 2022</td></tr>
</tbody></table>
The complete set of release statistics can be found on the <a href="https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=SHOW_STATISTICS#Current_Homo_sapiens_1">Releases &amp; Statistics</a> page of the official CCDS website.
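One way to read Table 2 is to divide the Public CCDS ID count by the Gene ID count, which gives the average number of CCDS IDs per gene. The short Python sketch below does this for the first and latest human and mouse releases, using the figures from the table; the ratio is an informal summary computed here, not a statistic published by the project.

```python
# (species, public CCDS ID count, gene ID count), copied from Table 2.
RELEASES = {
    1: ("Homo sapiens", 13740, 12950),
    24: ("Homo sapiens", 35608, 19107),
    2: ("Mus musculus", 13218, 13012),
    23: ("Mus musculus", 27219, 20486),
}


def ids_per_gene(ccds_ids: int, genes: int) -> float:
    """Average number of public CCDS IDs per gene, rounded to two decimals."""
    return round(ccds_ids / genes, 2)


for release, (species, ccds_ids, genes) in sorted(RELEASES.items()):
    print(release, species, ids_per_gene(ccds_ids, genes))
```

For human, the ratio rises from 1.06 in release 1 to 1.86 in release 24, which is consistent with the growing representation of alternative splicing noted earlier.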

<h2 id="future-prospects">Future prospects</h2>
Long-term goals include adding attributes that indicate where transcript annotation is also identical (including the <a href="/facts/Untranslated_region/Tfk4gD18">UTRs</a>) and that indicate splice variants with different <a href="/facts/Untranslated_region/Tfk4gD18">UTRs</a> that have the same CCDS ID. It is also anticipated that as more complete and high-quality genome sequence data become available for other organisms, annotations from these organisms may be in scope for CCDS representation.
The CCDS set will become more complete as the independent curation groups agree on cases where they initially differ, as additional experimental validation of weakly supported genes occurs, and as automatic annotation methods continue to improve. Communication among the CCDS collaborating groups is ongoing and will resolve differences and identify refinements between CCDS update cycles. Human updates are expected to occur roughly every 6 months and mouse releases yearly.<a class="footnote-ref" id="fnref:31" href="#fn:31">31</a>

<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/GENCODE/feuygMtY">GENCODE</a></li>
<li><a href="/facts/Human_Genome/VTV85pbn">Human Genome</a></li>
<li><a href="/facts/Mouse_Genome_Informatics/cRF3seoe">Mouse Genome Informatics</a></li>
<li><a href="/facts/RefSeq/5kBXXR8s">RefSeq</a></li>
<li><a href="/facts/Ensembl/dEhjvNaA">Ensembl</a></li></ul>

<h2 id="external-links">External links</h2>
<ul><li><a href="https://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi">CCDS home page</a></li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1">Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D (2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Res. 19 (7): 1316–23. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></li>
<li id="fn:2">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></li>
<li id="fn:3">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></li>
<li id="fn:4">Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069</a> <a href="#fnref:4" class="footnote-back-ref">↩</a></li>
<li id="fn:5">Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></li>
<li id="fn:6">Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D (2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Res. 19 (7): 1316–23. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439</a> <a href="#fnref:6" class="footnote-back-ref">↩</a></li>
<li id="fn:7">Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069</a> <a href="#fnref:7" class="footnote-back-ref">↩</a></li>
<li id="fn:8">Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069</a> <a href="#fnref:8" class="footnote-back-ref">↩</a></li>
<li id="fn:9">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:9" class="footnote-back-ref">↩</a></li>
<li id="fn:10">Alberts, B; Johnson, A; Lewis, J; Raff, M; Roberts, K; Walter, P (2002). Molecular Biology of the Cell 5th edn. New York: Garland Science. <a href="#fnref:10" class="footnote-back-ref">↩</a></li>
<li id="fn:11">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:11" class="footnote-back-ref">↩</a></li>
<li id="fn:12">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:12" class="footnote-back-ref">↩</a></li>
<li id="fn:13">Kozak, M (2002). "Pushing the limits of the scanning mechanism for initiation of translation". Gene. 299 (1–2): 1–34. doi:10.1016/S0378-1119(02)01056-9. PMC 7126118. PMID 12459250. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126118" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126118</a> <a href="#fnref:13" class="footnote-back-ref">↩</a></li>
<li id="fn:14">Kozak, M (2002). "Pushing the limits of the scanning mechanism for initiation of translation". Gene. 299 (1–2): 1–34. doi:10.1016/S0378-1119(02)01056-9. PMC 7126118. PMID 12459250. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126118" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126118</a> <a href="#fnref:14" class="footnote-back-ref">↩</a></li>
<li id="fn:15">Ingolia, NT; Brar, GA; Rouskin, S; McGeachy, AM; Weissman, JS (2014). "Genome-wide Annotation and Quantitation of Translation by Ribosome Profiling". Curr. Protoc. Mol. Biol. Chapter 4: 4.18.1–4.18.19. doi:10.1002/0471142727.mb0418s103. ISBN 9780471142720. PMC 3775365. PMID 23821443. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3775365" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3775365</a> <a href="#fnref:15" class="footnote-back-ref">↩</a></li>
<li id="fn:16">Calvo, SE; Pagliarini, DJ; Mootha, VK (2009). "Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans" (PDF). Proc. Natl. Acad. Sci. U.S.A. 106 (18): 7507–12. Bibcode:2009PNAS..106.7507C. doi:10.1073/pnas.0810916106. PMC 2669787. PMID 19372376. <a href="http://dspace.mit.edu/bitstream/1721.1/50259/1/Calvo-2009-Upstream%20open%20readin.pdf" target="_blank">http://dspace.mit.edu/bitstream/1721.1/50259/1/Calvo-2009-Upstream%20open%20readin.pdf</a> <a href="#fnref:16" class="footnote-back-ref">↩</a></li>
<li id="fn:17">Kozak, M (2002). "Pushing the limits of the scanning mechanism for initiation of translation". Gene. 299 (1–2): 1–34. doi:10.1016/S0378-1119(02)01056-9. PMC 7126118. PMID 12459250. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126118" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7126118</a> <a href="#fnref:17" class="footnote-back-ref">↩</a></li>
<li id="fn:18">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:18" class="footnote-back-ref">↩</a></li>
<li id="fn:19">Silva, AL; Pereira, FJC; Morgado, A; Kong, J; Martins, R; Faustino, P; Liebhaber, SA; Romao, L (2006). "The canonical UPF1-dependent nonsense-mediated mRNA decay is inhibited in transcripts carrying a short open reading frame independent of sequence context". RNA. 12 (12): 2160–70. doi:10.1261/rna.201406. PMC 1664719. PMID 17077274. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1664719" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1664719</a> <a href="#fnref:19" class="footnote-back-ref">↩</a></li>
<li id="fn:20">Alberts, B; Johnson, A; Lewis, J; Raff, M; Roberts, K; Walter, P (2002). Molecular Biology of the Cell 5th edn. New York: Garland Science. <a href="#fnref:20" class="footnote-back-ref">↩</a></li>
<li id="fn:21">Calvo, SE; Pagliarini, DJ; Mootha, VK (2009). "Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans" (PDF). Proc. Natl. Acad. Sci. U.S.A. 106 (18): 7507–12. Bibcode:2009PNAS..106.7507C. doi:10.1073/pnas.0810916106. PMC 2669787. PMID 19372376. <a href="http://dspace.mit.edu/bitstream/1721.1/50259/1/Calvo-2009-Upstream%20open%20readin.pdf" target="_blank">http://dspace.mit.edu/bitstream/1721.1/50259/1/Calvo-2009-Upstream%20open%20readin.pdf</a> <a href="#fnref:21" class="footnote-back-ref">↩</a></li>
<li id="fn:22">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:22" class="footnote-back-ref">↩</a></li>
<li id="fn:23">Prakash, Tulika; Sharma, Vineet K.; Adati, Naoki; Ozawa, Ritsuko; Kumar, Naveen; Nishida, Yuichiro; Fujikake, Takayoshi; Takeda, Tadayuki; Taylor, Todd D.; Michalak, Pawel (12 October 2010). "Expression of Conjoined Genes: Another Mechanism for Gene Regulation in Eukaryotes". PLOS ONE. 5 (10): e13284. Bibcode:2010PLoSO...513284P. doi:10.1371/journal.pone.0013284. PMC 2953495. PMID 20967262. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2953495" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2953495</a> <a href="#fnref:23" class="footnote-back-ref">↩</a></li>
<li id="fn:24">Harte, RA; Farrell, CM; Loveland, JE; Suner, MM; Wilming, L; Aken, B; Barrell, D; Frankish, A; Wallin, C; Searle, S; Diekhans, M; Harrow, J; Pruitt, KD (2012). "Tracking and coordinating an international curation effort for the CCDS project". Database. 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308164</a> <a href="#fnref:24" class="footnote-back-ref">↩</a></li>
<li id="fn:25">Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D (2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Res. 19 (7): 1316–23. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439</a> <a href="#fnref:25" class="footnote-back-ref">↩</a></li>
<li id="fn:26">Maglott, D.; Ostell, J.; Pruitt, K. D.; Tatusova, T. (28 November 2010). "Entrez Gene: gene-centered information at NCBI". Nucleic Acids Res. 39 (Database): D52–D57. doi:10.1093/nar/gkq1237. PMC 3013746. PMID 21115458. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013746" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013746</a> <a href="#fnref:26" class="footnote-back-ref">↩</a></li>
<li id="fn:27">Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D (2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Res. 19 (7): 1316–23. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2704439</a> <a href="#fnref:27" class="footnote-back-ref">↩</a></li>
<li id="fn:28">Harrow, J.; Frankish, A.; Gonzalez, J. M.; Tapanari, E.; Diekhans, M.; Kokocinski, F.; Aken, B. L.; Barrell, D.; Zadissa, A.; Searle, S.; Barnes, I.; Bignell, A.; Boychenko, V.; Hunt, T.; Kay, M.; Mukherjee, G.; Rajan, J.; Despacio-Reyes, G.; Saunders, G.; Steward, C.; Harte, R.; Lin, M.; Howald, C.; Tanzer, A.; Derrien, T.; Chrast, J.; Walters, N.; Balasubramanian, S.; Pei, B.; Tress, M.; Rodriguez, J. M.; Ezkurdia, I.; van Baren, J.; Brent, M.; Haussler, D.; Kellis, M.; Valencia, A.; Reymond, A.; Gerstein, M.; Guigo, R.; Hubbard, T. J. (5 September 2012). "GENCODE: The reference human genome annotation for The ENCODE Project". Genome Res. 22 (9): 1760–1774. doi:10.1101/gr.135350.111. PMC 3431492. PMID 22955987. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431492" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3431492</a> <a href="#fnref:28" class="footnote-back-ref">↩</a></li>
<li id="fn:29">Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069</a> <a href="#fnref:29" class="footnote-back-ref">↩</a></li>
<li id="fn:30">Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard (2011). "A comparative analysis of exome capture". Genome Biol. 12 (9): R97. doi:10.1186/gb-2011-12-9-r97. PMC 3308060. PMID 21958622. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308060" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3308060</a> <a href="#fnref:30" class="footnote-back-ref">↩</a></li>
<li id="fn:31">Farrell, CM; O'Leary, NA; Harte, RA; Loveland, JE; Wilming, LG; Wallin, C; Diekhans, M; Barrell, D; Searle, SM; Aken, B; Hiatt, SM; Frankish, A; Suner, MM; Rajput, B; Steward, CA; Brown, GR; Bennett, R; Murphy, M; Wu, W; Kay, MP; Hart, J; Rajan, J; Weber, J; Snow, C; Riddick, LD; Hunt, T; Webb, D; Thomas, M; Tamez, P; Rangwala, SH; McGarvey, KM; Pujar, S; Shkeda, A; Mudge, JM; Gonzalez, JM; Gilbert, JG; Trevanion, SJ; Baertsch, R; Harrow, JL; Hubbard, T; Ostell, JM; Haussler, D; Pruitt, KD (2014). "Current status and new features of the Consensus Coding Sequence database". Nucleic Acids Res. 42 (D1): D865–D872. doi:10.1093/nar/gkt1059. PMC 3965069. PMID 24217909. <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069" target="_blank">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3965069</a> <a href="#fnref:31" class="footnote-back-ref">↩</a></li>
</ol>
