Error threshold (evolution)

<h2 id="fitness-landscape">Fitness landscape</h2>
<p class="note">Main article: <a href="/facts/Fitness_landscape/XrqwdjWR">Fitness landscape</a></p>
<p><a href="/facts/Manfred_Eigen/Chgus0Dn">Manfred Eigen</a> noted in his 1971 paper (Eigen 1971) that this mutation process places a limit on the number of digits a molecule may have. If a molecule exceeds this critical size, the effect of the mutations becomes overwhelming and a runaway mutation process will destroy the information in subsequent generations of the molecule. The error threshold is also controlled by the "fitness landscape" for the molecules. The fitness landscape is characterized by the two concepts of height (=fitness) and distance (=number of mutations). Similar molecules are "close" to each other, and molecules that are fitter than others, and therefore more likely to reproduce, are "higher" in the landscape.
</p><p>If a particular sequence and its neighbors have a high fitness, they will form a <a href="/facts/Quasispecies_model/3GA85Ntf">quasispecies</a> and will be able to support longer sequence lengths than a fit sequence with few fit neighbors, or a less fit neighborhood of sequences. Wilke (Wilke 2005) also noted that the error threshold concept does not apply in portions of the landscape where there are lethal mutations, in which the induced mutation yields zero fitness and prohibits the molecule from reproducing.
</p>
<h2 id="eigens-paradox">Eigen's paradox</h2>
<p>Eigen's paradox is one of the most intractable puzzles in the study of the origins of life. It is thought that the error threshold concept described above limits the size of self-replicating molecules to perhaps a few hundred digits, yet almost all life on Earth requires much longer molecules to encode its genetic information. This problem is handled in living cells by enzymes that repair mutations, allowing the encoding molecules to reach sizes on the order of millions of base pairs. These large molecules must, of course, encode the very enzymes that repair them, and herein lies Eigen's paradox, first put forth by <a href="/facts/Manfred_Eigen/Chgus0Dn">Manfred Eigen</a> in his 1971 paper (Eigen 1971).<a class="footnote-ref" id="fnref:1" href="#fn:1"><sup>1</sup></a> Simply stated, Eigen's paradox amounts to the following:
</p>
<ul><li>Without error correction enzymes, the maximum size of a replicating molecule is about 100 base pairs.</li>
<li>For a replicating molecule to encode error correction enzymes, it must be substantially larger than 100 bases.</li></ul>
<p>This is a <a href="/facts/The_chicken_or_the_egg/quA219Aw">chicken-or-egg</a> kind of a paradox, with an even more difficult solution. Which came first, the large genome or the error correction enzymes? A number of solutions to this paradox have been proposed:
</p>
<ul><li>Stochastic corrector model (Szathmáry &amp; Maynard Smith, 1995). In this proposed solution, a number of primitive molecules of, say, two different types are associated with each other in some way, perhaps by a capsule or "cell wall". If their reproductive success is enhanced by having, say, equal numbers in each cell, and reproduction occurs by division in which the molecules of each type are randomly distributed among the "children", the process of selection will promote such equal representation in the cells, even though one of the molecules may have a selective advantage over the other.</li>
<li>Relaxed error threshold (Kun et al., 2005). Studies of actual ribozymes indicate that the mutation rate can be substantially less than first expected, on the order of 0.001 per base pair per replication. This may allow sequence lengths on the order of 7,000 to 8,000 base pairs, sufficient to incorporate rudimentary error correction enzymes.</li></ul>
<h2 id="a-simple-mathematical-model">A simple mathematical model</h2>
<p>Consider a 3-digit molecule [A,B,C] where A, B, and C can take on the values 0 and 1. There are eight such sequences ([000], [001], [010], [011], [100], [101], [110], and [111]). Let's say that the [000] molecule is the most fit; upon each replication it produces an average of \(a\) copies, where \(a>1\). This molecule is called the "master sequence". The other seven sequences are less fit; they each produce only 1 copy per replication. Each of the three digits is replicated with a mutation rate of \(\mu\). In other words, at every replication of a digit of a sequence, there is a probability \(\mu\) that it will be erroneous; 0 will be replaced by 1 or vice versa. Let's ignore double mutations and the death of molecules (the population will grow infinitely), and divide the eight molecules into four classes depending on their <a href="/facts/Hamming_distance/MgdtbYPo">Hamming distance</a> from the master sequence:
</p>
<table><tbody><tr><td>Hamming distance</td><td>Sequence(s)</td></tr><tr><td>0</td><td>[000]</td></tr><tr><td>1</td><td>[001], [010], [100]</td></tr><tr><td>2</td><td>[110], [101], [011]</td></tr><tr><td>3</td><td>[111]</td></tr></tbody></table>
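<p>The grouping in the table can be reproduced by brute force. The following is a minimal sketch in Python, not part of the original model description; the names <code>hamming</code> and <code>classes</code> are illustrative choices. It enumerates all binary sequences of length 3, groups them by Hamming distance from [000], and prints each class size next to the binomial coefficient C(3, d) for comparison.</p>

```python
from itertools import product
from math import comb

L = 3                      # sequence length used in the text
master = (0,) * L          # the master sequence [000]


def hamming(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(seq_a, seq_b))


# Group all 2**L binary sequences by their distance from the master sequence.
classes = {d: [] for d in range(L + 1)}
for seq in product((0, 1), repeat=L):
    classes[hamming(seq, master)].append(seq)

for d, members in classes.items():
    labels = ", ".join("[" + "".join(map(str, s)) + "]" for s in members)
    # Print distance, class size, binomial coefficient C(L, d), and the members.
    print(d, len(members), comb(L, d), labels)
```

<p>Changing <code>L</code> gives the class sizes for longer sequences, although the number of sequences grows as 2<sup>L</sup>, so direct enumeration is only practical for small <code>L</code>.</p>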
<p>Note that the number of sequences for distance <i>d</i> is just the <a href="/facts/Binomial_coefficient/Tsx7eY5h">binomial coefficient</a> \(\tbinom{L}{d}\) for <i>L</i> = 3, and that each sequence can be visualized as a vertex of an <i>L</i> = 3 dimensional cube, with each edge of the cube specifying a single-digit mutation path along which the Hamming distance from the master sequence changes by ±1. It can be seen that, for example, one third of the mutations of the [001] molecules will produce [000] molecules, while the other two thirds will produce the class 2 molecules [011] and [101]. We can now write the expression for the child populations \(n'_{i}\) of class <i>i</i> in terms of the parent populations \(n_{j}\).
</p>

<p>\[ n'_{i}=\sum _{j=0}^{3}w_{ij}n_{j} \]</p>

<p>where the matrix <i>w</i>, which incorporates natural selection and mutation according to the <a href="/facts/Quasispecies_model/3GA85Ntf">quasispecies model</a>, has entries \(w_{ij}\) giving the number of class <i>i</i> children produced per class <i>j</i> parent:
</p>

<p>\[ \mathbf {w} ={\begin{bmatrix}a\cdot Q&\mu &0&0\\3a\cdot \mu &Q&2\mu &0\\0&2\mu &Q&3\mu \\0&0&\mu &Q\end{bmatrix}} \]</p>

<p>where \(Q=(1-\mu )^{L}\) is the probability that an entire molecule will be replicated successfully. The <a href="/facts/Eigenvectors/8TjEoT8u">eigenvector</a> of the <i>w</i> matrix with the largest eigenvalue yields the equilibrium population fractions for each class. For example, if the mutation rate μ is zero, we will have <i>Q</i>=1, and the equilibrium concentrations will be \([n_{0},n_{1},n_{2},n_{3}]=[1,0,0,0]\). The master sequence, being the fittest, will be the only one to survive. If we have a replication fidelity of <i>Q</i>=0.95 and a genetic advantage of <i>a</i>=1.05, then the equilibrium concentrations will be roughly \([0.33,0.38,0.24,0.06]\). It can be seen that the master sequence is not as dominant; nevertheless, sequences with low Hamming distance are in the majority. If we have a replication fidelity of <i>Q</i> approaching 0, then the equilibrium concentrations will be roughly \([0.125,0.375,0.375,0.125]\). This is a population with an equal number of each of the 8 sequences. (If we had a perfectly equal population of all sequences, we would have class populations of [1,3,3,1]/8.)
</p><p>If we now go to the case where the number of base pairs is large, say <i>L</i>=100, we obtain behavior that resembles a <a href="/facts/Phase_transition/eLP2BY8R">phase transition</a>. The plot below on the left shows a series of equilibrium concentrations divided by the binomial coefficient \(\tbinom{100}{d}\). (This division shows the population for an individual sequence at that distance, and yields a flat line for an equal distribution.) The selective advantage of the master sequence is set at <i>a</i>=1.05. The horizontal axis is the Hamming distance <i>d</i>. The various curves are for various total mutation rates \((1-Q)\). It is seen that for low values of the total mutation rate, the population consists of a <a href="/facts/Quasispecies/3GA85Ntf">quasispecies</a> gathered in the neighborhood of the master sequence. Above a total mutation rate of about 1-<i>Q</i>=0.05, the distribution quickly spreads out to populate all sequences equally. The plot below on the right shows the fractional population of the master sequence as a function of the total mutation rate. Again it is seen that below a critical mutation rate of about 1-<i>Q</i>=0.05, the master sequence contains most of the population, while above this rate, it contains only about \(2^{-L}\approx 10^{-30}\) of the total population.
</p>
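<p>The equilibrium concentrations quoted for the 3-digit example can be checked numerically. The sketch below is an illustration under stated assumptions rather than the method of the cited papers: the function names <code>class_matrix</code> and <code>equilibrium</code> are hypothetical, the extension of the class matrix to arbitrary <i>L</i> (a class <i>j</i> parent has <i>j</i> digits that can revert and <i>L</i> − <i>j</i> digits that can go wrong) is an assumed generalization of the 4×4 case, and power iteration is simply one convenient way to find the dominant eigenvector. Entry <code>w[i][j]</code> is read as the contribution of parent class <i>j</i> to child class <i>i</i>, so that the update is n'<sub>i</sub> = Σ<sub>j</sub> w<sub>ij</sub> n<sub>j</sub>.</p>

```python
def class_matrix(L, a, Q):
    """Build the (L+1) x (L+1) class matrix for master advantage a and fidelity Q.

    w[i][j] is the number of class-i children per class-j parent.
    Double mutations are ignored, as in the text.
    """
    mu = 1.0 - Q ** (1.0 / L)            # per-digit rate, from Q = (1 - mu)**L
    w = [[0.0] * (L + 1) for _ in range(L + 1)]
    for j in range(L + 1):
        fitness = a if j == 0 else 1.0   # only the master sequence has advantage a
        w[j][j] = fitness * Q            # error-free copies stay in class j
        if j > 0:
            w[j - 1][j] = fitness * j * mu        # one of j wrong digits reverts
        if j < L:
            w[j + 1][j] = fitness * (L - j) * mu  # one of L - j correct digits flips
    return w


def equilibrium(w, iterations=5000):
    """Power iteration: repeatedly apply n' = w n and renormalise to sum 1."""
    size = len(w)
    n = [1.0 / size] * size
    for _ in range(iterations):
        n = [sum(w[i][j] * n[j] for j in range(size)) for i in range(size)]
        total = sum(n)
        n = [x / total for x in n]
    return n


fractions = equilibrium(class_matrix(L=3, a=1.05, Q=0.95))
print([round(x, 2) for x in fractions])
```

<p>For <i>L</i>=3, <i>a</i>=1.05 and <i>Q</i>=0.95 the printed values round to the concentrations [0.33, 0.38, 0.24, 0.06] given in the text. In principle the same functions could be called with <code>L=100</code> over a range of <code>Q</code> values to explore the transition described above, though power iteration may converge slowly close to the threshold and the fixed iteration count would then need to be raised.</p>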

<p>It can be seen that there is a sharp transition at a value of <i>1-Q</i> just a bit larger than 0.05. For mutation rates above this value, the population of the master sequence drops to practically zero. Below this value, it dominates.
</p><p>In the limit as <i>L</i> approaches infinity, the system does in fact have a phase transition at a critical value of <i>Q</i>: \(Q_{c}=1/a\). One could think of the overall mutation rate (1-<i>Q</i>) as a sort of "temperature", which "melts" the fidelity of the molecular sequences above the critical "temperature" of \(1-Q_{c}\). For faithful replication to occur, the information must be "frozen" into the genome.
</p>
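<p>The critical value Q<sub>c</sub> = 1/a can be turned into rough numbers. The sketch below is a back-of-the-envelope illustration, assuming that the infinite-<i>L</i> threshold is applied directly to finite sequences; the function names <code>critical_error_rate</code> and <code>max_length</code> are hypothetical. Setting (1 − μ)<sup>L</sup> = 1/a and solving for <i>L</i> gives L = ln(a) / (−ln(1 − μ)), the longest sequence that stays on the ordered side of the threshold for a given per-digit mutation rate μ.</p>

```python
import math


def critical_error_rate(a):
    """Critical total mutation rate 1 - Q_c, using Q_c = 1/a."""
    return 1.0 - 1.0 / a


def max_length(a, mu):
    """Length L at which (1 - mu)**L equals 1/a, i.e. L = ln(a) / -ln(1 - mu)."""
    return math.log(a) / -math.log(1.0 - mu)


# Selective advantage used in the text.
print(round(critical_error_rate(1.05), 4))   # 0.0476

# Illustrative per-digit mutation rate; the pairing with a = 1.05 is an assumption.
print(round(max_length(1.05, 0.001), 1))
```

<p>With <i>a</i>=1.05 the critical total mutation rate comes out just under 0.05, in line with the transition described for the <i>L</i>=100 case. Because the maximum length scales with ln(a), a larger selective advantage or a smaller per-digit mutation rate both raise the length a sequence can sustain.</p>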
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Error_catastrophe/N35resFd">Error catastrophe</a></li>
<li><a href="/facts/Extinction_vortex/tn22GRER">Extinction vortex</a></li>
<li><a href="/facts/Genetic_entropy/Fj2xzdzq">Genetic entropy</a></li>
<li><a href="/facts/Genetic_erosion/wJXcHUXQ">Genetic erosion</a></li>
<li><a href="/facts/Muller%27s_ratchet/jyVKrEAt">Muller's ratchet</a></li></ul>

<ul><li>Eigen, M. (1971). "Selforganization of matter and evolution of biological Macromolecules". <i>Naturwissenschaften</i>. 58 (10): 465–523. <a href="/facts/Bibcode_(identifier)/9HtdQSGB">Bibcode</a>:<a href="https://ui.adsabs.harvard.edu/abs/1971NW.....58..465E">1971NW.....58..465E</a>. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1007%2FBF00623322">10.1007/BF00623322</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/4942363">4942363</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:38296619">38296619</a>.</li>
<li>Wilke, Claus O. (2005). <a href="http://www.biomedcentral.com/content/pdf/1471-2148-5-44.pdf">"Quasispecies theory in the context of population genetics"</a> (PDF). Retrieved October 12, 2005.</li>
<li>Campos, P. R. A.; Fontanari, J. F. (1999). <a href="http://www.iop.org/EJ/article/0305-4470/32/1/001/a901l1.pdf">"Finite-size scaling of the error threshold transition in finite populations"</a> (PDF). <i>J. Phys. A: Math. Gen</i>. 32 (1): L1 – L7. <a href="/facts/ArXiv_(identifier)/H6EtgnBe">arXiv</a>:<a href="https://arxiv.org/abs/cond-mat/9809209">cond-mat/9809209</a>. <a href="/facts/Bibcode_(identifier)/9HtdQSGB">Bibcode</a>:<a href="https://ui.adsabs.harvard.edu/abs/1999JPhA...32L...1C">1999JPhA...32L...1C</a>. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1088%2F0305-4470%2F32%2F1%2F001">10.1088/0305-4470/32/1/001</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:16500591">16500591</a>.</li>
<li>Holmes, Edward C. (2005). <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7097767">"On being the right size"</a>. <i>Nature Genetics</i>. 37 (9): 923–924. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1038%2Fng0905-923">10.1038/ng0905-923</a>. <a href="/facts/PMC_(identifier)/dX1zMt71">PMC</a> <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7097767">7097767</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/16132047">16132047</a>.</li>
<li>Eörs Szathmáry; John Maynard Smith (1995). "The major evolutionary transitions". <i>Nature</i>. 374 (6519): 227–232. <a href="/facts/Bibcode_(identifier)/9HtdQSGB">Bibcode</a>:<a href="https://ui.adsabs.harvard.edu/abs/1995Natur.374..227S">1995Natur.374..227S</a>. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1038%2F374227a0">10.1038/374227a0</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/7885442">7885442</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:4315120">4315120</a>.</li>
<li>Luis Villarreal; Guenther Witzany (2013). <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3856310">"Rethinking quasispecies theory: From fittest type to cooperative consortia"</a>. <i>World Journal of Biological Chemistry</i>. 4 (4): 79–90. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.4331%2Fwjbc.v4.i4.79">10.4331/wjbc.v4.i4.79</a>. <a href="/facts/PMC_(identifier)/dX1zMt71">PMC</a> <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3856310">3856310</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/24340131">24340131</a>.</li>
<li>Ádám Kun; Mauro Santos; Eörs Szathmáry (2005). "Real ribozymes suggest a relaxed error threshold". <i>Nature Genetics</i>. 37 (9): 1008–1011. <a href="/facts/Doi_(identifier)/muM9Etpq">doi</a>:<a href="https://doi.org/10.1038%2Fng1621">10.1038/ng1621</a>. <a href="/facts/PMID_(identifier)/JlHAvMHt">PMID</a> <a href="https://pubmed.ncbi.nlm.nih.gov/16127452">16127452</a>. <a href="/facts/S2CID_(identifier)/ldJsHa2Y">S2CID</a> <a href="https://api.semanticscholar.org/CorpusID:30582475">30582475</a>.</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>Holmes, Edward C. (2009). <i>The Evolution and Emergence of RNA Viruses</i>. Oxford University Press. pp. 22, 23, 48. ISBN 9780199211128. Retrieved 1 February 2019. <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
</ol>
