Binary symmetric channel

<h2 id="definition">Definition</h2>

<p>A binary symmetric channel with crossover probability \(p\), denoted by \(\text{BSC}_{p}\), is a channel with binary input and binary output and probability of error \(p\). That is, if \(X\) is the transmitted <a href="/facts/Random_variable/TwTBXnLT">random variable</a> and \(Y\) the received variable, then the channel is characterized by the <a href="/facts/Conditional_probability/QcN2UERV">conditional probabilities</a>:<a class="footnote-ref" id="fnref:1" href="#fn:1"><sup>1</sup></a>
</p>

\[
\begin{aligned}
\Pr[Y=0 \mid X=0] &= 1-p \\
\Pr[Y=0 \mid X=1] &= p \\
\Pr[Y=1 \mid X=0] &= p \\
\Pr[Y=1 \mid X=1] &= 1-p
\end{aligned}
\]

<p>It is assumed that \(0 \leq p \leq 1/2\). If \(p > 1/2\), then the receiver can swap the output (interpret 1 when it sees 0, and vice versa) and obtain an equivalent channel with crossover probability \(1-p \leq 1/2\).
</p>
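<p>The definition above can be sketched as a small simulation. The function names <code>bsc</code> and <code>normalize</code>, the seed, and the sample size are illustrative assumptions, not part of any standard library; the sketch only shows independent bit flips and the output swap for \(p > 1/2\).</p>

```python
import random

def bsc(bits, p, rng):
    """Pass 0/1 bits through a BSC: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

def normalize(received, p):
    """If p > 1/2, swap the output symbols; the result behaves like a BSC with crossover 1 - p."""
    if p > 0.5:
        return [1 - b for b in received], 1 - p
    return list(received), p

rng = random.Random(0)                      # fixed seed, chosen arbitrarily
sent = [rng.randint(0, 1) for _ in range(100_000)]
received = bsc(sent, 0.8, rng)              # a channel with p = 0.8 > 1/2
fixed, p_eff = normalize(received, 0.8)     # receiver swaps 0 and 1
error_rate = sum(s != r for s, r in zip(sent, fixed)) / len(sent)
```

<p>After the swap, the empirical <code>error_rate</code> should be close to the effective crossover probability <code>p_eff</code>, which is \(1 - 0.8 = 0.2\) in this run.</p>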
<h2 id="capacity">Capacity</h2>

<p>The <a href="/facts/Channel_capacity/55wJYsS6">channel capacity</a> of the binary symmetric channel, in <a href="/facts/Bit/RwfjaYSv">bits</a>, is:<a class="footnote-ref" id="fnref:2" href="#fn:2"><sup>2</sup></a>
</p>

\[C_{\text{BSC}} = 1 - \operatorname{H}_{\text{b}}(p),\]

<p>where \(\operatorname{H}_{\text{b}}(p)\) is the <a href="/facts/Binary_entropy_function/s9cwzz4n">binary entropy function</a>, defined by:<a class="footnote-ref" id="fnref:3" href="#fn:3"><sup>3</sup></a>
</p>

\[\operatorname{H}_{\text{b}}(x) = x\log_{2}\frac{1}{x} + (1-x)\log_{2}\frac{1}{1-x}\]
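<p>As a quick numerical check of the capacity formula, the following sketch evaluates the binary entropy function and the resulting capacity. The function names are illustrative, and the endpoint values use the usual convention \(0 \log_{2}(1/0) = 0\), which is an assumption not spelled out in the text above.</p>

```python
import math

def binary_entropy(x):
    """H_b(x) in bits, with the convention that the terms at x = 0 and x = 1 vanish."""
    if x in (0.0, 1.0):
        return 0.0
    return x * math.log2(1 / x) + (1 - x) * math.log2(1 / (1 - x))

def bsc_capacity(p):
    """Capacity of BSC_p in bits per channel use: 1 - H_b(p)."""
    return 1.0 - binary_entropy(p)

capacities = {p: bsc_capacity(p) for p in (0.0, 0.11, 0.25, 0.5)}
```

<p>The capacity is 1 bit for a noiseless channel (\(p = 0\)), falls to about one half near \(p = 0.11\), and reaches 0 at \(p = 1/2\), where the output is independent of the input.</p>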

<h2 id="noisy-channel-coding-theorem">Noisy-channel coding theorem</h2>
<p>Shannon's <a href="/facts/Noisy-channel_coding_theorem/vvr8u4Gg">noisy-channel coding theorem</a> gives a result about the rate of information that can be transmitted through a communication channel with arbitrarily low error. We study the particular case of \(\text{BSC}_{p}\).
</p><p>The noise \(e\) that characterizes \(\text{BSC}_{p}\) is a <a href="/facts/Random_variable/TwTBXnLT">random variable</a> consisting of \(n\) independent random bits (\(n\) is defined below), where each random bit is a \(1\) with probability \(p\) and a \(0\) with probability \(1-p\). We indicate this by writing "\(e \in \text{BSC}_{p}\)".
</p>

<p>Theorem. For all \(p < \tfrac{1}{2}\), all \(0 < \epsilon < \tfrac{1}{2} - p\), all sufficiently large \(n\) (depending on \(p\) and \(\epsilon\)), and all \(k \leq \lfloor (1-H(p+\epsilon))n \rfloor\), there exists a pair of encoding and decoding functions \(E:\{0,1\}^{k} \to \{0,1\}^{n}\) and \(D:\{0,1\}^{n} \to \{0,1\}^{k}\) respectively, such that every message \(m \in \{0,1\}^{k}\) has the following property:
</p>

\[\Pr_{e \in \text{BSC}_{p}}[D(E(m)+e) \neq m] \leq 2^{-\delta n},\]
<p>where \(\delta > 0\) is a constant that does not depend on \(n\).
</p>

<p>In other words, when a message is picked from \(\{0,1\}^{k}\), encoded with a random encoding function \(E\), and sent across a noisy \(\text{BSC}_{p}\), the original message is recovered by decoding with very high probability, provided that \(k\), and hence the rate \(k/n\) of the code, is bounded by the quantity stated in the theorem. The decoding error probability is exponentially small in \(n\).
</p>
<h3>Proof</h3>
<p>The theorem can be proved directly with a <a href="/facts/Probabilistic_method/AXKXrzrr">probabilistic method</a>. Consider an encoding function \(E:\{0,1\}^{k} \to \{0,1\}^{n}\) that is selected at random. This means that for each message \(m \in \{0,1\}^{k}\), the value \(E(m) \in \{0,1\}^{n}\) is selected at random (with equal probabilities). For a given encoding function \(E\), the decoding function \(D:\{0,1\}^{n} \to \{0,1\}^{k}\) is specified as follows: given any received word \(y \in \{0,1\}^{n}\), we find the message \(m \in \{0,1\}^{k}\) such that the <a href="/facts/Hamming_distance/MgdtbYPo">Hamming distance</a> \(\Delta(y,E(m))\) is as small as possible (with ties broken arbitrarily). (\(D\) is called a <a href="/facts/Decoding_methods/A0x9Dpgq">maximum likelihood decoding</a> function.)
</p><p>The proof continues by showing that at least one such choice \((E,D)\) satisfies the conclusion of the theorem, by integration over the probabilities. Suppose \(p\) and \(\epsilon\) are fixed. First we show that, for a fixed \(m \in \{0,1\}^{k}\) and \(E\) chosen randomly, the probability of failure over \(\text{BSC}_{p}\) noise is exponentially small in <i>n</i>. At this point, the proof works for a fixed message \(m\). Next we extend this result to work for all messages \(m\). We achieve this by eliminating half of the codewords from the code, using the argument that the bound on the decoding error probability holds for at least half of the codewords. The latter method is called expurgation. This gives the total process the name <i>random coding with expurgation</i>.
</p>
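<p>The random-coding construction in the proof can be illustrated with a small simulation. This is only a sketch under stated assumptions: the helper names, the seed, and the parameters (\(k = 4\), \(n = 40\) or \(n = 5\), \(p = 0.1\)) are chosen for illustration, decoding is a brute-force nearest-codeword search, and the expurgation step is omitted.</p>

```python
import random

def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def random_code(k, n, rng):
    """Pick E(m) uniformly at random from {0,1}^n for each of the 2^k messages."""
    return [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(2 ** k)]

def decode(y, code):
    """Return the message index whose codeword is nearest to y (ties go to the lowest index)."""
    return min(range(len(code)), key=lambda m: hamming(y, code[m]))

def failure_rate(k, n, p, trials, rng):
    """Empirical decoding failure rate of one random code over BSC_p."""
    code = random_code(k, n, rng)
    failures = 0
    for _ in range(trials):
        m = rng.randrange(len(code))
        y = tuple(c ^ (rng.random() < p) for c in code[m])  # add BSC_p noise
        failures += decode(y, code) != m
    return failures / trials

rng = random.Random(1)  # fixed seed, chosen arbitrarily
low_rate = failure_rate(k=4, n=40, p=0.1, trials=500, rng=rng)   # rate 0.1, below capacity
high_rate = failure_rate(k=4, n=5, p=0.1, trials=500, rng=rng)   # rate 0.8, above capacity
```

<p>With \(p = 0.1\) the capacity is roughly 0.53, so the code of rate 0.1 should decode almost always, while the code of rate 0.8 should fail often. The block lengths here are far too small to exhibit the asymptotic behaviour in the theorem; the sketch only shows the mechanics of random encoding and minimum-distance decoding.</p>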

<h2 id="converse-of-shannons-capacity-theorem">Converse of Shannon's capacity theorem</h2>
<p>The converse of the capacity theorem essentially states that \(1-H(p)\) is the best rate one can achieve over a binary symmetric channel. Formally the theorem states:
</p>

<p>Theorem. If \(k \geq \lceil (1-H(p+\epsilon))n \rceil\), then the following is true for every <a href="/facts/Code/CSntvnEo">encoding</a> and <a href="/facts/Code/CSntvnEo">decoding</a> function \(E:\{0,1\}^{k} \to \{0,1\}^{n}\) and \(D:\{0,1\}^{n} \to \{0,1\}^{k}\) respectively: \(\Pr_{e \in \text{BSC}_{p}}[D(E(m)+e) \neq m] \geq \frac{1}{2}\).
</p>

<p>The intuition behind the proof is that the number of errors grows rapidly as the rate grows beyond the channel capacity. The sender generates messages of dimension \(k\), while the channel \(\text{BSC}_{p}\) introduces transmission errors. When the capacity of the channel is \(1-H(p)\), the number of likely error patterns is typically about \(2^{H(p+\epsilon)n}\) for a code of block length \(n\). The maximum number of messages is \(2^{k}\). The output of the channel, on the other hand, has \(2^{n}\) possible values. If there is any confusion between any two messages, it is likely that \(2^{k}2^{H(p+\epsilon)n} \geq 2^{n}\). Hence we would have \(k \geq \lceil (1-H(p+\epsilon))n \rceil\), a case we would like to avoid to keep the decoding error probability exponentially small.
</p>
<h2 id="codes">Codes</h2>
<p>Much work has been done, and is still being done, to design explicit error-correcting codes that achieve the capacities of several standard communication channels. The motivation behind designing such codes is to relate the rate of the code with the fraction of errors which it can correct.
</p><p>The approach behind the design of codes which meet the channel capacities of \(\text{BSC}\) or the <a href="/facts/Binary_erasure_channel/qJnFRvl9">binary erasure channel</a> \(\text{BEC}\) has been to correct a smaller number of errors with high probability, and to achieve the highest possible rate. Shannon's theorem gives us the best rate which could be achieved over a \(\text{BSC}_{p}\), but it does not give us an idea of any explicit codes which achieve that rate. In fact such codes are typically constructed to correct only a small fraction of errors with high probability, but achieve a very good rate. The first such code was due to George D. Forney in 1966. It is a concatenated code, built by concatenating two different kinds of codes.
</p>
<h3>Forney's code</h3>
<p>Forney constructed a <a href="/facts/Concatenated_code/kMtLmN80">concatenated code</a> \(C^{*} = C_{\text{out}} \circ C_{\text{in}}\) to achieve the capacity of the noisy-channel coding theorem for \(\text{BSC}_{p}\). In his code,
</p>
<ul><li>The outer code \(C_{\text{out}}\) is a code of block length \(N\) and rate \(1-\frac{\epsilon}{2}\) over the field \(F_{2^{k}}\), with \(k = O(\log N)\). Additionally, we have a <a href="/facts/Code/CSntvnEo">decoding</a> algorithm \(D_{\text{out}}\) for \(C_{\text{out}}\) which can correct up to a \(\gamma\) fraction of worst-case errors and runs in \(t_{\text{out}}(N)\) time.</li>
<li>The inner code \(C_{\text{in}}\) is a code of block length \(n\), dimension \(k\), and rate \(1-H(p)-\frac{\epsilon}{2}\). Additionally, we have a decoding algorithm \(D_{\text{in}}\) for \(C_{\text{in}}\) with a <a href="/facts/Code/CSntvnEo">decoding</a> error probability of at most \(\frac{\gamma}{2}\) over \(\text{BSC}_{p}\) which runs in \(t_{\text{in}}(k)\) time.</li></ul>
<p>For the outer code \(C_{\text{out}}\), a Reed-Solomon code would be the first code to come to mind. However, the construction of such a code cannot be done in <a href="/facts/Time_complexity/77T62gmf">polynomial time</a>. This is why a <a href="/facts/Binary_linear_code/HoFI3eyF">binary linear code</a> is used for \(C_{\text{out}}\).
</p><p>For the inner code \(C_{\text{in}}\) we find a <a href="/facts/Linear_code/HoFI3eyF">linear code</a> by exhaustive search over the <a href="/facts/Linear_code/HoFI3eyF">linear codes</a> of block length \(n\) and dimension \(k\), whose rate meets the capacity of \(\text{BSC}_{p}\), by the noisy-channel coding theorem.
</p><p>The rate of the concatenated code is \(R(C^{*}) = R(C_{\text{in}}) \times R(C_{\text{out}}) = \left(1-\frac{\epsilon}{2}\right)\left(1-H(p)-\frac{\epsilon}{2}\right) \geq 1-H(p)-\epsilon\), which almost meets the \(\text{BSC}_{p}\) capacity. We further note that the encoding and decoding of \(C^{*}\) can be done in polynomial time with respect to \(N\). In fact, encoding \(C^{*}\) takes time \(O(N^{2}) + O(Nk^{2}) = O(N^{2})\). Further, the decoding algorithm described takes time \(N t_{\text{in}}(k) + t_{\text{out}}(N) = N^{O(1)}\) as long as \(t_{\text{out}}(N) = N^{O(1)}\) and \(t_{\text{in}}(k) = 2^{O(k)}\).
</p>
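<p>The rate inequality stated above follows from expanding the product and using \(H(p) \geq 0\) and \(\epsilon > 0\); one way to write out the step, with no new symbols introduced, is:</p>

```latex
\begin{aligned}
R(C^{*}) &= \left(1-\tfrac{\epsilon}{2}\right)\left(1-H(p)-\tfrac{\epsilon}{2}\right) \\
         &= 1 - H(p) - \epsilon + \tfrac{\epsilon}{2}\left(H(p) + \tfrac{\epsilon}{2}\right) \\
         &\geq 1 - H(p) - \epsilon .
\end{aligned}
```

<p>The discarded term \(\tfrac{\epsilon}{2}\left(H(p)+\tfrac{\epsilon}{2}\right)\) is non-negative, so the concatenated code loses at most \(\epsilon\) relative to the capacity \(1-H(p)\).</p>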
<h4>Decoding error probability</h4>
<p>A natural decoding algorithm for \(C^{*}\) is to:
</p>
<ul><li>Compute \(y_{i}^{\prime} = D_{\text{in}}(y_{i})\) for each block \(i \in \{1, \ldots, N\}\)</li>
<li>Execute \(D_{\text{out}}\) on \(y^{\prime} = (y_{1}^{\prime} \ldots y_{N}^{\prime})\)</li></ul>
<p>Note that each block of code for \(C_{\text{in}}\) is considered a symbol for \(C_{\text{out}}\). Since the probability of error at any index \(i\) for \(D_{\text{in}}\) is at most \(\tfrac{\gamma}{2}\) and the errors in \(\text{BSC}_{p}\) are independent, the expected number of errors for \(D_{\text{in}}\) is at most \(\tfrac{\gamma N}{2}\) by linearity of expectation. Applying the <a href="/facts/Chernoff_bound/RLC6VlRU">Chernoff bound</a>, the probability of more than \(\gamma N\) errors occurring is at most \(e^{\frac{-\gamma N}{6}}\). Since the outer code \(C_{\text{out}}\) can correct up to \(\gamma N\) errors, this bounds the <a href="/facts/Code/CSntvnEo">decoding</a> error probability of \(C^{*}\). Expressed in asymptotic terms, this gives an error probability of \(2^{-\Omega(\gamma N)}\). Thus the achieved decoding error probability of \(C^{*}\) is exponentially small, as in the noisy-channel coding theorem.
</p><p>We have given a general technique to construct \(C^{*}\). For more detailed descriptions of \(C_{\text{in}}\) and \(C_{\text{out}}\), see the references below. Other codes that achieve capacity have since been constructed. <a href="/facts/LDPC/yPYHEKAP">LDPC</a> codes have been considered for this purpose because of their faster decoding time.<a class="footnote-ref" id="fnref:4" href="#fn:4"><sup>4</sup></a>
</p>
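<p>The Chernoff step in the error analysis above can be checked numerically. The sketch below is illustrative only: it assumes that each of the \(N\) inner blocks is decoded incorrectly independently with probability exactly \(\gamma/2\), so that the expected number of bad blocks is \(\gamma N/2\). The helper names <code>binomial_tail</code> and <code>chernoff_bound</code> and the chosen values of \(\gamma\) and \(N\) are hypothetical. It compares the exact probability of more than \(\gamma N\) bad blocks with the bound \(e^{-\gamma N/6}\).</p>

```python
import math


def binomial_tail(n: int, q: float, k: int) -> float:
    """Exact P[X > k] for X ~ Binomial(n, q)."""
    return sum(
        math.comb(n, i) * q**i * (1 - q) ** (n - i)
        for i in range(k + 1, n + 1)
    )


def chernoff_bound(gamma: float, n: int) -> float:
    """Bound e^(-gamma*N/6) on the probability of more than gamma*N errors."""
    return math.exp(-gamma * n / 6)


# Assumed parameters, chosen only for illustration.
gamma = 0.1
results = []
for n in (100, 200, 400):
    k = round(gamma * n)  # the outer code corrects up to gamma*N bad blocks
    exact = binomial_tail(n, gamma / 2, k)
    bound = chernoff_bound(gamma, n)
    results.append((n, exact, bound))
    print(f"N={n:4d}  exact tail={exact:.3e}  Chernoff bound={bound:.3e}")
```

<p>Under these assumptions the exact tail stays below the bound for every \(N\), and both shrink exponentially as \(N\) grows, which is the behaviour the argument relies on.</p>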
<h2 id="applications">Applications</h2>
<p>The binary symmetric channel can model a <a href="/facts/Disk_drive/hnsXRb5X">disk drive</a> used for memory storage: the channel input represents a bit being written to the disk and the output corresponds to the bit later being read. Errors could arise from the magnetization flipping, background noise or the writing head making an error. Other systems that the binary symmetric channel can model include a telephone or radio communication line, or <a href="/facts/Cell_division/n5MrunWj">cell division</a>, in which the daughter cells contain <a href="/facts/DNA/nB89Iauz">DNA</a> information from their parent cell.<a class="footnote-ref" id="fnref:5" href="#fn:5"><sup>5</sup></a>
</p><p>This channel is often used by theorists because it is one of the simplest <a href="/facts/Signal_noise/Izbv38Rr">noisy</a> channels to analyze. Many problems in <a href="/facts/Communication_theory/hsg79QAy">communication theory</a> can be <a href="/facts/Reduction_(complexity)/IVWF7a1c">reduced</a> to a BSC. Conversely, being able to transmit effectively over the BSC can give rise to solutions for more complicated channels.
</p>
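<p>The disk-drive picture above can be sketched as a small simulation: bits are "written", each is flipped independently with crossover probability \(p\), and the result is "read" back. This is a minimal sketch, not a model of any real device. The function name <code>bsc</code>, the seed, the value \(p = 0.1\) and the number of bits are all assumptions chosen for illustration.</p>

```python
import random


def bsc(bits, p, rng):
    """Pass bits through a binary symmetric channel.

    Each bit is flipped independently with probability p.
    """
    return [b ^ (rng.random() < p) for b in bits]


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.1                 # assumed crossover probability

written = [rng.randrange(2) for _ in range(100_000)]  # bits written to "disk"
read = bsc(written, p, rng)                            # bits read back later

flips = sum(w != r for w, r in zip(written, read))
error_rate = flips / len(written)
print(f"empirical bit error rate: {error_rate:.4f} (crossover probability {p})")
```

<p>With many bits the empirical error rate should be close to \(p\), which is what makes the channel easy to analyze.</p>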
<h2 id="see-also">See also</h2>
<ul><li><a href="/facts/Z-channel_(information_theory)/W8YVkBMB">Z channel</a></li></ul>
<h2 id="notes">Notes</h2>

<ul><li>Cover, Thomas M.; Thomas, Joy A. (1991). <i>Elements of Information Theory</i>. Hoboken, New Jersey: Wiley. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 978-0-471-24195-9.</li>
<li>G. David Forney. <a href="http://dspace.mit.edu/handle/1721.1/4303">Concatenated Codes</a>. MIT Press, Cambridge, MA, 1966.</li>
<li>Venkat Guruswami's course on <a href="https://archive.today/20121215055356/http://www.cs.washington.edu/education/courses/533/06au/">Error-Correcting Codes: Constructions and Algorithms</a>, Autumn 2006.</li>
<li><a href="/facts/David_J._C._MacKay/NK3D8JME">MacKay, David J.C.</a> (2003). <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html"><i>Information Theory, Inference, and Learning Algorithms</i></a>. Cambridge University Press. <a href="/facts/ISBN_(identifier)/15AdSPa9">ISBN</a> 0-521-64298-1.</li>
<li>Atri Rudra's course on Error Correcting Codes: Combinatorics, Algorithms, and Applications (Fall 2007), Lectures <a href="https://web.archive.org/web/20131108081414/http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect9.pdf">9</a>,  <a href="https://web.archive.org/web/20130911140759/http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect10.pdf">10</a>, <a href="https://web.archive.org/web/20131108082917/http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect29.pdf">29</a>, and <a href="https://web.archive.org/web/20131108082922/http://www.cse.buffalo.edu/~atri/courses/coding-theory/lectures/lect30.pdf">30</a>.</li>
<li>Madhu Sudan's course on Algorithmic Introduction to Coding Theory (Fall 2001), Lecture <a href="http://people.csail.mit.edu/madhu/FT01/scribe/lect1.ps">1</a> and <a href="http://people.csail.mit.edu/madhu/FT01/scribe/lect2.ps">2</a>.</li>
<li>C. E. Shannon. <a href="http://portal.acm.org/citation.cfm?id=584093">A mathematical theory of communication</a>. ACM SIGMOBILE Mobile Computing and Communications Review.</li>
<li>Tom Richardson and Rudiger Urbanke. <a href="http://assets.cambridge.org/97805218/52296/copyright/9780521852296_copyright_info.pdf">Modern Coding Theory</a>. Cambridge University Press.</li></ul>

<h2 id="references">References</h2>

<ol>
<li id="fn:1"><p>MacKay (2003), p. 4. - MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1. <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html" target="_blank">http://www.inference.phy.cam.ac.uk/mackay/itila/book.html</a> <a href="#fnref:1" class="footnote-back-ref">↩</a></p></li>
<li id="fn:2"><p>MacKay (2003), p. 15. - MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1. <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html" target="_blank">http://www.inference.phy.cam.ac.uk/mackay/itila/book.html</a> <a href="#fnref:2" class="footnote-back-ref">↩</a></p></li>
<li id="fn:3"><p>MacKay (2003), p. 15. - MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1. <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html" target="_blank">http://www.inference.phy.cam.ac.uk/mackay/itila/book.html</a> <a href="#fnref:3" class="footnote-back-ref">↩</a></p></li>
<li id="fn:4"><p>Richardson and Urbanke <a href="#fnref:4" class="footnote-back-ref">↩</a></p></li>
<li id="fn:5"><p>MacKay (2003), p. 3–4. - MacKay, David J.C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1. <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html" target="_blank">http://www.inference.phy.cam.ac.uk/mackay/itila/book.html</a> <a href="#fnref:5" class="footnote-back-ref">↩</a></p></li>
</ol>
