A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a version of a transformer-based large language model whose weights are restricted to three values: -1, 0, and +1. In principle, this restriction allows the model to replace costly multiplications with additions and reduces the memory needed to store the weights. Since the end-task performance and perplexity of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full-precision" (16-bit FP16 or BF16) counterparts, this design allows the same artificial-intelligence goals to be reached with much lower hardware requirements, latency, and training effort.
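The following is a minimal sketch (not taken from any particular implementation) of why ternary weights remove the need for multiplications: a dot product whose weights are all -1, 0, or +1 reduces to adding or subtracting activations. The function name `ternary_dot` is illustrative only.

```python
def ternary_dot(weights, activations):
    """Dot product where every weight is -1, 0, or +1.

    No multiplication is needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it entirely.
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0 contributes nothing
    return total


# Equivalent to 1*0.5 + 0*2.0 + (-1)*1.25 = -0.75
print(ternary_dot([1, 0, -1], [0.5, 2.0, 1.25]))
```

A full matrix-vector product in a transformer layer is just many such dot products, so the same addition-only trick applies row by row.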
The name comes from the fact that a single trit, the ternary equivalent of a bit that can take the values {-1, 0, 1}, carries log₂ 3 ≈ 1.58 bits of information. 1.58-bit LLMs are also called 1-bit LLMs, although true 1-bit (binary-weight) models also exist.
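A short worked check of the figure behind the name; the byte-packing example is an illustrative assumption, not a description of any specific model's storage format.

```python
import math

# Information content of one trit (three possible values)
bits_per_trit = math.log2(3)
print(f"{bits_per_trit:.4f}")  # ~1.5850, hence "1.58-bit"

# Illustrative packing: five trits fit in one byte because 3**5 = 243 <= 256,
# i.e. about 1.6 bits of storage per ternary weight in practice.
print(3 ** 5 <= 2 ** 8)  # True
```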