A 1.58-bit large language model (1.58-bit LLM, also ternary LLM) is a version of a transformer-based large language model whose weights are restricted to three values: -1, 0, and +1. In principle, this restriction allows the model to replace costly multiplications with additions and reduces the memory needed to store the weights. Since the end-task performance and perplexity of 1.58-bit LLMs, at least for smaller model sizes (up to 3-4B parameters), are close to those of their "full-precision" (16-bit FP16 or BF16) counterparts, this design allows the same artificial-intelligence goals to be reached with much lower hardware requirements, latency, and training effort.
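The following is a minimal sketch (not taken from any particular implementation) of why ternary weights remove the need for multiplications: a dot product whose weights are all -1, 0, or +1 reduces to adding or subtracting activations. The function name `ternary_dot` is illustrative only.

```python
def ternary_dot(weights, activations):
    """Dot product where every weight is -1, 0, or +1.

    No multiplication is needed: +1 adds the activation,
    -1 subtracts it, and 0 skips it entirely.
    """
    total = 0.0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x
        elif w == -1:
            total -= x
        # w == 0 contributes nothing
    return total


# Equivalent to 1*0.5 + 0*2.0 + (-1)*1.25 = -0.75
print(ternary_dot([1, 0, -1], [0.5, 2.0, 1.25]))
```

A full matrix-vector product in a transformer layer is just many such dot products, so the same addition-only trick applies row by row.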
The name comes from the fact that a single trit, the ternary equivalent of a bit that can take the values {-1, 0, 1}, carries log₂ 3 ≈ 1.58 bits of information. 1.58-bit LLMs are also called 1-bit LLMs, although true 1-bit (binary-weight) models also exist.
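A short worked check of the figure behind the name; the byte-packing example is an illustrative assumption, not a description of any specific model's storage format.

```python
import math

# Information content of one trit (three possible values)
bits_per_trit = math.log2(3)
print(f"{bits_per_trit:.4f}")  # ~1.5850, hence "1.58-bit"

# Illustrative packing: five trits fit in one byte because 3**5 = 243 <= 256,
# i.e. about 1.6 bits of storage per ternary weight in practice.
print(3 ** 5 <= 2 ** 8)  # True
```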