llama.cpp is an open source software library that performs inference on various large language models such as Llama. It is co-developed alongside the GGML project, a general-purpose tensor library.
Command-line tools are included with the library, alongside a server with a simple web interface.