adalkiran / llama-nuts-and-bolts
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
☆316Updated last year
Alternatives and similar repositories for llama-nuts-and-bolts
Users interested in llama-nuts-and-bolts are comparing it to the libraries listed below.
- A compact LLM pretrained in 9 days on high-quality data☆337Updated 8 months ago
- Comparison of Language Model Inference Engines☆238Updated last year
- Automatically evaluate your LLMs in Google Colab☆677Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget☆163Updated 4 months ago
- ☆583Updated last year
- A collection of all available inference solutions for LLMs☆93Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆267Updated 3 weeks ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.☆150Updated 5 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆258Updated last year
- Tutorial for building LLM router☆239Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ)☆902Updated last week
- A library for easily merging multiple LLM experts and efficiently training the merged LLM.☆500Updated last year
- VPTQ: a flexible and extreme low-bit quantization algorithm☆670Updated 8 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆202Updated last year
- ☆461Updated last month
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research☆273Updated this week
- Best practices for distilling large language models.☆595Updated last year
- ☆474Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…☆146Updated 2 years ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.☆348Updated 8 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization☆275Updated last year
- LLM Inference on consumer devices☆128Updated 9 months ago
- ☆446Updated last year
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model☆255Updated 6 months ago
- Visualize the intermediate output of Mistral 7B☆381Updated 11 months ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead.☆232Updated 6 months ago
- A scalable and robust tree-based speculative decoding algorithm☆366Updated 10 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Updated last year
- Official inference library for pre-processing of Mistral models☆832Updated this week
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra…☆775Updated this week