adalkiran / llama-nuts-and-bolts
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
☆313Updated last year
Alternatives and similar repositories for llama-nuts-and-bolts
Users interested in llama-nuts-and-bolts are comparing it to the libraries listed below.
- A compact LLM pretrained in 9 days on high-quality data☆324Updated 5 months ago
- ☆567Updated last year
- Automatically evaluate your LLMs in Google Colab☆660Updated last year
- Comparison of Language Model Inference Engines☆229Updated 9 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…☆146Updated last year
- ☆421Updated last month
- Visualize the intermediate output of Mistral 7B☆369Updated 8 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning☆661Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆248Updated last year
- A library for easily merging multiple LLM experts and efficiently training the merged LLM.☆491Updated last year
- Comparison of the output quality of quantization methods, using Llama 3, transformers, GGUF, EXL2.☆163Updated last year
- Formatron empowers everyone to control the format of language models' output with minimal overhead.☆225Updated 3 months ago
- function calling-based LLM agents☆287Updated last year
- ☆161Updated last month
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget☆162Updated last month
- A collection of available inference solutions for LLMs☆91Updated 6 months ago
- ☆558Updated 10 months ago
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.☆687Updated last year
- An innovative library for efficient LLM inference via low-bit quantization☆348Updated last year
- 1.58 Bit LLM on Apple Silicon using MLX☆223Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ)☆878Updated 2 weeks ago
- a small code base for training large models☆311Updated 4 months ago
- ☆469Updated last year
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.☆344Updated 5 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Updated 11 months ago
- Inference code for Mistral and Mixtral hacked up into original Llama implementation☆370Updated last year
- Best practices for distilling large language models.☆575Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆338Updated 4 months ago
- A bagel, with everything.☆325Updated last year
- VPTQ, a flexible and extreme low-bit quantization algorithm☆658Updated 5 months ago