adalkiran / llama-nuts-and-bolts
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
☆315 · Updated last year
Alternatives and similar repositories for llama-nuts-and-bolts
Users interested in llama-nuts-and-bolts are comparing it to the repositories listed below.
- ☆580 · Updated last year
- ☆455 · Updated this week
- A compact LLM pretrained in 9 days using high-quality data · ☆334 · Updated 7 months ago
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget · ☆162 · Updated 3 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models · ☆257 · Updated last year
- ☆473 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆267 · Updated last year
- Tutorial for building an LLM router · ☆236 · Updated last year
- Comparison of Language Model Inference Engines · ☆236 · Updated 11 months ago
- Fast parallel LLM inference for MLX · ☆233 · Updated last year
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research · ☆262 · Updated this week
- Advanced quantization toolkit for LLMs and VLMs. Native support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Bits and seamless integration with … · ☆724 · Updated this week
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients · ☆202 · Updated last year
- Automatically evaluate your LLMs in Google Colab · ☆671 · Updated last year
- LLM Workshop by Sourab Mangrulkar · ☆397 · Updated last year
- A scalable and robust tree-based speculative decoding algorithm · ☆363 · Updated 10 months ago
- A collection of all available inference solutions for LLMs · ☆92 · Updated 8 months ago
- VPTQ, a flexible and extreme low-bit quantization algorithm · ☆668 · Updated 7 months ago
- An Open Source Toolkit for LLM Distillation · ☆785 · Updated 4 months ago
- An extension of the nanoGPT repository for training small MoE models · ☆215 · Updated 8 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 · ☆347 · Updated 6 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning · ☆661 · Updated last year
- 1.58-bit LLM on Apple Silicon using MLX · ☆225 · Updated last year
- ☆446 · Updated last year
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch · ☆686 · Updated last year
- A library for easily merging multiple LLM experts and efficiently training the merged LLM · ☆498 · Updated last year
- LLM inference on consumer devices · ☆125 · Updated 8 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization · ☆275 · Updated last year
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model · ☆252 · Updated 6 months ago
- The official evaluation suite and dynamic data release for MixEval · ☆253 · Updated last year