adalkiran / llama-nuts-and-bolts
A holistic way of understanding how Llama and its components run in practice, with code and detailed documentation.
☆316Updated last year
Alternatives and similar repositories for llama-nuts-and-bolts
Users interested in llama-nuts-and-bolts are comparing it to the libraries listed below.
- A compact LLM pretrained in 9 days on high-quality data☆337Updated 8 months ago
- Comparison of Language Model Inference Engines☆238Updated last year
- Automatically evaluate your LLMs in Google Colab☆677Updated last year
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget☆163Updated 4 months ago
- ☆583Updated last year
- A collection of all available inference solutions for LLMs☆93Updated 9 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆267Updated 3 weeks ago
- Local Qwen3 LLM inference. One easy-to-understand file of C source with no dependencies.☆150Updated 5 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models☆258Updated last year
- Tutorial for building LLM router☆239Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ)☆902Updated last week
- A library for easily merging multiple LLM experts and efficiently training the merged LLM.☆500Updated last year
- VPTQ: a flexible and extreme low-bit quantization algorithm☆670Updated 8 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.☆202Updated last year
- ☆461Updated last month
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research☆273Updated this week
- Best practices for distilling large language models.☆595Updated last year
- ☆474Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes…☆146Updated 2 years ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.☆348Updated 8 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization☆275Updated last year
- LLM Inference on consumer devices☆128Updated 9 months ago
- ☆446Updated last year
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model☆255Updated 6 months ago
- Visualize the intermediate output of Mistral 7B☆381Updated 11 months ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead.☆232Updated 6 months ago
- A scalable and robust tree-based speculative decoding algorithm☆366Updated 10 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Updated last year
- Official inference library for pre-processing of Mistral models☆832Updated this week
- Advanced quantization toolkit for LLMs and VLMs. Support for WOQ, MXFP4, NVFP4, GGUF, Adaptive Schemes and seamless integration with Tra…☆775Updated this week