anthonix / llm.c
LLM training in simple, raw C/HIP for AMD GPUs
☆37Updated last month
Related projects
Alternatives and complementary repositories for llm.c
- 1.58 Bit LLM on Apple Silicon using MLX☆134Updated 5 months ago
- Tiny ASIC implementation for "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits" matrix multiplication unit☆110Updated 6 months ago
- ☆84Updated last month
- 1.58-bit LLaMa model☆79Updated 7 months ago
- ☆96Updated last month
- llama.cpp fork with additional SOTA quants and improved performance☆86Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆261Updated last year
- scalable and robust tree-based speculative decoding algorithm☆313Updated 2 months ago
- WebGPU LLM inference tuned by hand☆146Updated last year
- port of Andrej Karpathy's llm.c to Mojo☆321Updated 3 weeks ago
- Fast parallel LLM inference for MLX☆146Updated 4 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs☆88Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"☆171Updated 3 weeks ago
- ☆60Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters☆104Updated last month
- An implementation of bucketMul LLM inference☆214Updated 4 months ago
- GPU benchmark☆43Updated last month
- Inference of Mamba models in pure C☆177Updated 8 months ago
- ☆116Updated 2 months ago
- look how they massacred my boy☆53Updated 3 weeks ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆154Updated 3 weeks ago
- PyTorch implementation of models from the Zamba2 series.☆158Updated 2 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho…☆99Updated last week
- This is our own implementation of 'Layer Selective Rank Reduction'☆231Updated 5 months ago
- run paligemma in real time☆122Updated 5 months ago
- AMD related optimizations for transformer models☆57Updated this week
- ☆64Updated 5 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation☆259Updated last week
- GGUF implementation in C as a library and a tools CLI program☆242Updated 4 months ago