facebookresearch / fastgen
Simple high-throughput inference library
☆115 Updated 3 weeks ago
Alternatives and similar repositories for fastgen
Users who are interested in fastgen are comparing it to the libraries listed below:
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆69 Updated 2 weeks ago
- RWKV-7: Surpassing GPT ☆88 Updated 6 months ago
- Load compute kernels from the Hub ☆144 Updated this week
- Inference of Mamba models in pure C ☆187 Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 Updated 6 months ago
- Samples of good AI generated CUDA kernels ☆65 Updated last week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆237 Updated 4 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 Updated 7 months ago
- Make triton easier ☆47 Updated 11 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆100 Updated 3 months ago
- ☆46 Updated last week
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 Updated last month
- 👷 Build compute kernels ☆44 Updated this week
- Experiments on speculative sampling with Llama models ☆126 Updated last year
- Testing LLM reasoning abilities with family relationship quizzes. ☆61 Updated 4 months ago
- ☆11 Updated 4 months ago
- QuIP quantization ☆52 Updated last year
- Learn CUDA with PyTorch ☆21 Updated this week
- ☆49 Updated last year
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- ☆56 Updated 2 months ago
- Token Omission Via Attention ☆126 Updated 7 months ago
- ☆37 Updated last month
- ☆108 Updated last year
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 Updated 4 months ago
- Collection of autoregressive model implementations ☆85 Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆53 Updated 4 months ago
- Python bindings for ggml ☆141 Updated 9 months ago
- research impl of Native Sparse Attention (2502.11089) ☆54 Updated 3 months ago
- This repository has code for fine-tuning LLMs with GRPO specifically for Rust Programming using cargo as feedback ☆94 Updated 2 months ago