BlinkDL / fast.c
Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆74 · Updated last year
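The repository's own code is not reproduced here, but as a rough illustration of the kind of measurement such a benchmark performs, a minimal DRAM read-bandwidth test in C could look like the sketch below. All names and constants are illustrative assumptions, not fast.c's actual implementation.

```c
/* Hypothetical sketch of a sequential DRAM read-bandwidth benchmark:
 * stream over a buffer much larger than the last-level cache and time it.
 * Not code from fast.c; buffer size and timing method are assumptions. */
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <time.h>

#define BUF_BYTES (1ull << 30)  /* 1 GiB: large enough to defeat CPU caches */

int main(void) {
    uint64_t *buf = malloc(BUF_BYTES);
    if (!buf) return 1;
    memset(buf, 1, BUF_BYTES);  /* touch every page so it is actually mapped */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    volatile uint64_t sum = 0;  /* volatile keeps the loop from being optimized away */
    size_t n = BUF_BYTES / sizeof(uint64_t);
    for (size_t i = 0; i < n; i++) sum += buf[i];

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double gib = BUF_BYTES / (double)(1ull << 30);
    printf("read %.2f GiB in %.3f s -> %.2f GiB/s\n", gib, secs, gib / secs);
    free(buf);
    return 0;
}
```

Compile with optimizations (e.g. `cc -O2 bench.c`); the reported figure approximates sustained sequential read bandwidth, not the peak theoretical DRAM bandwidth.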
Alternatives and similar repositories for fast.c
Users interested in fast.c are comparing it to the libraries listed below.
- RWKV-7: Surpassing GPT ☆104 · Updated last year
- Inference RWKV v7 in pure C. ☆44 · Updated 4 months ago
- RWKV in nanoGPT style ☆197 · Updated last year
- Samples of good AI-generated CUDA kernels ☆99 · Updated 8 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year
- Inference of Mamba and Mamba2 models in pure C ☆196 · Updated 3 weeks ago
- QuIP quantization ☆62 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆279 · Updated 2 years ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated 9 months ago
- ☆147 · Updated last year
- Simple high-throughput inference library ☆155 · Updated 8 months ago
- Lightweight toolkit for training and fine-tuning 1.58-bit language models ☆112 · Updated 8 months ago
- tinygrad port of the RWKV large language model. ☆45 · Updated 11 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆251 · Updated last year
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer fine-tuning ☆53 · Updated 11 months ago
- Token Omission Via Attention ☆128 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆33 · Updated last year
- ☆67 · Updated 10 months ago
- Large-scale RWKV v7 (World, PRWKV, Hybrid-RWKV) inference, capable of combining multiple states (pseudo-MoE). Easy to deploy… ☆47 · Updated 3 months ago
- ☆163 · Updated 7 months ago
- ☆71 · Updated 7 months ago
- PyTorch implementation of models from the Zamba2 series. ☆186 · Updated last year
- ☆63 · Updated last year
- Fast, modular code to create and train cutting-edge LLMs ☆68 · Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆28 · Updated 2 years ago
- PB-LLM: Partially Binarized Large Language Models ☆157 · Updated 2 years ago
- Experiments with BitNet inference on CPU ☆55 · Updated last year
- ☆64 · Updated 8 months ago