ridgerchu / matmulfreellm
Implementation for MatMul-free LM.
☆2,948 Updated 2 months ago
Alternatives and similar repositories for matmulfreellm:
Users who are interested in matmulfreellm are comparing it to the libraries listed below:
- Efficient Triton Kernels for LLM Training ☆4,248 Updated this week
- Tile primitives for speedy kernels ☆1,966 Updated this week
- NanoGPT (124M) in 3 minutes ☆2,152 Updated this week
- A PyTorch native library for large model training ☆3,200 Updated this week
- PyTorch native post-training library ☆4,765 Updated this week
- nanoGPT style version of Llama 3.1 ☆1,300 Updated 5 months ago
- Code for BLT research paper ☆1,352 Updated this week
- PyTorch native quantization and sparsity for training and inference ☆1,783 Updated this week
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch ☆1,747 Updated last week
- Tools for merging pretrained large language models. ☆5,157 Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆7,967 Updated this week
- llama3.np is a pure NumPy implementation for Llama 3 model. ☆973 Updated 7 months ago
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆833 Updated last week
- Make PyTorch models up to 40% faster! Thunder is a source to source compiler for PyTorch. It enables using different hardware executors a… ☆1,268 Updated this week
- Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild ☆1,912 Updated this week
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. ☆5,767 Updated last month
- ☆4,054 Updated 7 months ago
- MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,228 Updated 2 months ago
- NVIDIA Linux open GPU with P2P support ☆993 Updated last month
- 4M: Massively Multimodal Masked Modeling ☆1,671 Updated 3 months ago
- ☆2,811 Updated 4 months ago
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,479 Updated this week
- On-device AI across mobile, embedded and edge for PyTorch ☆2,439 Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,167 Updated this week
- UNet diffusion model in pure CUDA ☆596 Updated 7 months ago
- A nanoGPT pipeline packed in a spreadsheet ☆2,061 Updated 7 months ago
- ReFT: Representation Finetuning for Language Models ☆1,388 Updated 3 weeks ago
- Entropy Based Sampling and Parallel CoT Decoding ☆3,208 Updated 2 months ago
- Minimalistic large language model 3D-parallelism training ☆1,400 Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,321 Updated this week