facebookresearch / blt
Code for BLT research paper
☆358 Updated this week
Alternatives and similar repositories for blt:
Users who are interested in blt are comparing it to the repositories listed below
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆821 Updated last week
- Large Concept Models: Language modeling in a sentence representation space ☆108 Updated this week
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆526 Updated 5 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆181 Updated 5 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 Updated 2 months ago
- An Open Source Toolkit For LLM Distillation ☆372 Updated 3 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆223 Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆164 Updated 3 weeks ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆174 Updated this week
- System 2 Reasoning Link Collection ☆702 Updated this week
- Open weights language model from Google DeepMind, based on Griffin. ☆612 Updated 5 months ago
- A compact LLM pretrained in 9 days by using high quality data ☆274 Updated 2 weeks ago
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆603 Updated 2 weeks ago
- Best practices & guides on how to write distributed pytorch training code ☆315 Updated 3 weeks ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆197 Updated 2 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆715 Updated 3 weeks ago
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ☆543 Updated last week
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆737 Updated last week
- Minimalistic large language model 3D-parallelism training ☆1,331 Updated this week
- Sparse autoencoders ☆379 Updated last week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆877 Updated this week
- Fast bare-bones BPE for modern tokenizer training ☆142 Updated last month
- The repository for the code of the UltraFastBERT paper ☆513 Updated 8 months ago
- Annotated version of the Mamba paper ☆460 Updated 9 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆242 Updated last week
- Manage scalable open LLM inference endpoints in Slurm clusters ☆244 Updated 5 months ago
- GRadient-INformed MoE ☆261 Updated 2 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆291 Updated 2 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆251 Updated 3 weeks ago