facebookresearch / blt
Code for BLT research paper
☆1,352 · Updated this week
Alternatives and similar repositories for blt:
Users who are interested in blt are comparing it to the repositories listed below.
- Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling" ☆833 · Updated last week
- A Self-adaptation Framework that adapts LLMs for unseen tasks in real-time! ☆801 · Updated last week
- Recipes to scale inference-time compute of open models ☆971 · Updated last week
- Large Concept Models: Language modeling in a sentence representation space ☆1,794 · Updated this week
- Unofficial implementation of Titans, SOTA memory for transformers, in Pytorch ☆891 · Updated this week
- Training Large Language Model to Reason in a Continuous Latent Space ☆735 · Updated this week
- [ICLR2025] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆482 · Updated this week
- Bringing BERT into modernity via both architecture changes and scaling ☆1,108 · Updated last week
- Minimalistic large language model 3D-parallelism training ☆1,400 · Updated this week
- nanoGPT style version of Llama 3.1 ☆1,300 · Updated 5 months ago
- A bibliography and survey of the papers surrounding o1 ☆1,076 · Updated 2 months ago
- veRL: Volcano Engine Reinforcement Learning for LLM ☆1,135 · Updated this week
- Efficient implementations of state-of-the-art linear attention models in Pytorch and Triton ☆1,811 · Updated this week
- ☆997 · Updated last month
- NanoGPT (124M) in 3 minutes ☆2,152 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for education purpose ☆670 · Updated this week
- An Open Large Reasoning Model for Real-World Solutions ☆1,410 · Updated 2 months ago
- SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders. ☆647 · Updated last month
- MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024. ☆1,228 · Updated 2 months ago
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆2,011 · Updated this week
- ☆2,341 · Updated this week
- Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793 ☆380 · Updated last month
- System 2 Reasoning Link Collection ☆751 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,022 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ☆1,913 · Updated 5 months ago
- Schedule-Free Optimization in PyTorch ☆2,069 · Updated last month
- Helpful tools and examples for working with flex-attention ☆603 · Updated this week
- ☆867 · Updated this week