thepowerfuldeez / OLMo
My fork of Allen AI's OLMo for educational purposes.
☆30 · Updated 4 months ago
Alternatives and similar repositories for OLMo:
Users interested in OLMo are comparing it to the libraries listed below.
- ☆74 · Updated 7 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 6 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆142 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆82 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … ☆43 · Updated 8 months ago
- The official repository for Inheritune. ☆111 · Updated last month
- Train, tune, and infer the Bamba model ☆88 · Updated 2 months ago
- ☆47 · Updated 7 months ago
- ☆48 · Updated 4 months ago
- A toolkit for fine-tuning, inference, and evaluation of GreenBitAI's LLMs. ☆81 · Updated 3 weeks ago
- Code implementation, evaluations, documentation, links, and resources for the Min P paper ☆28 · Updated 3 weeks ago
- The official repo for "LLoCo: Learning Long Contexts Offline" ☆116 · Updated 9 months ago
- ☆60 · Updated 11 months ago
- QuIP quantization ☆52 · Updated last year
- A collection of autoregressive model implementations ☆83 · Updated last month
- A single repo with all scripts and utilities to train or fine-tune the Mamba model, with or without FIM ☆54 · Updated 11 months ago
- Implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆86 · Updated 3 weeks ago
- Work in progress. ☆53 · Updated 2 weeks ago
- Repo hosting code and materials related to speeding up LLM inference via token merging. ☆35 · Updated 11 months ago
- Unofficial implementation of the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆155 · Updated 9 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- ☆76 · Updated 2 months ago
- Working implementation of DeepSeek MLA ☆39 · Updated 2 months ago
- Code for "Critique Fine-Tuning: Learning to Critique Is More Effective than Learning to Imitate" ☆131 · Updated last month
- ☆49 · Updated last year
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆82 · Updated last year
- ☆49 · Updated 2 weeks ago
- Single-file, single-GPU, from-scratch, efficient, full-parameter tuning library for "RL for LLMs" ☆119 · Updated this week
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference. ☆59 · Updated 2 months ago