mlfoundations / open_lm
A repository for research on medium-sized language models.
☆491 · Updated last month
Alternatives and similar repositories for open_lm:
Users interested in open_lm are comparing it to the libraries listed below.
- Scaling Data-Constrained Language Models ☆333 · Updated 4 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆501 · Updated 3 months ago
- Large Context Attention ☆682 · Updated 3 weeks ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆701 · Updated 4 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆296 · Updated last year
- Minimalistic large language model 3D-parallelism training ☆1,483 · Updated this week
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆724 · Updated this week
- ☆496 · Updated 3 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆584 · Updated 11 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆252 · Updated 7 months ago
- RewardBench: the first evaluation tool for reward models. ☆505 · Updated this week
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆451 · Updated 11 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆803 · Updated last week
- Scalable toolkit for efficient model alignment ☆719 · Updated this week
- Multipack distributed sampler for fast padding-free training of LLMs ☆184 · Updated 6 months ago
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆397 · Updated 10 months ago
- Distributed trainer for LLMs ☆557 · Updated 9 months ago
- Batched LoRAs ☆338 · Updated last year
- ☆502 · Updated 5 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day ☆255 · Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆547 · Updated last month
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆306 · Updated 8 months ago
- Official PyTorch implementation of QA-LoRA ☆126 · Updated 11 months ago
- Inference code for Persimmon-8B ☆416 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆221 · Updated this week
- A bagel, with everything. ☆316 · Updated 10 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆297 · Updated 2 months ago
- Helpful tools and examples for working with flex-attention ☆635 · Updated this week
- [ICML 2024] CLLMs: Consistency Large Language Models ☆372 · Updated 3 months ago
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆286 · Updated 9 months ago