geronimi73 / accelerate_tricks
☆14 Updated 2 years ago
Alternatives and similar repositories for accelerate_tricks
Users who are interested in accelerate_tricks are comparing it to the libraries listed below
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆246 Updated 4 months ago
- The HELMET Benchmark ☆198 Updated 2 months ago
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆229 Updated last year
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆484 Updated last year
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆177 Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆299 Updated last year
- ☆273 Updated 2 years ago
- ☆203 Updated 9 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆188 Updated 2 months ago
- Reproducible, flexible LLM evaluations ☆337 Updated 2 weeks ago
- The official evaluation suite and dynamic data release for MixEval. ☆255 Updated last year
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆77 Updated 9 months ago
- BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach. ☆238 Updated 5 months ago
- DSIR large-scale data selection framework for language model training ☆269 Updated last year
- Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs. ☆456 Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆246 Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆182 Updated 7 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆111 Updated 11 months ago
- Code for studying the super weight in LLM ☆121 Updated last year
- ☆232 Updated 2 months ago
- ☆209 Updated 2 years ago
- A simple unified framework for evaluating LLMs ☆261 Updated 9 months ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆204 Updated last year
- Critique-out-Loud Reward Models ☆73 Updated last year
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS25] ☆216 Updated 2 months ago
- AnchorAttention: Improved attention for LLMs long-context training ☆213 Updated last year
- A brief and partial summary of RLHF algorithms. ☆144 Updated 11 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆224 Updated last month
- ☆140 Updated last year
- LLM-Merging: Building LLMs Efficiently through Merging ☆209 Updated last year