xuhaoxh / infini-gram-mini
★ 26 · Updated last month
Alternatives and similar repositories for infini-gram-mini
Users who are interested in infini-gram-mini are comparing it to the libraries listed below.
- DPO, but faster · ★ 46 · Updated 11 months ago
- GoldFinch and other hybrid transformer components · ★ 45 · Updated last year
- A repository for research on medium sized language models. · ★ 78 · Updated last year
- Official Repository for Task-Circuit Quantization · ★ 24 · Updated 5 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ★ 35 · Updated 8 months ago
- Fork of Flame repo for training of some new stuff in development · ★ 18 · Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ★ 60 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics · ★ 16 · Updated last year
- https://x.com/BlinkDL_AI/status/1884768989743882276 · ★ 28 · Updated 6 months ago
- Official implementation of ECCV24 paper: POA · ★ 24 · Updated last year
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" · ★ 51 · Updated 8 months ago
- ★ 86 · Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling · ★ 40 · Updated 3 weeks ago
- ★ 39 · Updated 6 months ago
- TPTT: Transforming Pretrained Transformers into Titans · ★ 29 · Updated 3 weeks ago
- Lottery Ticket Adaptation · ★ 40 · Updated 11 months ago
- ★ 65 · Updated 7 months ago
- MEXMA: Token-level objectives improve sentence representations · ★ 42 · Updated 10 months ago
- This is a simple torch implementation of the high-performance Multi-Query Attention · ★ 15 · Updated 2 years ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… · ★ 53 · Updated 7 months ago
- An open-source replication of the Strawberry method that leverages Monte Carlo Search with PPO and/or DPO · ★ 29 · Updated this week
- The official repo for "Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem" [EMNLP25] · ★ 33 · Updated 2 months ago
- ★ 55 · Updated 4 months ago
- Resa: Transparent Reasoning Models via SAEs · ★ 44 · Updated last month
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) · ★ 53 · Updated 3 weeks ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, … · ★ 51 · Updated last week
- An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention' · ★ 54 · Updated last year
- Using FlexAttention to compute attention with different masking patterns · ★ 47 · Updated last year
- Official Implementation of APB (ACL 2025 main Oral) · ★ 31 · Updated 8 months ago
- Explorations into adversarial losses on top of autoregressive loss for language modeling · ★ 38 · Updated 8 months ago