google-deepmind / language_modeling_is_compression
☆121 · Updated 5 months ago
Alternatives and similar repositories for language_modeling_is_compression:
Users interested in language_modeling_is_compression are comparing it to the repositories listed below.
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ☆130 · Updated 4 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆153 · Updated 2 months ago
- ☆82 · Updated 4 months ago
- The HELMET Benchmark ☆114 · Updated last week
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 10 months ago
- Open-source code for the paper "Retrieval Head Mechanistically Explains Long-Context Factuality" ☆172 · Updated 6 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆43 · Updated last week
- Some preliminary explorations of Mamba's context scaling. ☆213 · Updated last year
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262) ☆60 · Updated last year
- ☆58 · Updated 9 months ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆114 · Updated 11 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆68 · Updated 10 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆196 · Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 · Updated last month
- [NeurIPS 2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies (https://arxiv.org/abs/2407.13623) ☆76 · Updated 4 months ago
- Repo for the ACL 2023 Findings paper "Emergent Modularity in Pre-trained Transformers" ☆21 · Updated last year
- [ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models ☆81 · Updated 8 months ago
- Replicating O1 inference-time scaling laws ☆82 · Updated 2 months ago
- Layer-Condensed KV cache with 10× larger batch size, fewer parameters, and less computation. Dramatic speedup with better task performance… ☆147 · Updated 3 weeks ago
- Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆55 · Updated last month
- ☆50 · Updated 2 months ago
- Repo of the paper "Free Process Rewards without Process Labels" ☆118 · Updated last month
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆220 · Updated 2 months ago
- Stick-breaking attention ☆42 · Updated last month
- ☆125 · Updated last year
- ☆80 · Updated 11 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆120 · Updated 3 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆140 · Updated 11 months ago
- ☆99 · Updated 11 months ago