NathanGodey/qfilters

Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)

☆34

Alternatives and similar repositories for qfilters

Users who are interested in qfilters are comparing it to the libraries listed below.

alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆19 · Dec 13, 2024 · Updated last year
enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 · Apr 11, 2026 · Updated 3 months ago
ASISys / AdaSkip
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
☆21 · Jan 24, 2025 · Updated last year
TemporaryLoRA / FreeLM
☆15 · Feb 10, 2026 · Updated 5 months ago
kubernetes-bad / reward-composer
Lego for GRPO
☆30 · May 27, 2025 · Updated last year
SakanaAI / CycleQD
CycleQD is a framework for parameter space model merging.
☆48 · Feb 1, 2025 · Updated last year
BrotherHappy / OSTQuant
[ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…
☆94 · Apr 8, 2025 · Updated last year
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆170 · Oct 13, 2025 · Updated 9 months ago
IST-DASLab / RoSA
Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)
☆46 · May 20, 2026 · Updated 2 months ago
kiaia / GIRAFFE
Extending context length of visual language models
☆12 · Dec 18, 2024 · Updated last year
zhuzilin / vllm-group
☆12 · Nov 5, 2024 · Updated last year
ilur98 / DGQ
Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
☆14 · Dec 27, 2023 · Updated 2 years ago
RUCAIBox / QuantizedEmpirical
☆15 · Sep 24, 2023 · Updated 2 years ago
dayal-kalra / low-memory-adam
☆14 · Mar 2, 2025 · Updated last year
dmis-lab / Monet
[ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers
☆79 · Jun 23, 2025 · Updated last year
kvfrans / matrix-whitening
Code for "What really matters in matrix-whitening optimizers?"
☆25 · Oct 31, 2025 · Updated 8 months ago
AI-ANK / Airbnb-Listing-Explorer
☆29 · Apr 29, 2024 · Updated 2 years ago
locuslab / llava-token-compression
☆47 · Nov 8, 2024 · Updated last year
FasterDecoding / SnapKV
☆327 · Jul 10, 2025 · Updated last year
kaloureyes3 / v4-clients
☆10 · Apr 5, 2024 · Updated 2 years ago
TRI-ML / linear_open_lm
A repository for research on medium-sized language models.
☆78 · May 23, 2024 · Updated 2 years ago
kyleliang919 / Super_Muon
☆68 · Mar 21, 2025 · Updated last year
FFTYYY / RaanA
Implementation of "RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm"
☆17 · Apr 11, 2025 · Updated last year
iantbutler01 / ditty
A library for simplifying training with multi-GPU setups in the HuggingFace / PyTorch ecosystem.
☆16 · Jun 10, 2026 · Updated last month
mit-han-lab / x-attention
[ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring
☆280 · Jul 6, 2025 · Updated last year
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated 2 years ago
csjfwang / Forecast-PEFT
Forecast-PEFT: Parameter-Efficient Fine-Tuning for Pre-trained Motion Forecasting Models
☆14 · Oct 6, 2024 · Updated last year
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆68 · Nov 1, 2024 · Updated last year
dada-qin / Data-Centric_LLM_Studies
A list of papers about data quality in Large Language Models (LLMs)
☆27 · Dec 14, 2023 · Updated 2 years ago
FYQ0919 / PTSA-MCTS
A PyTorch implementation of PTSA-MCTS from [Accelerating Monte Carlo Tree Search with Probability Tree State Abstraction].
☆16 · Oct 21, 2023 · Updated 2 years ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆54 · Oct 18, 2024 · Updated last year
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆29 · Apr 17, 2024 · Updated 2 years ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models (ICLR2026)
☆23 · Jun 24, 2026 · Updated last month
callummcdougall / sae_visualizer
☆31 · Apr 4, 2024 · Updated 2 years ago
corl-team / lime
Official implementation of the paper "You Do Not Fully Utilize Transformer's Representation Capacity"
☆32 · May 28, 2025 · Updated last year
INT-FlashAttention2024 / INT-FlashAttention
☆91 · Jan 23, 2025 · Updated last year
opengear-project / GEAR
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM
☆184 · Jul 12, 2024 · Updated 2 years ago
yale-nlp / refdpo
☆16 · Jul 23, 2024 · Updated 2 years ago
AIoT-MLSys-Lab / D2O
[ICLR 2025🔥] D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
☆27 · Jul 7, 2025 · Updated last year