llm-fireq/fireq

FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration

☆20

Alternatives and similar repositories for fireq

Users who are interested in fireq are comparing it to the repositories listed below.

snu-mllab / Efficient-CNN-Depth-Compression
Official PyTorch implementation of "Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming" (ICML'23)
☆13 · Jul 11, 2024 · Updated last year
ucb-bar / autocomp
Autocomp: AI-Driven Code Optimizer for Tensor Accelerators
☆74 · Feb 24, 2026 · Updated last week
Egg-Hu / SMI
[ICML 2024] Sparse Model Inversion: Efficient Inversion of Vision Transformers with Less Hallucination
☆13 · Apr 29, 2025 · Updated 10 months ago
ExaDGLM / ExaDGLM
☆27 · Jan 3, 2025 · Updated last year
Cheliosoops / BitQ
☆10 · Apr 24, 2024 · Updated last year
Qualcomm-AI-research / lr-qat
☆52 · Nov 5, 2024 · Updated last year
ShiheWang / FIMA-Q
[CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
☆26 · Jun 16, 2025 · Updated 8 months ago
hellozhuo / msgc
Source code of our TNNLS paper "Boosting Convolutional Neural Networks with Middle Spectrum Grouped Convolution"
☆12 · Apr 14, 2023 · Updated 2 years ago
jiwonsong-dev / SLEB
[ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
☆39 · Feb 4, 2025 · Updated last year
zjq0455 / PTQ1.61
☆15 · Jan 12, 2026 · Updated last month
h-jia / TTE
☆13 · Jul 14, 2025 · Updated 7 months ago
cat538 / MxMoE
[ICML 2025] MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
☆22 · Jul 4, 2025 · Updated 8 months ago
naver-ai / negmerge
[ICML 2025] Official PyTorch implementation of "NegMerge: Sign-Consensual Weight Merging for Machine Unlearning"
☆14 · Nov 25, 2025 · Updated 3 months ago
worker24h / jt808-lua-wireshark
Wireshark parsing for the Ministry of Transport 808 protocol (JT808)
☆10 · Feb 20, 2021 · Updated 5 years ago
xjjxmu / QSLAW
The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2…
☆14 · Dec 7, 2024 · Updated last year
wilyub / VeriThoughts
The first large scale formally verified reasoning dataset for Verilog
☆20 · May 16, 2025 · Updated 9 months ago
skypilot-sds / airflow-provider-skypilot
☆27 · Nov 5, 2024 · Updated last year
WalkerWorldPeace / DOGE
Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent".
☆21 · May 23, 2025 · Updated 9 months ago
DensoITLab / bitprune
☆11 · Apr 5, 2023 · Updated 2 years ago
hzf1174 / RoBoT
Official Implementation of Robustifying and Boosting Training-Free Neural Architecture Search
☆10 · Mar 12, 2024 · Updated last year
shawnricecake / quart-depth
[CVPR 2025] QuartDepth
☆17 · Mar 24, 2025 · Updated 11 months ago
libozhu03 / QArtSR
☆17 · Mar 10, 2025 · Updated 11 months ago
Zhengsh123 / FREE-Merging
The implementation for FREE-Merging: Fourier Transform for Model Merging with Lightweight Experts (ICCV25)
☆14 · Jun 26, 2025 · Updated 8 months ago
harvard-edge / cs249r_fall2025
☆20 · Dec 16, 2025 · Updated 2 months ago
ACADLab / SA-DS
☆13 · Jul 25, 2024 · Updated last year
nathan-barry / RoBERTaDiffusion
A research project exploring fine-tuning BERT-style models for text generation
☆36 · Nov 30, 2025 · Updated 3 months ago
abdelfattah-lab / shadow_llm
☆11 · Sep 20, 2024 · Updated last year
shawnricecake / squant
[ICCAD 2025] Squant
☆15 · Jul 3, 2025 · Updated 8 months ago
MergeVLA / MergeVLA
☆20 · Jan 30, 2026 · Updated last month
vishaln15 / OptimizedArrhythmiaDetection
Code for Optimized Arrhythmia Detection on Ultra-Edge Devices
☆11 · May 26, 2022 · Updated 3 years ago
C-Fun / Self-Attentive-Pooling-for-Efficient-Deep-Learning
Official PyTorch implementation of the paper entitled 'Self Attentive Pooling for Efficient Deep Learning'.
☆13 · May 3, 2024 · Updated last year
HaoKang-Timmy / LatencySensitiveBench
First Latency-Aware Competitive LLM Agent Benchmark
☆26 · Jun 3, 2025 · Updated 9 months ago
kyrie-23 / linear_task_arithmetic
☆12 · Jul 30, 2025 · Updated 7 months ago
krafton-ai / lexico
KV cache compression via sparse coding
☆17 · Oct 26, 2025 · Updated 4 months ago
MartinoTommasini / foxdissector
A Wireshark dissector for the Niagara FOX protocol written in Lua
☆15 · Jul 2, 2021 · Updated 4 years ago
Adaxry / Unified_Layer_Skipping
☆15 · Apr 11, 2024 · Updated last year
uw-mad-dash / decoding-speculative-decoding
☆14 · Aug 19, 2024 · Updated last year
FPSG-UIUC / micro24-fusemax-artifact
MICRO 2024 Evaluation Artifact for FuseMax
☆16 · Aug 26, 2024 · Updated last year
SET-Scheduling-Project / SoMa-HPCA2025
☆27 · Feb 27, 2025 · Updated last year