wangqinsi1/CoreInfer

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/wangqinsi1/CoreInfer)

wangqinsi1 / CoreInfer

This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation.

☆18

Alternatives and similar repositories for CoreInfer

Users that are interested in CoreInfer are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

wangqinsi1 / 2025-ICML-CoreMatching
View on GitHub
[ICML 2025] CoreMatching: Co-adaptive Sparse Inference Framework for Comprehensive Acceleration of Vision Language Model
☆16May 27, 2025Updated last year
Yuzhe-Fu / FlashFPS
View on GitHub
[DAC 2026] FlashFPS
☆15Jun 1, 2026Updated last month
HankYe / KVCOMM
View on GitHub
[NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
☆17Nov 1, 2025Updated 8 months ago
wangqinsi1 / Dobi-SVD
View on GitHub
[ICLR 2025] Dobi-SVD : Differentiable SVD for LLM Compression and Some New Perspectives"
☆54Oct 19, 2025Updated 9 months ago
T2S-Bench / T2S-Bench
View on GitHub
This is Official implementation for T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasonin…
☆24Mar 5, 2026Updated 4 months ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Yuzhe-Fu / FractalCloud
View on GitHub
[HPCA 2026] FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing
☆22Apr 21, 2026Updated 3 months ago
Ting-Justin-Jiang / sada-icml
View on GitHub
[ICML 2025] Official Repo for Stability-guided Adaptive Diffusion Acceleration. 🚀🌙Accelerating off-the-shelf diffusion model with a uni…
☆43Jul 24, 2025Updated 11 months ago
Ting-Justin-Jiang / ZEUS
View on GitHub
[ACM MM 2026]⚡ZEUS accelerates your diffuser. Any modality. Any model. Any scheduler. https://yixiao-wang-stats.github.io/zeus/
☆20Jun 2, 2026Updated last month
wangqinsi1 / GAINRL
View on GitHub
[NeurIPS Spotlight 2025] Angles Don’t Lie: Unlocking Training-Efficient RL Through the Model’s Own Signals.
☆83Sep 26, 2025Updated 9 months ago
zjysteven / mink-plus-plus
View on GitHub
[ICLR'25 Spotlight] Min-K%++: Improved baseline for detecting pre-training data of LLMs
☆58May 26, 2025Updated last year
ROIM1998 / APT
View on GitHub
[ICML'24 Oral] APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
☆48Jun 4, 2024Updated 2 years ago
wangqinsi1 / Vision-Zero
View on GitHub
[ICLR 2026] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
☆136Feb 6, 2026Updated 5 months ago
Shwai-He / MEO
View on GitHub
The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)":
☆47Feb 28, 2026Updated 4 months ago
Komorebi660 / RV32I-CPU
View on GitHub
An implementation of 5-stages RISC-V CPU
☆12Jul 22, 2022Updated 4 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
mbalesni / deepspeed_llama
View on GitHub
Finetuning LLaMA with DeepSpeed
☆10Apr 14, 2023Updated 3 years ago
ustc-hyin / HiMAP
View on GitHub
Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference
☆14Jun 7, 2025Updated last year
HarryWu99 / llm_kvcache_sparsity
View on GitHub
Implement some method of LLM KV Cache Sparsity
☆41Jun 6, 2024Updated 2 years ago
ZSYTY / STL-allocator
View on GitHub
Final project of Object-Oriented-Programming: STL allocator + memory pool
☆10Jun 22, 2019Updated 7 years ago
zyxxmu / Bi-Mask
View on GitHub
Pytorch implementation of our paper accepted by ICML 2023 -- "Bi-directional Masks for Efficient N:M Sparse Training"
☆13Jun 7, 2023Updated 3 years ago
gccnlp / Light-PEFT
View on GitHub
[ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning
☆13Sep 2, 2024Updated last year
Crypt0knights / OpenTrack
View on GitHub
An Efficient Supply Chain Management System using Blockchain & Machine Learning.
☆10Nov 27, 2019Updated 6 years ago
ryqdev / CG_2019_ZJU
View on GitHub
This is our Computer Graphics course project in ZJU
☆13Apr 14, 2020Updated 6 years ago
ycao5602 / KAFAL
View on GitHub
Code for the paper "Knowledge-Aware Federated Active Learning with Non-IID Data", ICCV2023
☆10Sep 8, 2023Updated 2 years ago
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
0x4f5da2 / Wait4GPU
View on GitHub
A simple utility to execute your deep learning scripts when there are enough idle gpus | 一个在有足够的空闲gpu时执行深度学习训练的小工具
☆16Mar 22, 2022Updated 4 years ago
TerryPei / CSP
View on GitHub
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference
☆10Dec 15, 2024Updated last year
HankYe / Once-for-Both
View on GitHub
[CVPR'24] Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
☆15Jul 1, 2024Updated 2 years ago
InternScience / AdaptiveDiffusion
View on GitHub
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
☆73Jan 22, 2025Updated last year
penghao-wu / ProxyV
View on GitHub
[ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM
☆20May 22, 2025Updated last year
eth-sri / lamp
View on GitHub
LAMP: Extracting Text from Gradients with Language Model Priors (NeurIPS '22)
☆29May 26, 2025Updated last year
SuDIS-ZJU / rookies
View on GitHub
Rookie's guide
☆13Aug 10, 2024Updated last year
thunlp / BlockFFN
View on GitHub
Source codes for paper "BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity".
☆19Jan 10, 2026Updated 6 months ago
anonymous-sushi-armadillo / fast_is_better_than_free_CIFAR10
View on GitHub
☆13Jul 25, 2024Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
OpenGVLab / LLMPrune-BESA
View on GitHub
BESA is a differentiable weight pruning technique for large language models.
☆17Mar 4, 2024Updated 2 years ago
zjysteven / Awesome-Byte-LLM
View on GitHub
A curated list of papers and resources on byte-based large language models (LLMs) — models that operate directly on raw bytes.
☆15Jul 12, 2025Updated last year
pierrefdz / stable_signature
View on GitHub
Please go to https://github.com/facebookresearch/stable_signature
☆14Jul 26, 2023Updated 2 years ago
Daftstone / TrialAttack
View on GitHub
Tensorflow implementation of TrialAttack (Triple Adversarial Learning for Influence based Poisoning Attack in Recommender Systems. KDD 20…
☆12Sep 2, 2021Updated 4 years ago
42Shawn / LLaVA-PruMerge
View on GitHub
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆173Mar 8, 2026Updated 4 months ago
yuny220 / NAR-Former
View on GitHub
Pytorch code of [CVPR 2023] "NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction".
☆11Mar 14, 2023Updated 3 years ago
DerrickYLJ / TidalDecode
View on GitHub
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆57Aug 6, 2025Updated 11 months ago