Panhaolin2001 / Compiler-R1Links

Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning

☆12

Alternatives and similar repositories for Compiler-R1

Users that are interested in Compiler-R1 are comparing it to the libraries listed below

Sorting:

thunlp / SparsingLaw
The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".
☆23Updated 8 months ago
UNITES-Lab / C2R-MoE
[NAACL 25 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Pa…
☆10Updated 5 months ago
Anonymous1252022 / Megatron-DeepSpeed
☆12Updated 9 months ago
uservan / ThinkPO
☆18Updated 4 months ago
Jikai0Wang / OPT-Tree
☆23Updated last month
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆31Updated last year
sail-sg / LongSpec
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆57Updated 4 months ago
Infini-AI-Lab / Multiverse
☆71Updated last week
LCM-Lab / LCM_Stack
Code for paper: Long cOntext aliGnment via efficient preference Optimization
☆14Updated 5 months ago
LLMkvsys / rethink-kv-compression
☆13Updated 4 months ago
jiwonsong-dev / ReasoningPathCompression
Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆19Updated last month
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆46Updated 8 months ago
dvlab-research / Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
☆53Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆35Updated 9 months ago
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆90Updated 3 weeks ago
qiuzh20 / gated_attention
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
☆45Updated 2 months ago
JarvisPei / CMoE
Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
☆22Updated 4 months ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆42Updated last week
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆89Updated 7 months ago
alessiodevoto / l2compress
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆14Updated 7 months ago
Infini-AI-Lab / gsm_infinite
☆47Updated last month
ArminAzizi98 / LaMDA
☆15Updated 8 months ago
TianjinYellow / StableSPAM
☆22Updated 3 months ago
Infini-AI-Lab / APE
☆29Updated 5 months ago
metacarbon / shareAtt
Beyond KV Caching: Shared Attention for Efficient LLMs
☆19Updated 11 months ago
OpenSparseLLMs / Linearization
☆51Updated last week
DerrickYLJ / TidalDecode
[ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
☆40Updated 2 months ago
machilusZ / FastGen
This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
☆37Updated 11 months ago
DualityRL / multi-attempt
☆19Updated 4 months ago
sail-sg / VocabularyParallelism
Vocabulary Parallelism
☆19Updated 4 months ago