PKULab1806 / Fairy-plus-minus-i
Fairy±i (iFairy): Complex-valued Quantization Framework for Large Language Models
☆111 · Updated last month
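As its name suggests, Fairy±i quantizes weights to the four fourth roots of unity, {+1, −1, +i, −i}, so each weight costs 2 bits plus a shared scale. Below is a minimal round-to-nearest sketch of that idea; `quantize_fairy`, `dequantize_fairy`, and the per-tensor scaling are illustrative assumptions, not the repo's actual API or algorithm:

```python
import numpy as np

# Codebook: the four fourth roots of unity, {+1, -1, +i, -i}; each
# weight is stored as a 2-bit index into this table plus one shared
# per-tensor scale. Illustrative scheme only, not the repo's exact method.
CODEBOOK = np.array([1 + 0j, -1 + 0j, 0 + 1j, 0 - 1j])

def quantize_fairy(w: np.ndarray):
    """Round each complex weight to its nearest codebook entry."""
    scale = np.mean(np.abs(w)) + 1e-12               # per-tensor scale (assumption)
    dists = np.abs(w[..., None] / scale - CODEBOOK)  # distance to each code point
    idx = dists.argmin(axis=-1).astype(np.uint8)     # 2-bit indices
    return idx, scale

def dequantize_fairy(idx: np.ndarray, scale: float) -> np.ndarray:
    return CODEBOOK[idx] * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    idx, scale = quantize_fairy(w)
    print("reconstruction error:", np.abs(w - dequantize_fairy(idx, scale)).mean())
```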
Alternatives and similar repositories for Fairy-plus-minus-i
Users interested in Fairy-plus-minus-i are comparing it to the repositories listed below:
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆96 · Updated 3 weeks ago
- PyTorch implementation of DeepSeek Native Sparse Attention ☆111 · Updated 3 weeks ago
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆65 · Updated last year
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆96 · Updated 3 weeks ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆67 · Updated last year
- [ISCA'25] Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting ☆70 · Updated 8 months ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆83 · Updated last month
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention ☆270 · Updated last month
- NVIDIA cuTile learn ☆147 · Updated last month
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆214 · Updated 3 months ago
- Analysing AI problems with math and code ☆27 · Updated 5 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆96 · Updated this week
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆304 · Updated 7 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆144 · Updated 2 weeks ago (see the speculative-decoding sketch after this list)
- 🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA. ☆242 · Updated last month
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆264 · Updated last month
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆96 · Updated 4 months ago
- Estimate MFU for DeepSeekV3 ☆26 · Updated last year
- A simple calculation for LLM MFU ☆58 · Updated 4 months ago (see the worked MFU example after this list)
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin. ☆81 · Updated this week
- [ICLR 2025 Oral] Code for the paper "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" ☆160 · Updated 2 months ago
- PyTorch library for cost-effective, fast and easy serving of MoE models. ☆273 · Updated 2 months ago
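Several entries above (PEARL, SpecEE) build on speculative decoding: a small draft model proposes a few tokens and the target model verifies them, keeping the longest matching prefix. A minimal greedy-verification sketch; `draft_next` and `target_next` are hypothetical callables returning argmax tokens, and real systems verify all draft tokens in one batched target forward pass:

```python
from typing import Callable, List

def speculative_decode_greedy(
    draft_next: Callable[[List[int]], int],   # hypothetical: draft model's argmax token
    target_next: Callable[[List[int]], int],  # hypothetical: target model's argmax token
    prompt: List[int],
    draft_len: int = 4,
    max_new: int = 32,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes draft_len tokens,
    the target verifies them left to right; the first mismatch is replaced
    by the target's own token and the rest of the draft is discarded."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1) Draft proposes a short continuation.
        proposal, ctx = [], list(seq)
        for _ in range(draft_len):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies (a single batched forward pass in practice).
        n_accepted, mismatch = 0, None
        for i, t in enumerate(proposal):
            expected = target_next(seq + proposal[:i])
            if expected == t:
                n_accepted += 1
            else:
                mismatch = expected
                break
        seq.extend(proposal[:n_accepted])
        if mismatch is not None:
            seq.append(mismatch)             # every round makes >= 1 token of progress
    return seq[: len(prompt) + max_new]      # trim any overshoot from the last round

if __name__ == "__main__":
    # Toy demo: both "models" continue a counting sequence, so all drafts are accepted.
    count = lambda ctx: (ctx[-1] + 1) % 100
    print(speculative_decode_greedy(count, count, [0, 1, 2], max_new=8))
```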
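Two entries above estimate MFU (Model FLOPs Utilization): achieved FLOP/s divided by hardware peak FLOP/s, where achieved training FLOP/s is commonly approximated as 6 × parameters × tokens/s for a dense decoder-only Transformer. A worked example with illustrative, not measured, numbers; whether a given repo also counts attention FLOPs or MoE sparsity varies:

```python
def mfu(params: float, tokens_per_s: float, peak_flops: float) -> float:
    """Model FLOPs Utilization, using the standard 6*N*T approximation
    for dense decoder-only Transformer training (roughly 2*N FLOPs per
    token forward plus 4*N per token backward)."""
    achieved_flops = 6.0 * params * tokens_per_s
    return achieved_flops / peak_flops

# Illustrative numbers only: a 7B-parameter model training at
# 4,000 tokens/s per GPU, on a GPU with 312 TFLOP/s peak BF16 throughput.
print(f"MFU = {mfu(7e9, 4_000, 312e12):.1%}")   # -> MFU = 53.8%
```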