PKULab1806 / Fairy-plus-minus-i
Fairy±i (iFairy): Complex-valued Quantization Framework for Large Language Models
☆106 · Updated last week
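The name suggests quantizing weights onto the fourth roots of unity {+1, −1, +i, −i}, i.e. 2 bits per complex weight. As orientation only, here is a minimal sketch of that idea, assuming a nearest-codeword mapping with a shared scale; the function and its signature are hypothetical, not iFairy's API:

```python
import torch

def quantize_fourth_roots(w: torch.Tensor, scale: float) -> torch.Tensor:
    """Illustrative 2-bit complex quantization: snap each weight to the
    nearest of {+1, -1, +i, -i} times a shared scale. A sketch of the
    general idea only, NOT the iFairy implementation."""
    roots = torch.tensor([1 + 0j, -1 + 0j, 0 + 1j, 0 - 1j], dtype=torch.cfloat)
    dist = (w.unsqueeze(-1) / scale - roots).abs()  # (..., 4) distances to codewords
    idx = dist.argmin(dim=-1)                       # 2-bit code per weight
    return roots[idx] * scale                       # dequantized tensor

w = torch.randn(4, 4, dtype=torch.cfloat)
w_q = quantize_fourth_roots(w, scale=w.abs().mean().item())
```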
Alternatives and similar repositories for Fairy-plus-minus-i
Users interested in Fairy-plus-minus-i are comparing it to the libraries listed below.
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆91 · Updated last week
- ☆29 · Updated 5 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆90 · Updated 5 months ago
- A minimal, easy-to-read PyTorch reimplementation of Qwen3 and Qwen2.5-VL with a fancy CLI ☆192 · Updated this week
- ☆149 · Updated 4 months ago
- PyTorch implementation of DeepSeek's Native Sparse Attention ☆108 · Updated 3 weeks ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆69 · Updated 2 weeks ago
- siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems ☆266 · Updated this week
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆83 · Updated 3 months ago
- ☆125 · Updated 3 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25) ☆68 · Updated 7 months ago
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention ☆233 · Updated 3 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆205 · Updated last month
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆64 · Updated last year
- Analysing AI problems with math and code ☆27 · Updated 4 months ago
- ☆438 · Updated 3 months ago
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆60 · Updated 10 months ago
- Estimate MFU for DeepSeek-V3 ☆26 · Updated 10 months ago
- 🤖 FFPA: extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim; 1.8x~3x speedup vs SDPA EA ☆231 · Updated last week
- ☆21 · Updated last year
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆173 · Updated 2 months ago
- 青稞Talk ☆168 · Updated this week
- ☆111 · Updated 6 months ago
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆289 · Updated 5 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆88 · Updated last week
- Fast and memory-efficient exact k-means ☆126 · Updated 2 weeks ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length (a generic speculative-decoding sketch follows this list) ☆131 · Updated last month
- A simple calculation for LLM MFU (see the MFU sketch after this list) ☆50 · Updated 2 months ago
- ☆120 · Updated 5 months ago
- Code for [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆155 · Updated last month
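Several entries above (SpecEE, PEARL) build on speculative decoding. As a reference point only, here is a minimal greedy draft-and-verify loop; `draft` and `target` are assumed to be callables returning `(1, T, vocab)` logits, and nothing here reflects either repo's actual design:

```python
import torch

@torch.no_grad()
def speculative_step(draft, target, ctx: torch.Tensor, k: int = 4) -> torch.Tensor:
    """One greedy draft-and-verify step (generic sketch, not SpecEE/PEARL).

    The draft model proposes k tokens autoregressively; the target model
    verifies them in a single forward pass, keeps the longest agreeing
    prefix, and appends its own next token."""
    proposal = ctx
    for _ in range(k):  # cheap autoregressive drafting
        nxt = draft(proposal)[:, -1].argmax(-1, keepdim=True)
        proposal = torch.cat([proposal, nxt], dim=-1)

    logits = target(proposal)                 # one expensive verify pass
    t = ctx.shape[1]
    preds = logits[:, t - 1:-1].argmax(-1)    # target's choice at each draft slot
    drafted = proposal[:, t:]
    agree = (preds == drafted).long().cumprod(-1)  # accept until first mismatch
    n = int(agree.sum())
    bonus = logits[:, t + n - 1].argmax(-1, keepdim=True)  # target's free token
    return torch.cat([proposal[:, : t + n], bonus], dim=-1)
```

With greedy decoding this reproduces the target model's output exactly while amortizing one target forward pass over up to k+1 new tokens.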
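Two entries above concern MFU (Model FLOPs Utilization). For orientation, a back-of-the-envelope sketch using the common ~6N FLOPs-per-token training approximation (2N forward + 4N backward, attention FLOPs ignored); this is a rule of thumb, not either repo's methodology:

```python
def estimate_mfu(n_params: float, tokens_per_sec: float, peak_flops: float) -> float:
    """Rough MFU: achieved FLOP/s over hardware peak FLOP/s, using the
    ~6N-FLOPs-per-training-token rule of thumb (attention ignored)."""
    achieved = 6.0 * n_params * tokens_per_sec
    return achieved / peak_flops

# Example: 7B params at 4,000 tokens/s on one A100 (312 TFLOP/s BF16 peak)
print(f"MFU ~ {estimate_mfu(7e9, 4_000, 312e12):.1%}")  # ~53.8%
```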