PKULab1806 / Fairy-plus-minus-i
Fairy±i (iFairy): Complex-valued Quantization Framework for Large Language Models
☆ 104 · Updated last week
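Going by the name, Fairy±i appears to quantize weights to a 2-bit complex codebook of the fourth roots of unity {+1, −1, +i, −i}. The sketch below is a minimal, unofficial illustration of that idea under this assumption; `quantize_fourth_roots`, the per-tensor scale, and all other names are hypothetical and are not the repo's actual API.

```python
import numpy as np

# Fourth roots of unity: a 2-bit complex codebook (assumed, not from the repo).
CODEBOOK = np.array([1, -1, 1j, -1j], dtype=np.complex64)

def quantize_fourth_roots(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Map each complex weight to its nearest scaled fourth root of unity."""
    scale = float(np.mean(np.abs(w))) + 1e-12        # simple per-tensor scale
    dists = np.abs(w[..., None] / scale - CODEBOOK)  # distance to each code
    codes = dists.argmin(axis=-1).astype(np.uint8)   # 2-bit index per weight
    return codes, scale

def dequantize(codes: np.ndarray, scale: float) -> np.ndarray:
    """Reconstruct approximate complex weights from codes and scale."""
    return (CODEBOOK[codes] * scale).astype(np.complex64)

# Round trip on random complex weights.
w = (np.random.randn(4, 4) + 1j * np.random.randn(4, 4)).astype(np.complex64)
codes, scale = quantize_fourth_roots(w)
w_hat = dequantize(codes, scale)
```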
Alternatives and similar repositories for Fairy-plus-minus-i
Users interested in Fairy-plus-minus-i are comparing it to the libraries listed below.
- ☆ 30 · Updated 4 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆ 72 · Updated 4 months ago
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆ 86 · Updated 6 months ago
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25) ☆ 62 · Updated 5 months ago
- siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems ☆ 219 · Updated this week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆ 198 · Updated last week
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆ 275 · Updated 4 months ago
- A minimal, easy-to-read PyTorch reimplementation of Qwen3 and Qwen2.5-VL with a fancy CLI ☆ 174 · Updated last month
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆ 60 · Updated 9 months ago
- PyTorch implementation of DeepSeek's Native Sparse Attention ☆ 103 · Updated last week
- ☆ 147 · Updated 3 months ago
- Paper list on efficient Mixture-of-Experts for LLMs ☆ 136 · Updated 2 weeks ago
- ☆ 428 · Updated 2 months ago
- ☆ 60 · Updated last year
- PyTorch library for cost-effective, fast and easy serving of MoE models ☆ 245 · Updated 3 months ago
- ☆ 56 · Updated last year
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆ 156 · Updated 3 weeks ago
- Code release for AdapMoE, accepted at ICCAD 2024 ☆ 34 · Updated 5 months ago
- A simple calculation for LLM MFU (see the worked example after this list) ☆ 48 · Updated last month
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm … ☆ 68 · Updated last month
- ☆ 119 · Updated 2 months ago
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆ 68 · Updated 6 months ago
- ☆ 138 · Updated 4 months ago
- Code repository of "Evaluating Quantized Large Language Models" ☆ 132 · Updated last year
- ☆ 889 · Updated 2 weeks ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆ 118 · Updated 6 months ago
- 🤖 FFPA: extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x–3x↑ 🎉 vs SDPA EA ☆ 220 · Updated 2 months ago
- ☆ 100 · Updated 4 months ago
- Official implementation of "DPad: Efficient Diffusion Language Models with Suffix Dropout" ☆ 50 · Updated last month
- A quantization algorithm for LLMs ☆ 143 · Updated last year
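As referenced in the MFU item above: MFU (Model FLOPs Utilization) is the ratio of achieved FLOPs/s to the accelerator's peak FLOPs/s, with training FLOPs per token commonly approximated as 6N for an N-parameter dense model. A minimal sketch with illustrative numbers (not taken from any repo listed here):

```python
def training_mfu(n_params: float, tokens_per_sec: float,
                 peak_flops_per_sec: float) -> float:
    """MFU = achieved FLOPs/s divided by peak FLOPs/s.

    Uses the common ~6 * N FLOPs-per-token approximation for training
    (forward + backward passes), ignoring the attention quadratic term.
    """
    achieved_flops_per_sec = 6.0 * n_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec

# Illustrative: a 7B-parameter model training at 4,000 tokens/s on a GPU
# with a 312 TFLOP/s BF16 dense peak (e.g., A100) -> about 54% MFU.
print(f"{training_mfu(7e9, 4_000, 312e12):.1%}")
```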