PKULab1806 / Fairy-plus-minus-i
Fairy±i (iFairy): Complex-valued Quantization Framework for Large Language Models
☆100 · Updated last week
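Since the headline repo only names its technique, here is a minimal sketch of what a complex-valued 2-bit quantizer could look like, assuming (as the "±i" in the name suggests) that weights are snapped to the fourth roots of unity {+1, -1, +i, -i} with a per-row real scale. The pairing of adjacent real weights into complex numbers, the scale choice, and the function names are illustrative assumptions, not the repo's actual kernels.

```python
import torch

def fairy_quantize(w: torch.Tensor):
    """Illustrative sketch (not the repo's kernel): fold pairs of real
    weights into complex numbers and snap each to the nearest value in
    {+1, -1, +i, -i}, with one real scale per output row."""
    assert w.shape[-1] % 2 == 0, "pairing assumes an even inner dimension"
    wc = torch.complex(w[..., 0::2], w[..., 1::2])           # (out, in/2) complex
    scale = wc.abs().mean(dim=-1, keepdim=True).clamp_min(1e-8)
    roots = torch.tensor([1, -1, 1j, -1j], dtype=wc.dtype)   # 2-bit codebook
    idx = (wc.unsqueeze(-1) / scale.unsqueeze(-1) - roots).abs().argmin(dim=-1)
    return idx.to(torch.uint8), scale                        # 2-bit codes + scales

def fairy_dequantize(idx: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    roots = torch.tensor([1, -1, 1j, -1j], dtype=torch.complex64)
    return roots[idx.long()] * scale                         # complex reconstruction
```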
Alternatives and similar repositories for Fairy-plus-minus-i
Users interested in Fairy-plus-minus-i are comparing it to the libraries listed below.
- ☆29 · Updated 2 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆69 · Updated 2 months ago
- siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems ☆179 · Updated this week
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆81 · Updated 4 months ago
- DeepSeek Native Sparse Attention PyTorch implementation ☆92 · Updated 3 weeks ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆67 · Updated last week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆178 · Updated this week
- ☆143 · Updated 2 months ago
- Code release for AdapMoE, accepted by ICCAD 2024 ☆32 · Updated 4 months ago
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆55 · Updated 7 months ago
- ☆117 · Updated 3 months ago
- D^2-MoE: Delta Decompression for MoE-based LLMs Compression ☆65 · Updated 5 months ago
- ☆21 · Updated last year
- ☆72 · Updated 10 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank (see the scheduling sketch after this list) ☆58 · Updated 10 months ago
- SQUEEZED ATTENTION: Accelerating Long Prompt LLM Inference ☆52 · Updated 9 months ago
- 🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA. ☆212 · Updated 3 weeks ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆402 · Updated last week
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆148 · Updated last month
- ☆92 · Updated 3 months ago
- ☆101 · Updated 2 weeks ago
- ☆18 · Updated 5 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆87 · Updated 5 months ago
- ☆50 · Updated 3 months ago
- GPU operators for sparse tensor operations ☆34 · Updated last year
- ☆51 · Updated last year
- Repo for SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting (ISCA'25); see the early-exit sketch after this list ☆48 · Updated 4 months ago
- Implementation of the paper "CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference" ☆24 · Updated 6 months ago
- PyTorch library for cost-effective, fast and easy serving of MoE models ☆228 · Updated last month
- Estimate MFU for DeepSeek-V3 (see the back-of-envelope sketch after this list) ☆24 · Updated 8 months ago
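For the NeurIPS 2024 learning-to-rank scheduling item above, the core idea is to approximate shortest-job-first by ranking pending requests with a learned predictor of output length. Below is a minimal sketch assuming a hypothetical `predict_remaining_tokens` scorer; the paper's actual ranker and serving loop are more involved.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    predicted_len: float                     # learned score: lower = likely shorter
    prompt: str = field(compare=False)

def schedule(requests, predict_remaining_tokens):
    """Approximate shortest-job-first: serve requests in order of predicted
    output length, so short jobs are not stuck behind long ones."""
    queue = [Request(predict_remaining_tokens(r), r) for r in requests]
    heapq.heapify(queue)
    while queue:
        yield heapq.heappop(queue).prompt
```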
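For the SpecEE item above, the title describes speculative early exiting: lightweight per-layer predictors decide whether the current hidden state is already confident enough to emit a token without running the remaining layers. A minimal sketch with hypothetical names (`layers`, `exit_heads`, `lm_head`, `threshold`); the actual system verifies exits speculatively rather than trusting the predictor outright.

```python
import torch

def early_exit_forward(x, layers, exit_heads, lm_head, threshold=0.9):
    """Run transformer layers one by one; after each, a tiny exit head
    estimates confidence, and we stop early once it clears the threshold.
    Assumes batch size 1 for the scalar .item() gate."""
    for layer, exit_head in zip(layers, exit_heads):
        x = layer(x)
        confidence = torch.sigmoid(exit_head(x[:, -1])).item()
        if confidence > threshold:
            break                                 # skip the remaining layers
    return lm_head(x[:, -1]).argmax(dim=-1)       # next-token id
```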
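Finally, for the MFU-estimation item, the back-of-envelope math is just achieved model FLOP/s over hardware peak FLOP/s. A sketch assuming DeepSeek-V3's commonly cited ~37B activated parameters per token and the 2·N-FLOPs-per-token rule of thumb for a forward pass, with the GPU peak as a parameter:

```python
def estimate_mfu(tokens_per_sec: float,
                 activated_params: float = 37e9,   # DeepSeek-V3: ~37B of 671B active
                 peak_flops: float = 989e12) -> float:
    """MFU = achieved FLOP/s / peak FLOP/s. A forward pass costs roughly
    2 FLOPs per active parameter per token; attention terms are ignored
    here, so this slightly understates the true utilization."""
    return (2 * activated_params * tokens_per_sec) / peak_flops

# e.g. 2,000 tok/s on one GPU with a ~989 TFLOP/s BF16 dense peak:
print(f"MFU ≈ {estimate_mfu(2000):.1%}")   # ≈ 15.0%
```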