zhuhanqing / APOLLO
APOLLO: SGD-like Memory, AdamW-level Performance
☆150 · Updated last week
Alternatives and similar repositories for APOLLO:
Users interested in APOLLO are comparing it to the repositories listed below.
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆239 · Updated 5 months ago
- The official implementation of MARS: Unleashing the Power of Variance Reduction for Training Large Models ☆534 · Updated last week
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆215 · Updated 4 months ago
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆886 · Updated last month
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆79 · Updated 3 months ago
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆148 · Updated 3 months ago
- Official code for "SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression" ☆116 · Updated 3 weeks ago
- The nanoGPT-style implementation of the RWKV Language Model, an RNN with GPT-level LLM performance ☆184 · Updated last year
- Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM ☆57 · Updated 2 months ago
- The official implementation of Self-Play Preference Optimization (SPPO) ☆481 · Updated 3 weeks ago
- Support for mixed-precision inference with vLLM ☆80 · Updated last month
- Codebase for rectified flow ☆83 · Updated this week
- Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for LLMs ☆78 · Updated 2 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆149 · Updated 2 months ago
- The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression" ☆114 · Updated 2 months ago
- [NeurIPS 2024] Official repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆196 · Updated 3 weeks ago
- Mixed-precision inference with TensorRT-LLM ☆76 · Updated 3 months ago
- The official code repository of MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tas… ☆55 · Updated 6 months ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models ☆56 · Updated 3 months ago
- [NeurIPS 2024] BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models ☆243 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆59 · Updated this week
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆102 · Updated 4 months ago
- [ICML 2024 Oral] The official implementation of Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆60 · Updated 10 months ago
- Code for Palu: Compressing KV-Cache with Low-Rank Projection ☆76 · Updated this week
- A WebUI for side-by-side comparison of media (images/videos) across multiple folders ☆19 · Updated 3 weeks ago