Relaxed-System-Lab / COMP6211J_Course_HKUST
☆41 · Updated 6 months ago
Alternatives and similar repositories for COMP6211J_Course_HKUST
Users interested in COMP6211J_Course_HKUST are comparing it to the repositories listed below.
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆105 · Updated 11 months ago
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code. ☆154 · Updated this week
- 📰 Must-read papers on KV Cache Compression (constantly updating 🤗). ☆459 · Updated this week
- Awesome-LLM-KV-Cache: A curated list of 📙 Awesome LLM KV Cache Papers with Codes. ☆321 · Updated 3 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆120 · Updated this week
- ☆52 · Updated 6 months ago
- Paper list for Efficient Reasoning. ☆509 · Updated this week
- ☆18 · Updated 3 months ago
- Survey Paper List - Efficient LLM and Foundation Models ☆248 · Updated 9 months ago
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding". ☆194 · Updated 4 months ago
- Paper List of Inference/Test Time Scaling/Computing ☆264 · Updated last week
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆90 · Updated 2 months ago
- ☆21 · Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆297 · Updated 7 months ago
- Chain-of-Thought (CoT) is so hot! And so long! We need short reasoning processes! ☆54 · Updated 2 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆110 · Updated 4 months ago
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆88 · Updated 2 years ago
- Official implementation for Yuan & Liu & Zhong et al., "KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches". ☆79 · Updated 4 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding". ☆233 · Updated 2 weeks ago
- The repo for In-context Autoencoder ☆128 · Updated last year
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. ☆454 · Updated 10 months ago
- A subjective learning guide for generative AI research ☆82 · Updated 10 months ago
- 🔥 How to efficiently and effectively compress the CoTs or directly generate concise CoTs during inference while maintaining the reasoning performance. ☆49 · Updated last month
- All-in-one benchmarking platform for evaluating LLMs. ☆15 · Updated this week
- Curated collection of papers on MoE model inference ☆197 · Updated 4 months ago
- [arXiv 2025] Efficient Reasoning Models: A Survey ☆184 · Updated this week
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆464 · Updated last week
- This is the official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation. ☆16 · Updated 8 months ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ☆252 · Updated 2 weeks ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆156 · Updated 3 months ago