AIoT-MLSys-Lab / Efficient-LLMs-Survey
[TMLR 2024] Efficient Large Language Models: A Survey
☆959Updated this week
Related projects:
- A curated list for Efficient Large Language Models☆1,119Updated this week
- Awesome LLM compression research papers and tools.☆1,062Updated this week
- A collection of AWESOME things about mixture-of-experts☆920Updated last month
- This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicit…☆534Updated last week
- 📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥☆816Updated this week
- 📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batc…☆2,475Updated this week
- A curated reading list of research in Mixture-of-Experts(MoE).☆520Updated last year
- Fast inference from large language models via speculative decoding☆508Updated 3 weeks ago
- Large Language Model (LLM) Systems Paper List☆572Updated 2 weeks ago
- Official Implementation of EAGLE-1 and EAGLE-2☆749Updated 3 weeks ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️☆350Updated this week
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training☆849Updated 2 months ago
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich…☆823Updated 3 weeks ago
- An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)☆2,026Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆1,183Updated 2 months ago
- Aligning Large Language Models with Human: A Survey☆671Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding☆1,099Updated 7 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration☆2,336Updated 2 months ago
- Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton☆1,190Updated this week
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning☆539Updated 6 months ago
- A simple and effective LLM pruning approach.☆617Updated last month
- FlashInfer: Kernel Library for LLM Serving☆1,143Updated this week
- Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"☆1,041Updated 6 months ago
- An Awesome Collection for LLM Survey☆289Updated last week
- ☆467Updated 2 weeks ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…☆2,300Updated this week
- Paper List for In-context Learning 🌷☆783Updated 2 months ago
- [ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding☆618Updated last week
- Must-read Papers on Knowledge Editing for Large Language Models.☆829Updated 2 weeks ago
- Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …☆901Updated 2 weeks ago