Relaxed-System-Lab / COMP6211J_Course_HKUST
⭐41 · Updated 4 months ago
Alternatives and similar repositories for COMP6211J_Course_HKUST:
Users interested in COMP6211J_Course_HKUST are comparing it to the libraries listed below.
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ⭐98 · Updated 9 months ago
- Survey Paper List - Efficient LLM and Foundation Models ⭐246 · Updated 7 months ago
- Must-read papers on KV Cache Compression (constantly updating). ⭐376 · Updated 2 weeks ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ⭐72 · Updated last month
- Official implementation of "Learning Harmonized Representations for Speculative Sampling" (HASS) ⭐31 · Updated last month
- A comprehensive guide for beginners in the field of data management and artificial intelligence. ⭐184 · Updated 2 weeks ago
- Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding** ⭐183 · Updated 2 months ago
- How to efficiently and effectively compress CoTs, or directly generate concise CoTs during inference, while maintaining the reasonin… ⭐40 · Updated this week
- ⭐38 · Updated last year
- The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models". ⭐331 · Updated last month
- Curated collection of papers on MoE model inference ⭐150 · Updated 2 months ago
- ⭐48 · Updated 4 months ago
- Using an LLM to evaluate the MMLU dataset. ⭐28 · Updated last year
- Paper list on inference/test-time scaling and computing ⭐195 · Updated this week
- Code release for AdapMoE, accepted at ICCAD 2024 ⭐19 · Updated last month
- [NeurIPS 2024 Oral] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ⭐157 · Updated 6 months ago
- Chain-of-Thought (CoT) reasoning is so hot, and so long! We need shorter reasoning processes! ⭐49 · Updated 3 weeks ago
- Paper list for efficient reasoning. ⭐403 · Updated this week
- ⭐90 · Updated 3 months ago
- ⭐99 · Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ⭐270 · Updated 5 months ago
- The official Python version of CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Act… ⭐16 · Updated 5 months ago
- A regularly updated paper list on LLM reasoning in latent space. ⭐72 · Updated this week
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ⭐258 · Updated this week
- [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. ⭐438 · Updated 8 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25] ⭐27 · Updated last week
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond ⭐191 · Updated this week
- VeOmni: Scaling any-modality model training to any accelerator with a PyTorch-native training framework ⭐297 · Updated 2 weeks ago
- Papers and accompanying code for AI systems ⭐293 · Updated last week
- Must-read papers and blogs on Speculative Decoding ⭐696 · Updated this week