linkedlist771 / UCAS-MOOC-AutoWatch
☆20 · Updated last year
Alternatives and similar repositories for UCAS-MOOC-AutoWatch
Users interested in UCAS-MOOC-AutoWatch are comparing it to the repositories listed below.
- Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing ☆59 · Updated 8 months ago
- ☆17 · Updated this week
- Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆19 · Updated 7 months ago
- ☆55 · Updated last year
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆205 · Updated last month
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆43 · Updated 9 months ago
- [COLM 2024] SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models ☆24 · Updated 11 months ago
- A study aid tool for UCAS English MOOCs ☆13 · Updated last year
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) ☆16 · Updated last week
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆59 · Updated 10 months ago
- The official implementation of Ada-KV [NeurIPS 2025] ☆95 · Updated last week
- Code release for AdapMoE, accepted by ICCAD 2024 ☆33 · Updated 4 months ago
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆70 · Updated last year
- Implementations of several LLM KV cache sparsity methods ☆38 · Updated last year
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆57 · Updated 6 months ago
- All-in-one benchmarking platform for evaluating LLMs ☆15 · Updated last month
- PyTorch implementation of CaM: Cache Merging for Memory-efficient LLMs Inference, accepted at ICML 2024 ☆45 · Updated last year
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning ☆103 · Updated last year
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆113 · Updated 5 months ago
- Analysing problems in AI with math and code ☆24 · Updated last month
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆47 · Updated last month
- ☆143 · Updated 2 months ago
- ☆15 · Updated 6 months ago
- Triton multi-level runner, including IR/PTX/cubin ☆54 · Updated this week
- 16-fold memory access reduction with nearly no loss ☆105 · Updated 6 months ago
- ☆137 · Updated 2 months ago
- ☆47 · Updated 11 months ago
- A reading list on popular MLSys topics ☆16 · Updated 6 months ago
- The official implementation of the DAC 2024 paper GQA-LUT ☆20 · Updated 9 months ago
- ☆77 · Updated 11 months ago