linkedlist771 / UCAS-MOOC-AutoWatch
☆21 · Updated last year
Alternatives and similar repositories for UCAS-MOOC-AutoWatch
Users interested in UCAS-MOOC-AutoWatch are comparing it to the repositories listed below.
- Course materials for MIT 6.5940: TinyML and Efficient Deep Learning Computing ☆51 · Updated 6 months ago
- Analyzing problems of AI with math and code ☆18 · Updated last week
- This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆172 · Updated last week
- Advanced Computer Architecture 2020, taught by Junmin Wu (吴俊敏), a graduate course at USTC ☆72 · Updated last year
- Code release for AdapMoE, accepted by ICCAD 2024 ☆29 · Updated 3 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24) ☆42 · Updated 7 months ago
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) ☆11 · Updated last month
- Implementations of several LLM KV cache sparsity methods ☆35 · Updated last year
- The official implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference ☆87 · Updated last month
- PyTorch implementation of CaM: Cache Merging for Memory-efficient LLMs Inference (ICML 2024) ☆42 · Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆51 · Updated 9 months ago
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference" ☆64 · Updated last month
- ☆140 · Updated last month
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆61 · Updated last year
- AI Computing Systems (智能计算系统), by Yunji Chen (陈云霁) ☆164 · Updated 2 years ago
- ☆54 · Updated last year
- ☆75 · Updated 9 months ago
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length ☆102 · Updated 3 months ago
- ☆125 · Updated 3 weeks ago
- Lab assignments for the UCAS course "AI Computing Systems" ☆23 · Updated last year
- A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of … ☆238 · Updated last month
- ☆13 · Updated 5 months ago
- ☆54 · Updated 8 months ago
- Tsinghua University "Principles of Computer Organization" course project: a five-stage pipelined RISC-V processor. "Fight hard for three weeks, build a computer." ☆18 · Updated 2 years ago
- Long-short token decoding: a 4x decoding speedup for long-context LLMs, with about a hundred lines of core code. Open source for learning. ☆8 · Updated last year
- A Manual on Surviving in CS of NWPU ☆52 · Updated last year
- Spring 2021–22 "AI Computing Systems" labs, University of Chinese Academy of Sciences ☆33 · Updated 3 years ago
- Explore Inter-layer Expert Affinity in MoE Model Inference