a2677331 / Stanford-CS142
Stanford CS142 Web-Applications
☆6 · Updated last year
Alternatives and similar repositories for Stanford-CS142
Users who are interested in Stanford-CS142 are comparing it to the repositories listed below.
- Curated collection of papers in MoE model inference ☆213 · Updated 5 months ago
- Advanced Scalable Systems for X ☆37 · Updated 7 months ago
- My Paper Reading Lists and Notes. ☆20 · Updated 6 months ago
- Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing ☆51 · Updated 6 months ago
- ☆169 · Updated last year
- ☆47 · Updated last year
- ☆15 · Updated 7 months ago
- Solution of Programming Massively Parallel Processors ☆47 · Updated last year
- ☆123 · Updated last week
- ☆42 · Updated 8 months ago
- Systems for GenAI ☆142 · Updated 3 months ago
- This repo contains the Assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems offered in Spring 2023 ☆32 · Updated 2 years ago
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆143 · Updated last year
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24) ☆42 · Updated 7 months ago
- ☆15 · Updated 11 months ago
- [HPCA'24] Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System ☆46 · Updated this week
- kvcached: Elastic KV cache for dynamic GPU sharing and efficient multi-LLM inference. ☆25 · Updated this week
- My solutions to the assignments of CMU 10-714 Deep Learning Systems 2022 ☆40 · Updated last year
- Codes & examples for "CUDA - From Correctness to Performance" ☆102 · Updated 9 months ago
- ☆114 · Updated 3 weeks ago
- ☆38 · Updated last week
- Lab 5 project of MIT-6.5940, deploying LLaMA2-7B-chat on one's laptop with TinyChatEngine. ☆17 · Updated last year
- NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading ☆47 · Updated last month
- Code release for AdapMoE accepted by ICCAD 2024 ☆26 · Updated 2 months ago
- ☆12 · Updated 3 years ago
- The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems) ☆258 · Updated 6 months ago
- GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving ☆17 · Updated 2 weeks ago
- ☆48 · Updated last year
- Large Language Model (LLM) Serving Paper and Resource List ☆24 · Updated 2 months ago
- GitHub repository of HPCA 2025 paper "UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures" ☆13 · Updated 7 months ago