PKUFlyingPig / MIT6.5940_TinyML
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
☆47 · Updated 5 months ago
Alternatives and similar repositories for MIT6.5940_TinyML
Users interested in MIT6.5940_TinyML are comparing it to the repositories listed below.
- The dataset and baseline code for the ASC23 LLM inference optimization challenge. ☆34 · Updated last year
- A PyTorch-like deep learning framework. Just for fun. ☆154 · Updated last year
- Code & examples for "CUDA - From Correctness to Performance". ☆100 · Updated 7 months ago
- ☆134 · Updated last month
- Implement Flash Attention using CuTe. ☆87 · Updated 6 months ago
- A summary of awesome work on optimizing LLM inference. ☆76 · Updated 2 weeks ago
- A comprehensive survey of LLM development, featuring numerous research papers along with their corresponding co… ☆151 · Updated 4 months ago
- Learning material for CMU 10-714: Deep Learning Systems. ☆256 · Updated last year
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference". ☆53 · Updated last week
- Some HPC projects for learning. ☆22 · Updated 9 months ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NeurIPS'24). ☆40 · Updated 6 months ago
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference. ☆38 · Updated last week
- Triton documentation in Simplified Chinese / Triton 中文文档. ☆71 · Updated 2 months ago
- ☆235 · Updated last week
- Systems for GenAI. ☆141 · Updated 2 months ago
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank. ☆48 · Updated 7 months ago
- My solutions to the assignments of CMU 10-714 Deep Learning Systems 2022. ☆37 · Updated last year
- ☆78 · Updated this week
- Implementations of some LLM KV cache sparsity methods. ☆32 · Updated last year
- My CS notes. ☆51 · Updated 8 months ago
- ☆24 · Updated last week
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length. ☆89 · Updated 2 months ago
- A practical way of learning Swizzle. ☆20 · Updated 4 months ago
- A collection of noteworthy MLSys bloggers (algorithms/systems). ☆241 · Updated 5 months ago
- LLM inference with a deep learning accelerator. ☆44 · Updated 5 months ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs. ☆48 · Updated 2 months ago
- A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. ☆47 · Updated last week
- Personal notes for learning HPC & parallel computation [actively adding new content]. ☆67 · Updated 2 years ago
- Code release for the book "Efficient Training in PyTorch". ☆67 · Updated 2 months ago
- DGEMM on KNL, achieving 75% of MKL performance. ☆18 · Updated 3 years ago