JiwenJ / mit6.5940-2023
TinyML and Efficient Deep Learning Computing
☆13 · Updated last year
Alternatives and similar repositories for mit6.5940-2023
Users interested in mit6.5940-2023 are comparing it to the repositories listed below:
- All homeworks for TinyML and Efficient Deep Learning Computing (6.5940, Fall 2023, https://efficientml.ai) ☆175 · Updated last year
- Analyzes the inference of Large Language Models (LLMs), covering aspects like computation, storage, transmission, and hardware roofline mod… ☆509 · Updated 10 months ago
- Lab 5 project of MIT-6.5940: deploying LLaMA2-7B-chat on a laptop with TinyChatEngine. ☆17 · Updated last year
- List of papers on neural network quantization from recent AI conferences and journals. ☆669 · Updated 3 months ago
- Puzzles for learning Triton; play with minimal environment configuration! ☆416 · Updated 7 months ago
- An easy-to-understand TensorOp Matmul tutorial. ☆365 · Updated 9 months ago
- Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS. ☆381 · Updated 2 months ago
- 📚 200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉). ☆31 · Updated 2 months ago
- Learning how CUDA works. ☆283 · Updated 4 months ago
- Development repository for the Triton-Linalg conversion. ☆190 · Updated 5 months ago
- ☆23 · Updated last year
- ☆172 · Updated last year
- Curated collection of papers in machine learning systems. ☆384 · Updated last month
- Examples of CUDA implementations with Cutlass CuTe. ☆206 · Updated 2 weeks ago
- Curated collection of papers on MoE model inference. ☆210 · Updated 5 months ago
- Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instruct… ☆439 · Updated 10 months ago
- FlagGems is an operator library for large language models implemented in the Triton language. ☆628 · Updated this week
- Papers and their code for AI systems. ☆316 · Updated 3 months ago
- Summary of notable work on optimizing LLM inference. ☆84 · Updated last month
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap… ☆259 · Updated 4 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆717 · Updated 4 months ago
- ☆113 · Updated 2 weeks ago
- [EMNLP 2024 Industry Track] The official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆512 · Updated this week
- 📰 Must-read papers on KV cache compression (constantly updating 🤗). ☆484 · Updated 3 weeks ago
- [ACM MM 2025] MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization. ☆12 · Updated last week
- This repository contains integer operators on GPUs for PyTorch. ☆206 · Updated last year
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU. ☆49 · Updated 2 years ago
- LLM theoretical performance analysis tool supporting params, FLOPs, memory, and latency analysis. ☆98 · Updated this week
- ☆125 · Updated 7 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention. ☆402 · Updated last month