JiwenJ / mit6.5940-2023
TinyML and Efficient Deep Learning Computing
☆13 · Updated last year
Alternatives and similar repositories for mit6.5940-2023
Users interested in mit6.5940-2023 are comparing it to the repositories listed below:
- All homeworks for TinyML and Efficient Deep Learning Computing (6.5940, Fall 2023, https://efficientml.ai) ☆175 · Updated last year
- Analyzes the inference of Large Language Models (LLMs), covering aspects like computation, storage, transmission, and hardware roofline mod… ☆509 · Updated 10 months ago
- Lab 5 project of MIT-6.5940: deploying LLaMA2-7B-chat on a laptop with TinyChatEngine. ☆17 · Updated last year
- List of papers on neural network quantization from recent AI conferences and journals. ☆669 · Updated 3 months ago
- Puzzles for learning Triton; play with minimal environment configuration! ☆416 · Updated 7 months ago
- An easy-to-understand TensorOp Matmul tutorial. ☆365 · Updated 9 months ago
- Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS. ☆381 · Updated 2 months ago
- 📚 200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA, and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉). ☆31 · Updated 2 months ago
- Learning how CUDA works. ☆283 · Updated 4 months ago
- Development repository for the Triton-Linalg conversion. ☆190 · Updated 5 months ago
- ☆23 · Updated last year
- ☆172 · Updated last year
- Curated collection of papers in machine learning systems. ☆384 · Updated last month
- Examples of CUDA implementations with Cutlass CuTe. ☆206 · Updated 2 weeks ago
- Curated collection of papers on MoE model inference. ☆210 · Updated 5 months ago
- Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instruct… ☆439 · Updated 10 months ago
- FlagGems is an operator library for large language models implemented in the Triton language. ☆628 · Updated this week
- Papers and their code for AI systems. ☆316 · Updated 3 months ago
- Summary of notable work on optimizing LLM inference. ☆84 · Updated last month
- Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. Here is a list of pap… ☆259 · Updated 4 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆717 · Updated 4 months ago
- ☆113 · Updated 2 weeks ago
- [EMNLP 2024 Industry Track] The official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆512 · Updated this week
- 📰 Must-read papers on KV cache compression (constantly updating 🤗). ☆484 · Updated 3 weeks ago
- [ACM MM 2025] MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization. ☆12 · Updated last week
- This repository contains integer operators on GPUs for PyTorch. ☆206 · Updated last year
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU. ☆49 · Updated 2 years ago
- LLM theoretical performance analysis tool supporting params, FLOPs, memory, and latency analysis. ☆98 · Updated this week
- ☆125 · Updated 7 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention. ☆402 · Updated last month