Motsepe-Jr / AI-research-papers-pseudo-codeLinks

This is a repo covers ai research papers pseudocodes

☆17

Alternatives and similar repositories for AI-research-papers-pseudo-code

Users that are interested in AI-research-papers-pseudo-code are comparing it to the libraries listed below

Sorting:

jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆98Updated 2 years ago
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
☆288Updated 10 months ago
anyscale / llm-continuous-batching-benchmarks
☆121Updated last year
MDK8888 / vllmini
A minimal implementation of vllm.
☆60Updated last year
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆301Updated 2 months ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆202Updated 4 months ago
tspeterkim / paged-attention-minimal
a minimal cache manager for PagedAttention, on top of llama3.
☆125Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆264Updated last month
gnovack / distributed-training-and-deepspeed
☆17Updated 2 years ago
cli99 / llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
☆461Updated 6 months ago
huggingface / picotron_tutorial
☆225Updated 2 weeks ago
gpu-mode / ring-attention
ring-attention experiments
☆155Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆216Updated this week
qhliu26 / Dive-into-Big-Model-Training
📑 Dive into Big Model Training
☆114Updated 2 years ago
haochengxi / Train_Transformers_with_INT4
☆156Updated 2 years ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆223Updated last year
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆147Updated last year
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆388Updated last week
mgmalek / efficient_cross_entropy
☆121Updated last year
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆64Updated this week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆247Updated last month
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆270Updated 3 months ago
epfml / dynamic-sparse-flash-attention
☆149Updated 2 years ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆323Updated last year
gpu-mode / profiling-cuda-in-torch
☆174Updated last year
Deep-Learning-Profiling-Tools / triton-viz
☆246Updated this week
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆217Updated last year
melisa-writer / short-transformers
Prune transformer layers
☆69Updated last year
SqueezeAILab / KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
☆389Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆159Updated 7 months ago