mlc-ai / package
☆14 · Updated last month
Alternatives and similar repositories for package
Users interested in package are comparing it to the libraries listed below.
- AMD-related optimizations for transformer models ☆97 · Updated 3 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆225 · Updated 3 weeks ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- ☆120 · Updated last year
- ☆172 · Updated last week
- Inference of Mamba and Mamba2 models in pure C ☆196 · Updated 2 weeks ago
- xllamacpp - a Python wrapper of llama.cpp ☆73 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆238 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated 2 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks ☆64 · Updated last year
- Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quant, Unsloth. Also possible to train LoRA over… ☆231 · Updated this week
- Comparison of language model inference engines ☆239 · Updated last year
- An educational Rust project for exporting and running inference on the Qwen3 LLM family ☆40 · Updated 6 months ago
- RWKV models and examples powered by candle ☆24 · Updated 3 weeks ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code ☆74 · Updated last year
- Estimating hardware and cloud costs of LLM and transformer projects ☆20 · Updated 3 weeks ago
- Thin wrapper around GGML to make life easier ☆42 · Updated 3 months ago
- A memory-efficient DLRM training solution using ColossalAI ☆105 · Updated 3 years ago
- An innovative library for efficient LLM inference via low-bit quantization ☆352 · Updated last year
- Python bindings for ggml ☆147 · Updated last year
- AirLLM 70B inference with a single 4GB GPU ☆17 · Updated 7 months ago
- EdgeInfer enables efficient edge intelligence by running small AI models, including embeddings and OnnxModels, on resource-constrained de… ☆50 · Updated last year
- Bamboo-7B large language model ☆93 · Updated last year
- LLM inference in C/C++ ☆104 · Updated 2 weeks ago
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and easy export to onnx/onnx-runtime ☆184 · Updated 10 months ago
- Minimal C implementation of speculative decoding based on llama2.c ☆25 · Updated last year
- Hackable and optimized Transformers building blocks, supporting composable construction ☆34 · Updated 3 weeks ago
- [ICLR'25] Fast inference of MoE models with CPU-GPU orchestration ☆260 · Updated last year
- General-purpose GPU compute framework built on Vulkan to support 1000s of cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆51 · Updated 11 months ago
- A converter and basic tester for RWKV ONNX ☆43 · Updated 2 years ago