sambanova / modelzoo

The SambaNova Model Zoo open-source repository includes RDU-compatible source code, along with example applications for compiling and running models on SambaNova hardware.

☆16

Alternatives and similar repositories for modelzoo

Users who are interested in modelzoo are comparing it to the repositories listed below.

IST-DASLab / Quartet
☆77 Updated last month
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80 Updated 11 months ago
HanGuo97 / lq-lora
☆127 Updated last year
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆124 Updated 8 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆93 Updated last week
Cornell-RelaxML / qtip
☆146 Updated last month
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆109 Updated 4 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127 Updated 8 months ago
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆109 Updated 9 months ago
apple / ml-recurrent-drafter
☆216 Updated 6 months ago
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated 10 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆151 Updated 4 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆50 Updated this week
jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆95 Updated last year
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆319 Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆57 Updated last year
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆153 Updated last year
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆258 Updated 2 weeks ago
FasterDecoding / TEAL
☆137 Updated 5 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207 Updated this week
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆115 Updated 8 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆176 Updated 5 months ago
SqueezeAILab / KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
☆367 Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
ROCm / pyrsmi
Python package of rocm-smi-lib
☆22 Updated 3 weeks ago
HazyResearch / train-tk
train with kittens!
☆62 Updated 9 months ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆118 Updated 6 months ago
gpu-mode / ring-attention
ring-attention experiments
☆147 Updated 9 months ago
EmbeddedLLM / vllm
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs
☆88 Updated this week
thunlp / TritonBench
TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators
☆73 Updated last month