eth-easl / mixtera

A lightweight, user-friendly data-plane for LLM training.

☆32

Alternatives and similar repositories for mixtera

Users who are interested in mixtera are comparing it to the libraries listed below.

tanyuqian / redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆68 Updated 10 months ago
TobiasNorlund / retro
Official repo for "On the Generalization Ability of Retrieval-Enhanced Transformers"
☆44 Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 Updated last year
DS3Lab / DT-FM
☆93 Updated 3 years ago
aniquetahir / JORA
JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
☆36 Updated last year
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆130 Updated 10 months ago
CosimoRulli / emvb
Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024
☆66 Updated last week
eth-easl / deltazip
Compression for Foundation Models
☆35 Updated 3 months ago
hpcaitech / Elixir
Elixir: Train a Large Language Model on a Small GPU Cluster
☆15 Updated 2 years ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84 Updated last year
dame-cell / Triformer
Transformers components but in Triton
☆34 Updated 5 months ago
mgmalek / efficient_cross_entropy
☆121 Updated last year
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆79 Updated 8 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry
☆42 Updated last year
UmerHA / triton_util
Make triton easier
☆48 Updated last year
samsja / muon_fsdp_2
Muon fsdp 2
☆44 Updated 2 months ago
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 Updated last year
anyscale / llm-continuous-batching-benchmarks
☆121 Updated last year
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78 Updated last year
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆120 Updated 11 months ago
zhisbug / ray-scalable-ml-design
Some microbenchmarks and design docs before commencement
☆12 Updated 4 years ago
xlang-ai / batch-prompting
[EMNLP 2023 Industry Track] A simple prompting approach that enables LLMs to run inference in batches.
☆76 Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47 Updated last year
Infini-AI-Lab / APE
☆32 Updated 8 months ago
RAIVNLab / AdANNS
Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search"
☆65 Updated 2 years ago
vox-serve / vox-serve
Serving System for SpeechLMs
☆17 Updated last week
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆37 Updated 6 months ago
stanford-futuredata / stk
☆112 Updated last year
sail-sg / SimLayerKV
The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction".
☆49 Updated last year
deepspeedai / DeepSpeed-Kernels
☆72 Updated 7 months ago