eth-easl / mixtera
A lightweight, user-friendly data-plane for LLM training.
☆18 · Updated 2 months ago
Alternatives and similar repositories for mixtera
Users interested in mixtera are comparing it to the repositories listed below:
- Some microbenchmarks and design docs before commencement ☆12 · Updated 4 years ago
- Compression for Foundation Models ☆32 · Updated 2 months ago
- An Attention Superoptimizer ☆21 · Updated 5 months ago
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆66 · Updated 6 months ago
- ☆14 · Updated last month
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 9 months ago
- A collection of reproducible inference engine benchmarks ☆31 · Updated 2 months ago
- ☆20 · Updated 2 years ago
- ☆28 · Updated 4 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆19 · Updated 11 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆59 · Updated 8 months ago
- ☆17 · Updated 2 years ago
- A curated list for Efficient Large Language Models ☆11 · Updated last year
- How much energy do GenAI models consume? ☆44 · Updated last month
- Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search" ☆64 · Updated last year
- ☆34 · Updated last month
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆80 · Updated 5 months ago
- Distributed ML Optimizer ☆32 · Updated 3 years ago
- ☆23 · Updated last week
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆21 · Updated last year
- ☆27 · Updated last year
- Kinetics: Rethinking Test-Time Scaling Laws ☆29 · Updated this week
- PostText is a QA system for querying your text data. When appropriate structured views are in place, PostText is good at answering queries… ☆32 · Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 8 months ago
- Lightning In-Memory Object Store ☆46 · Updated 3 years ago
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆24 · Updated last week
- ☆15 · Updated last year
- Simple and efficient pytorch-native transformer training and inference (batched) ☆76 · Updated last year
- ☆32 · Updated last year
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 6 months ago