Benchmark suite for LLMs from Fireworks.ai
☆95 · Mar 3, 2026 · Updated this week
Alternatives and similar repositories for benchmark
Users who are interested in benchmark are comparing it to the repositories listed below
- LLMPerf is a library for validating and benchmarking LLMs · ☆1,090 · Dec 9, 2024 · Updated last year
- Using YouTube to prepare a speech recognition dataset for any language · ☆10 · Mar 30, 2021 · Updated 4 years ago
- Implementation of different noise embeddings for noise aware training of Kaldi acoustic models. · ☆13 · Feb 13, 2021 · Updated 5 years ago
- Standalone commandline CLI tool for compiling Triton kernels · ☆20 · Sep 13, 2024 · Updated last year
- ☆206 · May 5, 2025 · Updated 10 months ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA) · ☆15 · Aug 16, 2024 · Updated last year
- Benchmarking tool for assessing LLM models' performance across different hardwares · ☆17 · Dec 8, 2023 · Updated 2 years ago
- ☆14 · Nov 3, 2025 · Updated 4 months ago
- ☆56 · Nov 18, 2024 · Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor · ☆14 · Feb 8, 2024 · Updated 2 years ago
- ☆18 · Jul 2, 2024 · Updated last year
- ☆61 · Sep 17, 2024 · Updated last year
- A sample pattern for running CI tests on Modal · ☆19 · Apr 12, 2025 · Updated 10 months ago
- ☆18 · Feb 25, 2026 · Updated last week
- ☆39 · Oct 3, 2022 · Updated 3 years ago
- ☆32 · Jul 2, 2025 · Updated 8 months ago
- MLSys competition for the best MOE NKI kernels · ☆38 · Updated this week
- ☆20 · Updated this week
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch · ☆18 · Dec 21, 2022 · Updated 3 years ago
- Easy and Efficient Quantization for Transformers · ☆206 · Jan 28, 2026 · Updated last month
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ☆281 · Nov 24, 2025 · Updated 3 months ago
- LLM Inference benchmark · ☆432 · Jul 23, 2024 · Updated last year
- Source code for Activated LoRA · ☆24 · Nov 22, 2025 · Updated 3 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes · ☆48 · May 10, 2024 · Updated last year
- Microsoft Collective Communication Library · ☆66 · Nov 23, 2024 · Updated last year
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. · ☆52 · Updated this week
- This repository contains code for the MicroAdam paper. · ☆21 · Dec 14, 2024 · Updated last year
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens. · ☆1,035 · Sep 4, 2024 · Updated last year
- ☆120 · Apr 22, 2024 · Updated last year
- Serving multiple LoRA finetuned LLM as one · ☆1,144 · May 8, 2024 · Updated last year
- DL Dataloader Benchmarks · ☆20 · Jan 27, 2025 · Updated last year
- An implementation of the Llama architecture, to instruct and delight · ☆21 · May 31, 2025 · Updated 9 months ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" · ☆28 · Dec 18, 2024 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention · ☆465 · May 30, 2025 · Updated 9 months ago
- A handy dataset of noises for ASR · ☆22 · May 29, 2019 · Updated 6 years ago
- Bridge operator repo · ☆21 · Sep 17, 2025 · Updated 5 months ago
- ☆107 · Feb 25, 2025 · Updated last year
- ☆21 · Jun 26, 2024 · Updated last year
- ☆105 · Nov 7, 2024 · Updated last year