☆56 Nov 18, 2024 Updated last year
Alternatives and similar repositories for llm-bench
Users who are interested in llm-bench are comparing it to the libraries listed below.
- kernel development code for my work (ioatdma, ntb_hw_intel, idxd, PCI, and CXL related bits) ☆12 Jan 19, 2026 Updated 4 months ago
- Self-host LLMs with vLLM and BentoML ☆169 Mar 3, 2026 Updated 3 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆134 Feb 22, 2024 Updated 2 years ago
- ☆27 Apr 23, 2026 Updated last month
- Python bindings for NVIDIA CUDA APIs. ☆13 Mar 2, 2024 Updated 2 years ago
- High-performance GEMM implementation optimized for NVIDIA H100 GPUs, leveraging Hopper architecture's TMA, WGMMA, and Thread Block Cluste… ☆11 Dec 4, 2024 Updated last year
- Tutorial Exercises and Code for GPU Communications Tutorial at HOT Interconnects 2025 ☆32 Oct 22, 2025 Updated 7 months ago
- Dicom ECG from OHIF 2 Viewer (Extension) ☆12 Jun 30, 2023 Updated 2 years ago
- ☆16 May 14, 2025 Updated last year
- Converting night into day is one of the most interesting applications in generative models, due to the great difficulty in recreating the… ☆12 Oct 13, 2023 Updated 2 years ago
- ☆10 Feb 17, 2026 Updated 4 months ago
- Python tools ☆14 Oct 22, 2023 Updated 2 years ago
- Code Repository for the NeurIPS 2024 Paper "Toward Efficient Inference for Mixture of Experts". ☆19 Oct 30, 2024 Updated last year
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 Nov 11, 2024 Updated last year
- All notebook for FastAI learning purposes. ☆15 Jun 11, 2019 Updated 7 years ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" [ICLR 2025] ☆29 Feb 20, 2026 Updated 3 months ago
- Comparison of Language Model Inference Engines ☆241 Dec 16, 2024 Updated last year
- Benchmark suite for LLMs from Fireworks.ai ☆105 Jun 11, 2026 Updated last week
- Linux tree for ntrdma driver development. ☆11 Jun 29, 2017 Updated 8 years ago
- A cross-platform and editor-agnostic live previewer for Markdown files ☆11 Jul 15, 2024 Updated last year
- ☆15 Feb 12, 2026 Updated 4 months ago
- Qwen3-0.6B megakernel: 527 tok/s decode on RTX 3090 (3.8x faster than PyTorch) ☆110 Feb 10, 2026 Updated 4 months ago
- LLM Serving Performance Evaluation Harness ☆84 Feb 25, 2025 Updated last year
- A sample pattern for running CI tests on Modal ☆19 Apr 12, 2025 Updated last year
- Tools for MPI programmers ☆14 Sep 21, 2020 Updated 5 years ago
- ☆62 May 4, 2024 Updated 2 years ago
- LLMPerf is a library for validating and benchmarking LLMs ☆1,120 Dec 9, 2024 Updated last year
- ☆22 Jan 23, 2024 Updated 2 years ago
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆18 Dec 22, 2023 Updated 2 years ago
- a cool shell script for git keyword expansion. ☆16 Sep 11, 2014 Updated 11 years ago
- Pocket Survival Guide for Sys Admin - http://psg.skinforum.org/ - ☆15 Jun 1, 2026 Updated 2 weeks ago
- Jax implementation of the AdaHessian optimizer ☆19 Mar 11, 2021 Updated 5 years ago
- The code runs on the netronome smart card to filtering PPPoE and PPP control plane packet send to vbras and Decap\Encap data plane packet… ☆11 Jun 21, 2017 Updated 8 years ago
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training ☆24 Mar 1, 2024 Updated 2 years ago
- A calculator to estimate the memory footprint, capacity, and latency on VMware Private AI with NVIDIA. ☆40 Aug 5, 2025 Updated 10 months ago
- these are custom recipes of nvidia nsight system post collection analysis. ☆16 Nov 7, 2025 Updated 7 months ago
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 May 6, 2024 Updated 2 years ago
- A GAN-based system that transforms dark, poorly lit images into well-illuminated versions. Using a custom encoder-decoder architecture, i… ☆15 May 22, 2025 Updated last year
- Terraform modules for deploying DAOS on GCP ☆11 Jan 17, 2024 Updated 2 years ago