anarchy-ai / llm-speed-benchmark
Benchmarking tool for assessing LLM performance across different hardware
☆13 · Updated 11 months ago
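To ground what a speed benchmark like this measures, here is a minimal sketch of timing tokens per second against an OpenAI-compatible completion endpoint. The endpoint URL, model name, and request shape are illustrative assumptions, not llm-speed-benchmark's actual configuration or code.

```python
# Minimal throughput-measurement sketch, in the spirit of llm-speed-benchmark.
# ENDPOINT and MODEL are hypothetical placeholders, not the tool's defaults.
import time
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed local server
MODEL = "my-local-model"                                # assumed model name

def measure_tokens_per_second(prompt: str) -> float:
    """Time one completion and derive tokens/sec from the reported usage."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
        },
        timeout=120,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    # OpenAI-compatible servers report generated-token counts under "usage".
    completion_tokens = resp.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

if __name__ == "__main__":
    tps = measure_tokens_per_second("Explain KV caching in one paragraph.")
    print(f"{tps:.1f} tok/s")
```

Running the same script on different machines against the same model gives a crude hardware comparison; a real benchmark would also warm up the server and average over multiple requests.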
Related projects
Alternatives and complementary repositories for llm-speed-benchmark
- Runner in charge of collecting metrics from LLM inference endpoints for the Unify Hub ☆17 · Updated 9 months ago
- LLM code editor for backend services ☆11 · Updated last month
- The backend behind the LLM-Perf Leaderboard ☆10 · Updated 6 months ago
- Public reports detailing responses to sets of prompts by Large Language Models. ☆26 · Updated last year
- GPU Environment Management for Visual Studio Code ☆35 · Updated last year
- Collection of recipes aiding Gen AI model development ☆88 · Updated last week
- Visualize expert firing frequencies across sentences in the Mixtral MoE model ☆17 · Updated 10 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆58 · Updated 2 weeks ago
- LLM plugin for models hosted by OpenRouter ☆68 · Updated 6 months ago
- A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ ☆63 · Updated last year
- 📡 Deploy AI models and apps to Kubernetes without developing a hernia ☆31 · Updated 5 months ago
- A pipeline for using API calls to agnostically convert unstructured data into structured training data ☆28 · Updated 2 months ago
- Transformer GPU VRAM estimator (see the back-of-envelope sketch after this list) ☆40 · Updated 7 months ago
- Python examples using the bigcode/tiny_starcoder_py 159M model to generate code ☆44 · Updated last year
- Tutorial for building an LLM router ☆163 · Updated 4 months ago
- A landscape of the infrastructure that powers the generative AI ecosystem ☆130 · Updated last month
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state-of-the-art machine learning models access… ☆114 · Updated 9 months ago
- Some simple scripts that I use day-to-day when working with LLMs and Huggingface Hub ☆155 · Updated last year
- Generate glue code in seconds to simplify your NVIDIA Triton Inference Server deployments ☆15 · Updated 4 months ago
- ReLM is a Regular Expression engine for Language Models ☆104 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆115 · Updated 3 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆54 · Updated 7 months ago
- Horizon chart for CPU/GPU/Neural Engine utilization monitoring on Apple M1/M2 and NVIDIA GPUs on Linux ☆24 · Updated last month
- Google TPU optimizations for transformers models ☆75 · Updated this week
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆23 · Updated last week
- Drop-in replacement for OpenAI's embedding API. Self-hosted. ☆51 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆74 · Updated last week
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆28 · Updated 9 months ago
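As a rough illustration of what a tool like the Transformer GPU VRAM estimator above computes, here is a back-of-envelope sketch. The formula (weights plus KV cache) is a standard approximation; the layer/head numbers in the example are assumptions, not that project's defaults or method.

```python
# Back-of-envelope VRAM estimate for transformer inference (illustrative only).
# Ignores activations, fragmentation, and framework overhead.

def estimate_inference_vram_gb(
    n_params: float,        # model size in parameters, e.g. 7e9
    bytes_per_param: int,   # 2 for fp16/bf16, 1 for int8
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
) -> float:
    """Estimate VRAM in decimal GB as model weights plus KV cache."""
    weights = n_params * bytes_per_param
    # The KV cache stores one key and one value vector per layer per token.
    kv_cache = (
        2 * n_layers * n_kv_heads * head_dim
        * seq_len * batch_size * bytes_per_param
    )
    return (weights + kv_cache) / 1e9

# Example with assumed LLaMA-2-7B-like dimensions at fp16 and a 4k context:
# 14 GB of weights + ~2.1 GB of KV cache ≈ 16.1 GB.
print(f"{estimate_inference_vram_gb(7e9, 2, 32, 32, 128, 4096):.1f} GB")
```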