Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
★138 · Jul 25, 2024 · Updated last year
Alternatives and similar repositories for benchmarks
Users interested in benchmarks are comparing it to the libraries listed below.
- Machine Learning Serving focused on GenAI, with simplicity as the top priority. ★59 · Jan 5, 2026 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ★267 · Dec 4, 2025 · Updated 3 months ago
- An auto-sleeping and auto-waking framework around llama.cpp. ★12 · Feb 8, 2025 · Updated last year
- Proxy server for a Triton gRPC server that runs inference on an embedding model, written in Rust. ★21 · Aug 10, 2024 · Updated last year
- Python library for automatic training, optimization, and comparison of Transformer models on most NLP tasks. ★20 · May 6, 2023 · Updated 2 years ago
- Quickly and securely turn any Linux box into a build and deployment assistant. ★25 · Oct 3, 2024 · Updated last year
- Modified beam search with periodic restarts. ★12 · Sep 12, 2024 · Updated last year
- Exploring the limitations of LLM-as-a-judge. ★20 · Aug 17, 2024 · Updated last year
- Metadata and data for the different databases we use for testing. ★14 · Jan 29, 2025 · Updated last year
- Implementations of various machine learning and MLOps applications/tutorials used in my Medium blog. ★11 · Jan 28, 2023 · Updated 3 years ago
- Triton implementation of GPT/LLAMA. ★21 · Aug 28, 2024 · Updated last year
- When real-time yoga position classification meets GNNs. ★11 · Sep 17, 2023 · Updated 2 years ago
- WebAISum is a Python script that summarizes web pages using AI models. It supports both local models like Ollama and remote … ★15 · Apr 28, 2024 · Updated last year
- Code for evaluating with Flow-Judge-v0.1, an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ★84 · Oct 29, 2024 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ★50 · Mar 19, 2024 · Updated 2 years ago
- Iterate fast on your RAG pipelines. ★24 · Jun 21, 2025 · Updated 9 months ago
- Efficient, scalable, enterprise-grade CPU/GPU inference server for Hugging Face transformer models. ★1,687 · Oct 23, 2024 · Updated last year
- 3x faster inference; unofficial implementation of EAGLE speculative decoding. ★83 · Jul 3, 2025 · Updated 8 months ago
- Demo of an "always-on" AI assistant. ★24 · Feb 14, 2024 · Updated 2 years ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ★3,958 · Updated this week
- ★120 · Aug 28, 2024 · Updated last year
- Notes from our NLP reading club! ★18 · Jul 17, 2021 · Updated 4 years ago
- A lightweight evaluation suite tailored to assessing Indic LLMs across a diverse range of tasks. ★39 · Jun 10, 2024 · Updated last year
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn, and Docker. ★38 · Jul 2, 2025 · Updated 8 months ago
- A guidance compatibility layer for llama-cpp-python. ★36 · Sep 11, 2023 · Updated 2 years ago
- Sales Conversion Optimization MLOps: boost revenue with AI-powered insights. Features H2O AutoML, ZenML pipelines, Neptune.ai tracking, d… ★21 · Mar 22, 2025 · Updated last year
- REBUS: A Robust Evaluation Benchmark of Understanding Symbols. ★13 · Aug 13, 2024 · Updated last year
- AI_Powered_Dev_Search_Engine. ★12 · Mar 10, 2024 · Updated 2 years ago
- An ONNX converter script focused on embedding models. ★33 · Jan 14, 2025 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,101 · Jun 30, 2025 · Updated 8 months ago
- High-level library for batched embedding generation, blazing-fast web-based RAG, and quantized index processing. ★70 · Nov 17, 2025 · Updated 4 months ago
- Rust crate for some audio utilities. ★27 · Mar 8, 2025 · Updated last year
- Attend - to what matters. ★17 · Feb 22, 2025 · Updated last year
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali. ★2,724 · Feb 5, 2026 · Updated last month
- OpenAI-compatible API for the TensorRT-LLM Triton backend. ★219 · Aug 1, 2024 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ). ★919 · Feb 26, 2026 · Updated 3 weeks ago
- Use `outlines` generators with Haystack. ★15 · Updated this week
- Low-latency, high-accuracy custom query routers for humans and agents. Built by Prithivi Da. ★120 · Mar 31, 2025 · Updated 11 months ago
- A PyTorch quantization backend for Optimum. ★1,032 · Nov 21, 2025 · Updated 4 months ago