🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
⭐140 · Jul 25, 2024 · Updated last year
Alternatives and similar repositories for benchmarks
Users interested in benchmarks are comparing it to the libraries listed below.
- End-to-End Local-First Text-to-SQL Pipelines ⭐453 · Feb 14, 2025 · Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ⭐59 · Apr 6, 2026 · Updated 3 weeks ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐266 · Dec 4, 2025 · Updated 4 months ago
- Quickly and securely turn any Linux box into a build and deployment assistant ⭐25 · Oct 3, 2024 · Updated last year
- Learning and rediscovering ML from total scratch ⭐12 · Aug 30, 2021 · Updated 4 years ago
- Modified beam search with periodic restart ⭐12 · Sep 12, 2024 · Updated last year
- Exploring limitations of LLM-as-a-judge ⭐20 · Aug 17, 2024 · Updated last year
- B-Llama3o: a LLaMA 3 with vision and audio understanding, as well as text, audio, and animation data output. ⭐26 · Jun 3, 2024 · Updated last year
- This repository contains the metadata and data of different databases that we use for testing ⭐14 · Jan 29, 2025 · Updated last year
- Repo hosting code and materials related to speeding up LLM inference using token merging. ⭐37 · Oct 9, 2025 · Updated 6 months ago
- Triton implementation of GPT/LLAMA ⭐21 · Aug 28, 2024 · Updated last year
- WebAISum is a Python script that allows you to summarize web pages using AI models. It supports both local models like Ollama and remote … ⭐15 · Apr 28, 2024 · Updated 2 years ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ⭐85 · Oct 29, 2024 · Updated last year
- LLM-driven automated knowledge graph construction from text using DSPy and Neo4j ⭐19 · Aug 19, 2024 · Updated last year
- An auto-sleeping and -waking framework around llama.cpp ⭐12 · Feb 8, 2025 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ⭐50 · Mar 19, 2024 · Updated 2 years ago
- Iterate fast on your RAG pipelines ⭐24 · Jun 21, 2025 · Updated 10 months ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ⭐1,687 · Oct 23, 2024 · Updated last year
- 3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding ⭐83 · Jul 3, 2025 · Updated 9 months ago
- Demo of an "always-on" AI assistant. ⭐24 · Feb 14, 2024 · Updated 2 years ago
- An integration of Qdrant ANN vector database backend with Haystack ⭐45 · Apr 6, 2026 · Updated 3 weeks ago
- ⭐121 · Mar 18, 2026 · Updated last month
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ⭐4,036 · Updated this week
- A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks ⭐39 · Jun 10, 2024 · Updated last year
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ⭐38 · Jul 2, 2025 · Updated 10 months ago
- REBUS: A Robust Evaluation Benchmark of Understanding Symbols ⭐13 · Aug 13, 2024 · Updated last year
- Quantized inference code for LLaMA models ⭐13 · Mar 12, 2023 · Updated 3 years ago
- A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor ⭐38 · Jan 13, 2026 · Updated 3 months ago
- AI_Powered_Dev_Search_Engine ⭐12 · Mar 10, 2024 · Updated 2 years ago
- This repository contains the source code for running llamaindex tutorials from https://howaibuildthis.substack.com/ ⭐41 · Jan 7, 2024 · Updated 2 years ago
- An ONNX converter script focused on embedding models ⭐33 · Jan 14, 2025 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ⭐2,108 · Jun 30, 2025 · Updated 10 months ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ⭐13 · May 5, 2024 · Updated last year
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ⭐70 · Nov 17, 2025 · Updated 5 months ago
- ⭐19 · Jun 4, 2024 · Updated last year
- OpenAI compatible API for TensorRT LLM triton backend ⭐220 · Aug 1, 2024 · Updated last year
- Use `outlines` generators with Haystack. ⭐15 · Updated this week
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ⭐121 · Mar 31, 2025 · Updated last year
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ⭐2,773 · Mar 24, 2026 · Updated last month