🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
★141 · Jul 25, 2024 · Updated last year
Alternatives and similar repositories for benchmarks
Users who are interested in benchmarks are comparing it to the libraries listed below.
- End-to-End Local-First Text-to-SQL Pipelines ★456 · Feb 14, 2025 · Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ★60 · Apr 6, 2026 · Updated 2 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ★266 · Dec 4, 2025 · Updated 6 months ago
- Python library for automatic training, optimization and comparison of Transformer models on most NLP tasks. ★20 · May 6, 2023 · Updated 3 years ago
- Quickly and securely turn any Linux box into a build and deployment assistant ★25 · Oct 3, 2024 · Updated last year
- Modified Beam Search with periodical restart ★12 · Sep 12, 2024 · Updated last year
- Exploring limitations of LLM-as-a-judge ★20 · Aug 17, 2024 · Updated last year
- B-Llama3o a llama3 with Vision Audio and Audio understanding as well as text and Audio and Animation Data output. ★26 · Jun 3, 2024 · Updated 2 years ago
- This repository contains the metadata and data of different databases that we use for testing ★14 · Jan 29, 2025 · Updated last year
- Implementation of various Machine learning and MLOps applications/tutorials used within my Medium blog. ★11 · Jan 28, 2023 · Updated 3 years ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ★37 · Oct 9, 2025 · Updated 8 months ago
- WebAISum is a Python script that allows you to summarize web pages using AI models. It supports both local models like Ollama and remote … ★15 · Apr 28, 2024 · Updated 2 years ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ★86 · Oct 29, 2024 · Updated last year
- LLM-driven automated knowledge graph construction from text using DSPy and Neo4j ★20 · Aug 19, 2024 · Updated last year
- an auto-sleeping and -waking framework around llama.cpp ★13 · Feb 8, 2025 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ★51 · Mar 19, 2024 · Updated 2 years ago
- Iterate fast on your RAG pipelines ★24 · Jun 21, 2025 · Updated 11 months ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ★1,687 · Oct 23, 2024 · Updated last year
- 3x Faster Inference; Unofficial implementation of EAGLE Speculative Decoding ★84 · Jul 3, 2025 · Updated 11 months ago
- Demo of an "always-on" AI assistant. ★24 · Feb 14, 2024 · Updated 2 years ago
- An integration of Qdrant ANN vector database backend with Haystack ★45 · May 19, 2026 · Updated 3 weeks ago
- Notes from our NLP reading club! ★18 · Jul 17, 2021 · Updated 4 years ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ★4,086 · Updated this week
- A lightweight evaluation suite tailored specifically for assessing Indic LLMs across a diverse range of tasks ★39 · Jun 10, 2024 · Updated 2 years ago
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ★38 · Jul 2, 2025 · Updated 11 months ago
- A guidance compatibility layer for llama-cpp-python ★37 · Sep 11, 2023 · Updated 2 years ago
- Sales Conversion Optimization MLOps: Boost revenue with AI-powered insights. Features H2O AutoML, ZenML pipelines, Neptune.ai tracking, d… ★21 · Mar 22, 2025 · Updated last year
- REBUS: A Robust Evaluation Benchmark of Understanding Symbols ★13 · Aug 13, 2024 · Updated last year
- A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor ★47 · Jan 13, 2026 · Updated 4 months ago
- This repository contains the source code for running llamaindex tutorials from https://howaibuildthis.substack.com/ ★41 · Jan 7, 2024 · Updated 2 years ago
- An ONNX converter script focused on embedding models ★33 · Jan 14, 2025 · Updated last year
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ★2,107 · Jun 30, 2025 · Updated 11 months ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ★13 · May 5, 2024 · Updated 2 years ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ★70 · Nov 17, 2025 · Updated 6 months ago
- Rust crate for some audio utilities ★28 · Mar 8, 2025 · Updated last year
- ★19 · Jun 4, 2024 · Updated 2 years ago
- Attend - to what matters. ★17 · Feb 22, 2025 · Updated last year
- OpenAI compatible API for TensorRT LLM triton backend ★221 · Aug 1, 2024 · Updated last year
- Use `outlines` generators with Haystack. ★15 · Jun 1, 2026 · Updated last week