premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆139 · Updated last year
Alternatives and similar repositories for benchmarks
Users interested in benchmarks are comparing it to the libraries listed below.
- ☆198 · Updated last year
- experiments with inference on llama · ☆103 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆267 · Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… · ☆153 · Updated 4 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. · ☆32 · Updated 2 months ago
- ☆138 · Updated 3 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters · ☆277 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free · ☆232 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. · ☆78 · Updated last year
- ☆210 · Updated 5 months ago
- Self-host LLMs with vLLM and BentoML · ☆161 · Updated last week
- Let's build better datasets, together! · ☆265 · Updated 11 months ago
- Fine-tune an LLM to perform batch inference and online serving. · ☆114 · Updated 6 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… · ☆51 · Updated last year
- Machine Learning Serving focused on GenAI with simplicity as the top priority. · ☆59 · Updated last month
- Efficient vector database for hundreds of millions of embeddings. · ☆211 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… · ☆243 · Updated last year
- An innovative library for efficient LLM inference via low-bit quantization · ☆350 · Updated last year
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … · ☆83 · Updated 11 months ago
- A Lightweight Library for AI Observability · ☆251 · Updated 9 months ago
- Toolkit for attaching, training, saving and loading of new heads for transformer models · ☆292 · Updated 8 months ago
- ☆51 · Updated last year
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API · ☆45 · Updated last year
- Google TPU optimizations for transformers models · ☆123 · Updated 10 months ago
- A collection of all available inference solutions for LLMs · ☆93 · Updated 9 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… · ☆146 · Updated 2 years ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research · ☆262 · Updated this week
- Comparison of Language Model Inference Engines · ☆236 · Updated 11 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. · ☆88 · Updated 2 weeks ago
- ☆124 · Updated last year