premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
☆ 137 · Updated 10 months ago
Alternatives and similar repositories for benchmarks
Users interested in benchmarks are comparing it to the libraries listed below.
- experiments with inference on llama ☆ 104 · Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆ 137 · Updated last month
- ☆ 124 · Updated 2 months ago
- ☆ 199 · Updated last year
- Manage scalable open LLM inference endpoints in Slurm clusters ☆ 260 · Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆ 264 · Updated 8 months ago
- Inference server benchmarking tool ☆ 73 · Updated last month
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆ 223 · Updated 7 months ago
- Let's build better datasets, together! ☆ 259 · Updated 6 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆ 33 · Updated last month
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆ 77 · Updated 8 months ago
- Set of scripts to finetune LLMs ☆ 37 · Updated last year
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆ 231 · Updated 7 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆ 304 · Updated 3 weeks ago
- Simple UI for debugging correlations of text embeddings ☆ 276 · Updated 3 weeks ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆ 198 · Updated 11 months ago
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ☆ 56 · Updated 2 months ago
- Comparison of Language Model Inference Engines ☆ 217 · Updated 6 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆ 59 · Updated 2 months ago
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆ 209 · Updated 10 months ago
- ☆ 152 · Updated 6 months ago
- A Lightweight Library for AI Observability ☆ 245 · Updated 4 months ago
- ☆ 132 · Updated 10 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ☆ 45 · Updated 8 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆ 131 · Updated last month
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆ 207 · Updated last month
- ☆ 267 · Updated last week
- Low-Rank adapter extraction for fine-tuned transformers models ☆ 173 · Updated last year
- Evaluation of the BM42 sparse indexing algorithm ☆ 68 · Updated 11 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆ 87 · Updated last week