premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
⭐137 · Updated 11 months ago
Alternatives and similar repositories for benchmarks
Users interested in benchmarks are comparing it to the libraries listed below.
- experiments with inference on llama ⭐104 · Updated last year
- ⭐198 · Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ⭐138 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐264 · Updated 9 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ⭐262 · Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ⭐33 · Updated 2 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ⭐59 · Updated this week
- ⭐210 · Updated 2 weeks ago
- Let's build better datasets, together! ⭐260 · Updated 6 months ago
- ⭐127 · Updated 3 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ⭐232 · Updated 8 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ⭐77 · Updated 8 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ⭐146 · Updated last year
- TitanML Takeoff Server is an optimization, compression and deployment platform that makes state of the art machine learning models access… ⭐114 · Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ⭐168 · Updated last year
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ⭐105 · Updated 3 months ago
- Toolkit for attaching, training, saving and loading of new heads for transformer models ⭐282 · Updated 4 months ago
- A set of scripts and notebooks on LLM finetuning and dataset creation ⭐110 · Updated 9 months ago
- Multi-threaded matrix multiplication and cosine similarity calculations for dense and sparse matrices. Appropriate for calculating the K … ⭐83 · Updated 6 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ⭐45 · Updated 9 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ⭐239 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ⭐229 · Updated 8 months ago
- Comparison of Language Model Inference Engines ⭐219 · Updated 6 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ⭐323 · Updated 8 months ago
- An innovative library for efficient LLM inference via low-bit quantization ⭐349 · Updated 10 months ago
- Self-host LLMs with vLLM and BentoML ⭐133 · Updated last week
- OpenAI compatible API for TensorRT LLM triton backend ⭐209 · Updated 11 months ago
- Fine-tune an LLM to perform batch inference and online serving. ⭐112 · Updated last month
- FineTune LLMs in few lines of code (Text2Text, Text2Speech, Speech2Text) ⭐240 · Updated last year
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ⭐22 · Updated last year