premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
⭐136, updated 7 months ago
Alternatives and similar repositories for benchmarks:
Users who are interested in benchmarks are comparing it to the libraries listed below:
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs (⭐224, updated this week)
- Experiments with inference on llama (⭐104, updated 9 months ago)
- Manage scalable open LLM inference endpoints in Slurm clusters (⭐253, updated 8 months ago)
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… (⭐125, updated 3 months ago)
- (no description) (⭐199, updated last year)
- A high-throughput and memory-efficient inference and serving engine for LLMs (⭐262, updated 5 months ago)
- (no description) (⭐113, updated 5 months ago)
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free (⭐230, updated 4 months ago)
- Comparison of Language Model Inference Engines (⭐208, updated 3 months ago)
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform (⭐83, updated last week)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (⭐87, updated this week)
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (⭐154, updated 5 months ago)
- 🏎️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… (⭐289, updated last month)
- Self-host LLMs with vLLM and BentoML (⭐94, updated this week)
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… (⭐173, updated 6 months ago)
- Low-Rank adapter extraction for fine-tuned transformers models (⭐171, updated 10 months ago)
- OpenAI compatible API for TensorRT LLM triton backend (⭐201, updated 7 months ago)
- This repository contains the code for dataset curation and finetuning of instruct variant of the Bilingual OpenHathi model. The resultin… (⭐23, updated last year)
- An innovative library for efficient LLM inference via low-bit quantization (⭐351, updated 6 months ago)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. (⭐196, updated 8 months ago)
- (no description) (⭐208, updated 8 months ago)
- Machine Learning Serving focused on GenAI with simplicity as the top priority. (⭐58, updated 2 months ago)
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… (⭐207, updated 4 months ago)
- Vector Database with support for late interaction and token level embeddings. (⭐53, updated 5 months ago)
- (no description) (⭐237, updated last week)
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. (⭐73, updated 5 months ago)
- This is our own implementation of 'Layer Selective Rank Reduction' (⭐233, updated 9 months ago)
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… (⭐147, updated last year)
- End-to-End LLM Guide (⭐104, updated 8 months ago)
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da (⭐101, updated last week)