premAI-io / benchmarks
🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models.
★136 · Updated 8 months ago
Alternatives and similar repositories for benchmarks:
Users interested in benchmarks are also comparing it to the libraries listed below.
- experiments with inference on llama · ★104 · Updated 10 months ago
- ★199 · Updated last year
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… · ★210 · Updated 5 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. · ★73 · Updated 5 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters · ★254 · Updated 9 months ago
- ★112 · Updated this week
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free · ★231 · Updated 5 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… · ★128 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs · ★262 · Updated 6 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG · ★319 · Updated 5 months ago
- Efficient vector database for hundreds of millions of embeddings. · ★205 · Updated 10 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… · ★174 · Updated 7 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform · ★86 · Updated last month
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ★248 · Updated this week
- Inference server benchmarking tool · ★48 · Updated last week
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes · ★193 · Updated 6 months ago
- ★209 · Updated 9 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' · ★233 · Updated 10 months ago
- ★122 · Updated 5 months ago
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… · ★147 · Updated last year
- XTR/WARP is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. · ★122 · Updated 5 months ago
- Self-host LLMs with vLLM and BentoML · ★100 · Updated this week
- Comprehensive analysis of differences in performance between QLoRA, LoRA, and full finetunes. · ★82 · Updated last year
- A stable, fast, and easy-to-use inference library with a focus on a sync-to-async API · ★45 · Updated 6 months ago
- Comparison of Language Model Inference Engines · ★212 · Updated 3 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… · ★30 · Updated 7 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. · ★195 · Updated 8 months ago
- ★57 · Updated 2 weeks ago
- A flexible, adaptive classification system for dynamic text classification · ★150 · Updated last month
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" · ★153 · Updated 5 months ago