fixie-ai / ai-benchmarks
Benchmarking suite for popular AI APIs
☆77 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for ai-benchmarks
- Website with current metrics on the fastest AI models. ☆36 Updated last week
- Self-host LLMs with vLLM and BentoML ☆74 Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆165 Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆58 Updated 2 weeks ago
- Just a bunch of benchmark logs for different LLMs ☆115 Updated 3 months ago
- Synthetic Data for LLM Fine-Tuning ☆97 Updated 11 months ago
- Routing on Random Forest (RoRF) ☆84 Updated last month
- Collection of recipes aiding Gen AI model development ☆88 Updated last week
- ☆120 Updated this week
- A pipeline for LLM knowledge distillation ☆78 Updated 3 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆55 Updated 3 months ago
- ☆149 Updated 4 months ago
- An implementation of Self-Extend, which expands the context window via grouped attention ☆118 Updated 10 months ago
- ☆64 Updated 5 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆137 Updated last month
- ☆53 Updated 5 months ago
- Simple examples using Argilla tools to build AI ☆40 Updated this week
- Expert Specialized Fine-Tuning ☆145 Updated last month
- A toolkit for building multimodal AI agents ☆111 Updated this week
- Tutorial for building LLM router ☆163 Updated 4 months ago
- Check for data drift between two OpenAI multi-turn chat jsonl files. ☆36 Updated 7 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆29 Updated 6 months ago
- Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23) ☆74 Updated 2 months ago
- ☆94 Updated 2 months ago
- Data preparation code for Amber 7B LLM ☆83 Updated 6 months ago
- Self-hosted LLM chatbot arena, with yourself as the only judge ☆36 Updated 9 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆81 Updated this week
- A benchmark for emotional intelligence in large language models ☆197 Updated 3 months ago
- A framework for evaluating function calls made by LLMs ☆35 Updated 3 months ago