mlabonne / llm-autoeval
Automatically evaluate your LLMs in Google Colab
☆603 · Updated 10 months ago
Alternatives and similar repositories for llm-autoeval:
Users interested in llm-autoeval are comparing it to the libraries listed below.
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆883 · Updated this week
- Official repository for ORPO ☆445 · Updated 9 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,313 · Updated this week
- awesome synthetic (text) datasets ☆265 · Updated 4 months ago
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆691 · Updated 11 months ago
- Banishing LLM Hallucinations Requires Rethinking Generalization ☆272 · Updated 8 months ago
- Best practices for distilling large language models. ☆506 · Updated last year
- Generate textbook-quality synthetic LLM pretraining data ☆498 · Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆414 · Updated last year
- Automated Evaluation of RAG Systems ☆562 · Updated 4 months ago
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ☆454 · Updated 6 months ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆298 · Updated last year
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆265 · Updated 2 weeks ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆316 · Updated 4 months ago
- An Open Source Toolkit For LLM Distillation ☆544 · Updated 2 months ago
- A bagel, with everything. ☆317 · Updated 11 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B completely for free ☆230 · Updated 4 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆489 · Updated last year
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆709 · Updated last year
- Data and tools for generating and inspecting OLMo pre-training data. ☆1,162 · Updated last week
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆646 · Updated 9 months ago
- Automatic evals for LLMs ☆334 · Updated this week
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆185 · Updated last year
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to serve as a fallback during outages of the hosted service LLM. ☆302 · Updated last month
- Let's build better datasets, together! ☆257 · Updated 3 months ago
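
For context, here is a minimal sketch of the kind of benchmark run that automated evaluation tools like llm-autoeval and Lighteval wrap behind a notebook or CLI. It uses the EleutherAI lm-evaluation-harness Python API as an assumed backend; this is an illustration, not llm-autoeval's own API, and the model name, task list, batch size, and exact result keys are assumptions that may differ between harness versions.

```python
# Hypothetical evaluation run with the lm-evaluation-harness Python API
# (pip install lm-eval). Argument names and result keys can vary by version.
import json

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=mistralai/Mistral-7B-v0.1,dtype=bfloat16",  # illustrative model
    tasks=["arc_challenge", "hellaswag", "winogrande"],  # illustrative task list
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (accuracy, normalized accuracy, ...) for inspection or a leaderboard table.
print(json.dumps(results["results"], indent=2, default=str))
```

Tools in the list above automate the steps around this call: provisioning a GPU, sweeping benchmark suites, and uploading the resulting metrics as a summary or leaderboard entry.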