davanstrien / awesome-synthetic-datasets
awesome synthetic (text) datasets
☆242 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for awesome-synthetic-datasets
- ☆451 Updated 3 weeks ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆236 Updated 4 months ago
- An Open Source Toolkit For LLM Distillation ☆356 Updated 2 months ago
- Let's build better datasets, together! ☆205 Updated this week
- ☆105 Updated 2 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆180 Updated 3 weeks ago
- A compact LLM pretrained in 9 days by using high quality data ☆262 Updated last month
- Toolkit for attaching, training, saving and loading of new heads for transformer models ☆246 Updated 2 weeks ago
- The official evaluation suite and dynamic data release for MixEval. ☆224 Updated last week
- ☆93 Updated last month
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆196 Updated 6 months ago
- Automatically evaluate your LLMs in Google Colab ☆559 Updated 6 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 Updated 3 weeks ago
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ☆491 Updated 2 weeks ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆811 Updated this week
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆116 Updated last year
- A simple unified framework for evaluating LLMs ☆145 Updated last week
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ☆408 Updated 2 months ago
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ☆175 Updated 2 weeks ago
- This is an implementation of the paper: Searching for Best Practices in Retrieval-Augmented Generation ☆215 Updated last month
- Generative Representational Instruction Tuning ☆567 Updated this week
- A bagel, with everything. ☆312 Updated 7 months ago
- Easily embed, cluster and semantically label text datasets ☆462 Updated 7 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆147 Updated last month
- This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data ☆61 Updated 9 months ago
- ☆131 Updated 4 months ago
- Official repository for ORPO ☆421 Updated 5 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆103 Updated last month
- ☆129 Updated 3 weeks ago
- AWM: Agent Workflow Memory ☆205 Updated last month