wasiahmad / Awesome-LLM-Synthetic-Data
A reading list on LLM based Synthetic Data Generation 🔥
★1,176 · Updated last week
Alternatives and similar repositories for Awesome-LLM-Synthetic-Data:
Users who are interested in Awesome-LLM-Synthetic-Data are comparing it to the libraries listed below.
- ★1,006 · Updated 2 months ago
- A library for advanced large language model reasoning ★2,000 · Updated last week
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ★1,049 · Updated last month
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★2,517 · Updated this week
- An Open Large Reasoning Model for Real-World Solutions ★1,465 · Updated 3 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★1,238 · Updated this week
- Recipes to scale inference-time compute of open models ★1,019 · Updated last week
- Synthetic data curation for post-training and structured data extraction ★901 · Updated this week
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ★1,048 · Updated last week
- Large Reasoning Models ★800 · Updated 3 months ago
- ★1,338 · Updated 3 months ago
- Summarize existing representative LLMs text datasets. ★1,192 · Updated 2 months ago
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ★663 · Updated this week
- A bibliography and survey of the papers surrounding o1 ★1,172 · Updated 3 months ago
- Curated list of datasets and tools for post-training. ★2,766 · Updated last month
- List of papers on hallucination detection in LLMs. ★785 · Updated last week
- Best practices for distilling large language models. ★491 · Updated last year
- AllenAI's post-training codebase ★2,733 · Updated this week
- Evaluate your LLM's response with Prometheus and GPT4 💯 ★877 · Updated last month
- O1 Replication Journey ★1,964 · Updated last month
- 📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥 ★1,286 · Updated this week
- Official repository for ICLR 2025 paper "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient an… ★638 · Updated 2 weeks ago
- System 2 Reasoning Link Collection ★806 · Updated 3 weeks ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ★808 · Updated this week
- ★899 · Updated last month
- Training Large Language Model to Reason in a Continuous Latent Space ★926 · Updated last month
- Automatic evals for LLMs ★304 · Updated this week