wasiahmad / Awesome-LLM-Synthetic-Data
A reading list on LLM based Synthetic Data Generation 🔥
★993 · Updated 2 months ago
Alternatives and similar repositories for Awesome-LLM-Synthetic-Data:
Users who are interested in Awesome-LLM-Synthetic-Data are comparing it to the repositories listed below:
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ★971 · Updated 3 weeks ago
- Recipes to scale inference-time compute of open models ★975 · Updated last week
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ★991 · Updated 4 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ★1,022 · Updated this week
- ★997 · Updated last month
- Evaluate your LLM's response with Prometheus and GPT4 💯 ★854 · Updated 3 weeks ago
- System 2 Reasoning Link Collection ★751 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★2,064 · Updated this week
- Official repository for ICLR 2025 paper "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient an… ★587 · Updated last week
- ★2,341 · Updated this week
- A bibliography and survey of the papers surrounding o1 ★1,076 · Updated 2 months ago
- ★489 · Updated 2 months ago
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. ★2,017 · Updated this week
- A library for advanced large language model reasoning ★1,690 · Updated this week
- ★594 · Updated last month
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ★347 · Updated 4 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ★791 · Updated this week
- Minimalistic large language model 3D-parallelism training ★1,400 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ★1,261 · Updated last week
- Large Reasoning Models ★801 · Updated last month
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ★910 · Updated this week
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ★2,167 · Updated this week
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ★601 · Updated 2 weeks ago
- ★868 · Updated this week
- An Open Large Reasoning Model for Real-World Solutions ★1,411 · Updated 2 months ago
- List of papers on hallucination detection in LLMs. ★750 · Updated last month
- Official repository for ORPO ★432 · Updated 7 months ago
- Best practices for distilling large language models. ★431 · Updated 11 months ago
- Bringing BERT into modernity via both architecture changes and scaling ★1,119 · Updated last week
- Summarize existing representative LLMs text datasets. ★1,141 · Updated last month