wasiahmad / Awesome-LLM-Synthetic-Data

A reading list on LLM based Synthetic Data Generation 🔥

☆1,379

Alternatives and similar repositories for Awesome-LLM-Synthetic-Data

Users who are interested in Awesome-LLM-Synthetic-Data compare it to the repositories listed below

trotsky1997 / MathBlackBox
☆1,028 Updated 7 months ago
EdinburghNLP / awesome-hallucination-detection
List of papers on hallucination detection in LLMs.
☆930 Updated last month
tencent-ailab / persona-hub
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
☆1,265 Updated 5 months ago
huggingface / evaluation-guidebook
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a…
☆1,498 Updated 6 months ago
maitrix-org / llm-reasoners
A library for advanced large language model reasoning
☆2,193 Updated last month
coree / awesome-rag
A curated list of retrieval-augmented generation (RAG) in large language models
☆295 Updated 5 months ago
bespokelabsai / curator
Synthetic data curation for post-training and structured data extraction
☆1,468 Updated last week
sunnynexus / Search-o1
Search-o1: Agentic Search-Enhanced Large Reasoning Models
☆995 Updated 2 months ago
huggingface / search-and-learn
Recipes to scale inference-time compute of open models
☆1,110 Updated 2 months ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆1,793 Updated this week
Agent-RL / ReCall
ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning
☆1,116 Updated 2 months ago
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆993 Updated last week
mbzuai-oryx / Awesome-LLM-Post-training
Awesome Reasoning LLM Tutorial/Survey/Guide
☆1,929 Updated 3 weeks ago
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆2,833 Updated last week
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆744 Updated 4 months ago
philschmid / deep-learning-pytorch-huggingface
☆1,256 Updated 5 months ago
leobeeson / llm_benchmarks
A collection of benchmarks and datasets for evaluating LLM.
☆483 Updated last year
llm-as-a-judge / Awesome-LLM-as-a-judge
☆388 Updated last week
predibase / llm_distillation_playbook
Best practices for distilling large language models.
☆569 Updated last year
prometheus-eval / prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
☆978 Updated 3 months ago
lmmlzn / Awesome-LLMs-Datasets
Summarize existing representative LLMs text datasets.
☆1,329 Updated 4 months ago
Curated-Awesome-Lists / awesome-llms-fine-tuning
Explore a comprehensive collection of resources, tutorials, papers, tools, and best practices for fine-tuning Large Language Models (LLMs…
☆448 Updated 8 months ago
Open-Source-O1 / Open-O1
☆1,355 Updated 8 months ago
FudanDNN-NLP / RAG
This is an implementation of the paper: Searching for Best Practices in Retrieval-Augmented Generation (EMNLP2024)
☆329 Updated 7 months ago
Xnhyacinth / Awesome-LLM-Long-Context-Modeling
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
☆1,640 Updated 2 weeks ago
huggingface / huggingface-llama-recipes
☆677 Updated 3 months ago
AIDC-AI / Marco-o1
An Open Large Reasoning Model for Real-World Solutions
☆1,510 Updated 2 months ago
GAIR-NLP / O1-Journey
O1 Replication Journey
☆1,998 Updated 6 months ago
facebookresearch / coconut
Training Large Language Model to Reason in a Continuous Latent Space
☆1,224 Updated 6 months ago
open-thought / system-2-research
System 2 Reasoning Link Collection
☆849 Updated 4 months ago