MoritzLaurer / synthetic-data-blog
This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data
☆61 Updated 9 months ago
Related projects
Alternatives and complementary repositories for synthetic-data-blog
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- awesome synthetic (text) datasets ☆242 Updated 3 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 4 months ago
- Let's build better datasets, together! ☆205 Updated this week
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- Notebooks for training universal 0-shot classifiers on many different tasks ☆106 Updated 7 months ago
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 Updated 3 weeks ago
- ☆75 Updated 5 months ago
- ☆93 Updated last month
- ☆105 Updated 2 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆61 Updated 2 weeks ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… ☆180 Updated 3 weeks ago
- ARAGOG- Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper… ☆96 Updated 7 months ago
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆116 Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆80 Updated 2 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆103 Updated last month
- ☆73 Updated 10 months ago
- This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets. ☆47 Updated 2 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets. ☆64 Updated last month
- Reference-Free automatic summarization evaluation with potential hallucination detection ☆98 Updated 10 months ago
- Set of scripts to finetune LLMs ☆36 Updated 7 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆236 Updated 4 months ago
- ☆40 Updated 2 weeks ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆122 Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- Simple examples using Argilla tools to build AI ☆40 Updated this week
- Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning ☆41 Updated 11 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆130 Updated this week
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆77 Updated 8 months ago
- ☆87 Updated 9 months ago