This is the reproduction repository for my 🤗 Hugging Face blog post on synthetic data
★68 · Feb 18, 2024 · Updated 2 years ago
Alternatives and similar repositories for synthetic-data-blog
Users that are interested in synthetic-data-blog are comparing it to the libraries listed below.
- ★33 · Updated this week
- ★14 · Jan 25, 2026 · Updated 4 months ago
- Format and Complete Few-Shot LLM Prompts ★21 · Apr 22, 2026 · Updated last month
- Using modal.com to process FineWeb-edu data ★20 · Apr 11, 2026 · Updated last month
- ReBase: Training Task Experts through Retrieval Based Distillation ★29 · Feb 5, 2025 · Updated last year
- Knowledge Graph Generator app ★34 · Apr 18, 2024 · Updated 2 years ago
- My Gen AI research ★11 · Jun 3, 2024 · Updated last year
- Chrome Extension for exploring Hugging Face datasets ★48 · Sep 18, 2024 · Updated last year
- Textual statistics for quanteda ★18 · Jul 9, 2025 · Updated 10 months ago
- awesome synthetic (text) datasets ★332 · Jan 8, 2026 · Updated 4 months ago
- Literature and datasets on automatic populism detection ★19 · Mar 15, 2025 · Updated last year
- A library for working with prompt templates locally or on the Hugging Face Hub. ★57 · Mar 5, 2025 · Updated last year
- Working files for the Bibframe2Schema.org Working Group ★11 · Oct 25, 2023 · Updated 2 years ago
- ★16 · Nov 11, 2025 · Updated 6 months ago
- Workshop and collection of howtos for dealing with git / github ★16 · Feb 17, 2021 · Updated 5 years ago
- [Early Release] Quarto Extension for Automatic Language Tabs ★24 · Nov 26, 2024 · Updated last year
- various experiments for scaling inference time compute with small reasoning models ★17 · Jan 16, 2025 · Updated last year
- Use Hermes-2-Pro-Mistral-7B function calling with your OpenAI API compatible code. ★18 · May 7, 2024 · Updated 2 years ago
- This is my Official and newest website portfolio with the source code if you want to try my design ★18 · Dec 13, 2023 · Updated 2 years ago
- distilled Self-Critique refines the outputs of a LLM with only synthetic data ★11 · Apr 11, 2024 · Updated 2 years ago
- Browser extension to simulate browsing behaviour in search engines. ★33 · May 22, 2026 · Updated last week
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) ★3,210 · May 13, 2026 · Updated 2 weeks ago
- A tiny server to run local inference on MLX model in the style of OpenAI ★13 · Jan 31, 2024 · Updated 2 years ago
- Serving hugging face guidance behind a server ★13 · Jun 14, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ★20 · Oct 23, 2023 · Updated 2 years ago
- Minimal example scripts of the Hugging Face Trainer, focused on staying under 150 lines ★197 · May 6, 2024 · Updated 2 years ago
- ★90 · Dec 31, 2023 · Updated 2 years ago
- Janus is an opensource IA for Star Citizen ★11 · Dec 23, 2023 · Updated 2 years ago
- ★28 · Aug 1, 2024 · Updated last year
- Experiment to slice, dice, and clean up spreadsheets ★15 · May 3, 2024 · Updated 2 years ago
- Guide to adding domain knowledge to LLMs ★17 · Jul 10, 2023 · Updated 2 years ago
- benchmarks for LLM tokenizers ★18 · Mar 25, 2026 · Updated 2 months ago
- Auto check for new apartments in Hamburg from various real estate providers ★16 · Apr 15, 2026 · Updated last month
- An example of multilingual machine translation using a pretrained version of mt5 from Hugging Face. ★42 · Apr 17, 2021 · Updated 5 years ago
- ★20 · Jan 27, 2024 · Updated 2 years ago
- Automatically exported from code.google.com/p/transducersaurus ★11 · Apr 1, 2015 · Updated 11 years ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ★3,229 · May 18, 2026 · Updated last week
- QLoRA: Efficient Finetuning of Quantized LLMs ★11 · Jul 22, 2023 · Updated 2 years ago
- Interface to the Comparative Legislators Database ★97 · Updated this week