Synthetic Text Dataset Generation for LLM projects
☆56Feb 19, 2026Updated last week
Alternatives and similar repositories for datafast
Users who are interested in datafast compare it to the libraries listed below
- ☆14Mar 9, 2023Updated 2 years ago
- ☆26Feb 11, 2026Updated 2 weeks ago
- Curriculum training of instruction-following LLMs with Unsloth☆14Dec 15, 2025Updated 2 months ago
- SynthTextEval: A Toolkit for Generating and Evaluating Synthetic Data For High-Stakes Domains (EMNLP 2025 System Demonstration)☆26Nov 3, 2025Updated 3 months ago
- Extract Molecular SMILES embeddings from language models pre-trained with various objectives and architectures.☆18Nov 9, 2023Updated 2 years ago
- SynthGenAI - Package for Generating Synthetic Datasets using LLMs.☆54Nov 24, 2025Updated 3 months ago
- A curated list of materials on AI guardrails☆45Jun 3, 2025Updated 8 months ago
- A hackable, simple, and research-friendly GRPO Training Framework with high speed weight synchronization in a multinode environment.☆36Aug 27, 2025Updated 6 months ago
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …☆63Feb 6, 2025Updated last year
- Centralize and streamline ML/AI lifecycle observability and compliance processes.☆12Feb 12, 2025Updated last year
- The OS AI engineering and monitoring agent. 🦸‍♀️ Oversight and compliance copilot for trustworthy AI.☆46Jul 6, 2025Updated 7 months ago
- Large language models for document ranking.☆71Jan 13, 2026Updated last month
- synthetic data for ml☆25Jan 30, 2025Updated last year
- A Chemistry Toolkit that turns your AI assistant into a Chemistry coscientist.☆53Jun 9, 2025Updated 8 months ago
- ☆10Nov 12, 2024Updated last year
- ☆12Apr 23, 2018Updated 7 years ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆34Aug 24, 2024Updated last year
- Evals that meet you where you are. For AI that's grounded.☆52Feb 6, 2026Updated 3 weeks ago
- ☆10Apr 6, 2023Updated 2 years ago
- ☆15Jan 10, 2025Updated last year
- User-friendly viewer for Parquet files☆10Jan 10, 2026Updated last month
- pix2pix and Cycle GAN architectures for image style transfer☆13May 27, 2021Updated 4 years ago
- Repository of IPBench☆19Jan 4, 2026Updated last month
- fine-tuning tutorial☆18Feb 20, 2026Updated last week
- Modular reinforcement learning with PyTorch☆11Jan 18, 2021Updated 5 years ago
- ☆16Jan 16, 2023Updated 3 years ago
- DOS Program Development☆13Nov 9, 2022Updated 3 years ago
- Plug-and-play document AI with zero-shot models.☆124Feb 16, 2026Updated last week
- A CLI for generating synthetic data☆43May 14, 2025Updated 9 months ago
- MEXMA: Token-level objectives improve sentence representations☆43Jan 6, 2025Updated last year
- GBM implementation on Legate☆14Jan 28, 2026Updated last month
- Demonstrate using MCP with Pydantic AI framework☆14Mar 14, 2025Updated 11 months ago
- ☆12Sep 27, 2024Updated last year
- Pantheon's WordPress mu-plugin for all WordPress-based upstreams.☆10Feb 20, 2026Updated last week
- Collaborative Synchronized Corpus Annotation Tool☆11Dec 31, 2018Updated 7 years ago
- Materials for the Introduction to Jurimetric Research course☆12Oct 25, 2023Updated 2 years ago
- ☆11Dec 6, 2023Updated 2 years ago
- Redis distributed lock implementation for Python based on Pub/Sub messaging☆11Feb 14, 2026Updated 2 weeks ago
- ☆11Jul 17, 2023Updated 2 years ago