Synthetic Text Dataset Generation for LLM projects
☆56 · Mar 10, 2026 · Updated last week
Alternatives and similar repositories for datafast
Users interested in datafast are comparing it to the libraries listed below.
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆37 · Oct 16, 2025 · Updated 5 months ago
- ☆27 · Feb 11, 2026 · Updated last month
- SynthGenAI - Package for Generating Synthetic Datasets using LLMs. ☆54 · Nov 24, 2025 · Updated 3 months ago
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs … ☆64 · Feb 6, 2025 · Updated last year
- synthetic data for ml ☆25 · Jan 30, 2025 · Updated last year
- ☆23 · Jun 5, 2025 · Updated 9 months ago
- A GitHub App built with Probot that marks/censors Issues and Pull Requests containing offensive content. ☆10 · Dec 16, 2023 · Updated 2 years ago
- Official website for the TRON (Token Reduced Object Notation) format ☆36 · Nov 29, 2025 · Updated 3 months ago
- Centralize and streamline ML/AI lifecycle observability and compliance processes. ☆12 · Feb 12, 2025 · Updated last year
- A CLI for generating synthetic data ☆43 · May 14, 2025 · Updated 10 months ago
- A curated list of materials on AI guardrails ☆47 · Jun 3, 2025 · Updated 9 months ago
- Extract Molecular SMILES embeddings from language models pre-trained with various objectives architectures. ☆18 · Nov 9, 2023 · Updated 2 years ago
- ☆162 · Dec 2, 2024 · Updated last year
- A Python library for generating and loading synthetic and real-world datasets tailored for graph-based applications. ☆37 · Aug 26, 2025 · Updated 6 months ago
- NeuroBLAST v3 architecture code ☆36 · Jan 6, 2026 · Updated 2 months ago
- Demo of knowledge graph creation and Graph RAG with BAML and Kuzu ☆73 · Sep 17, 2025 · Updated 6 months ago
- Curriculum training of instruction-following LLMs with Unsloth ☆14 · Dec 15, 2025 · Updated 3 months ago
- The OS AI engineering and monitoring agent. 🦸‍♀️ Oversight and compliance copilot for trustworthy AI. ☆45 · Jul 6, 2025 · Updated 8 months ago
- Next-generation Punkt sentence boundary detection with zero dependencies ☆29 · Nov 18, 2025 · Updated 4 months ago
- ☆12 · Mar 4, 2025 · Updated last year
- Plug-and-play document AI with zero-shot models. ☆125 · Feb 16, 2026 · Updated last month
- ☆10 · Nov 12, 2024 · Updated last year
- ☆17 · Feb 18, 2026 · Updated last month
- A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal… ☆12 · Sep 16, 2024 · Updated last year
- [KDD24-ADS] R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models ☆11 · Apr 9, 2024 · Updated last year
- ☆39 · Jan 30, 2026 · Updated last month
- This sample code demonstrates how to build an Amazon SageMaker environment for HPO using Optuna (an open source hyperparameter tuning fra… ☆11 · May 21, 2024 · Updated last year
- EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other la… ☆95 · Nov 28, 2025 · Updated 3 months ago
- ☆22 · Nov 19, 2017 · Updated 8 years ago
- Search for a DOI (Digital Object Identifier) in Sci-Hub immediately after selecting it ☆17 · Jun 1, 2019 · Updated 6 years ago
- Build datasets using natural language ☆570 · Sep 19, 2025 · Updated 6 months ago
- This is the code repository for the AI project template. The idea of this template is to have a code framework prepared for any AI/ML/MLO… ☆41 · Jan 26, 2026 · Updated last month
- Dataset Viber is your chill repo for data collection, annotation and vibe checks. ☆47 · Sep 5, 2024 · Updated last year
- ☆10 · Dec 3, 2024 · Updated last year
- ☆15 · May 12, 2025 · Updated 10 months ago
- 🚀 [ICLR '25] RocketEval: Efficient Automated LLM Evaluation via Grading Checklist ☆15 · Aug 21, 2025 · Updated 6 months ago
- AdFit Web SDK for Publisher ☆14 · Jul 6, 2023 · Updated 2 years ago
- ☆12 · Jul 8, 2021 · Updated 4 years ago
- Ludic – an LLM-RL library for the era of experience ☆61 · Jan 9, 2026 · Updated 2 months ago