A pipeline for using API calls to agnostically convert unstructured data into structured training data
☆32 · Sep 22, 2024 · Updated last year
Alternatives and similar repositories for datagen
Users that are interested in datagen are comparing it to the libraries listed below.
- ☆22 · Aug 27, 2023 · Updated 2 years ago
- Comprehensive analysis of the difference in performance of QLoRA, LoRA, and full finetunes · ☆83 · Sep 10, 2023 · Updated 2 years ago
- ☆45 · Oct 13, 2023 · Updated 2 years ago
- ☆11 · Aug 26, 2024 · Updated last year
- Starter template for Python projects · ☆18 · Feb 15, 2024 · Updated 2 years ago
- Website for Applied-LLMs work · ☆29 · May 5, 2026 · Updated last month
- AgentOS is a lightweight, single-file implementation that provides a robust foundation for building autonomous AI agents. It implements t… · ☆25 · Jul 11, 2025 · Updated 11 months ago
- Masked Structural Growth for 2x Faster Language Model Pre-training · ☆25 · Apr 28, 2024 · Updated 2 years ago
- Quick playground to animate pippin · ☆16 · Nov 11, 2024 · Updated last year
- Bridging Large Language Models with Scala 3 Functions · ☆11 · Aug 31, 2024 · Updated last year
- Calling LLM APIs on a Raspberry Pi for lulz · ☆24 · Apr 17, 2023 · Updated 3 years ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" · ☆93 · Feb 27, 2024 · Updated 2 years ago
- The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB) · ☆39 · Feb 24, 2026 · Updated 4 months ago
- Direct preference optimization with only 1 model copy :) · ☆14 · Oct 2, 2023 · Updated 2 years ago
- Simple Implementation of TinyGPTV in super simple Zeta lego blocks · ☆16 · Nov 11, 2024 · Updated last year
- Official repository for the paper "Exploring the Promise and Limits of Real-Time Recurrent Learning" (ICLR 2024) · ☆13 · Jun 11, 2025 · Updated last year
- ☆63 · Sep 23, 2024 · Updated last year
- An Educational Framework Based on PyTorch for Deep Learning Education and Exploration · ☆11 · Dec 24, 2023 · Updated 2 years ago
- Image diffusion block-merging technique applied to transformer-based language models · ☆56 · May 8, 2023 · Updated 3 years ago
- ☆24 · Feb 17, 2026 · Updated 4 months ago
- Nano Bots for Obsidian: small, AI-powered bots that can be easily shared as a single file, designed to support multiple providers such as… · ☆15 · Jan 13, 2024 · Updated 2 years ago
- AllThePatents tooling · ☆11 · Mar 23, 2024 · Updated 2 years ago
- ☆21 · Oct 6, 2023 · Updated 2 years ago
- Latent Diffusion Language Models · ☆71 · Sep 20, 2023 · Updated 2 years ago
- ☆19 · Dec 31, 2025 · Updated 5 months ago
- Images of example pages from Transkribus model training sets to make it easier to find a match · ☆16 · Jan 25, 2022 · Updated 4 years ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection · ☆104 · Jan 15, 2024 · Updated 2 years ago
- QLoRA with Enhanced Multi GPU Support · ☆38 · Aug 8, 2023 · Updated 2 years ago
- Full finetuning of large language models without large memory requirements · ☆93 · Sep 22, 2025 · Updated 9 months ago
- Tiktok is an advanced multimedia recommender system that fuses the generative modality-aware collaborative self-augmentation and contrast… · ☆14 · Aug 18, 2023 · Updated 2 years ago
- Generic build server · ☆65 · May 25, 2014 · Updated 12 years ago
- ☆30 · Sep 10, 2025 · Updated 9 months ago
- A framework for few-shot evaluation of autoregressive language models · ☆16 · Aug 23, 2023 · Updated 2 years ago
- Synthetic data generation using LLMs via Argilla, Distilabel, ChatGPT, etc. · ☆31 · May 29, 2024 · Updated 2 years ago
- A proof-of-concept library for generating and running machine learning model tests · ☆13 · Sep 27, 2020 · Updated 5 years ago
- ☆18 · Apr 3, 2023 · Updated 3 years ago
- Minimal Implementation of VCRec (2024) for collapse prevention · ☆18 · Jan 28, 2025 · Updated last year
- A repository of projects and datasets under active development by Alignment Lab AI · ☆22 · Dec 22, 2023 · Updated 2 years ago
- ☆20 · Oct 24, 2022 · Updated 3 years ago