Score LLM pretraining data with classifiers
☆55 · Nov 2, 2023 · Updated 2 years ago
Alternatives and similar repositories for classified
Users that are interested in classified are comparing it to the libraries listed below.
- Generate textbook-quality synthetic LLM pretraining data ☆508 · Oct 19, 2023 · Updated 2 years ago
- Convert all of libgen to high quality markdown ☆255 · Dec 13, 2023 · Updated 2 years ago
- Just large language models. Hackable, with as little abstraction as possible. Done for my own purposes, feel free to rip. ☆44 · Sep 6, 2023 · Updated 2 years ago
- This repository contains all code examples for my TensorFlow World talk about "Advanced model deployments with TensorFlow Serving" ☆17 · Dec 8, 2022 · Updated 3 years ago
- A toolkit for interpreting and analyzing neural networks (vision) ☆34 · Jul 28, 2020 · Updated 5 years ago
- Targeted Data Generation with Large Language Models ☆19 · Jun 25, 2024 · Updated last year
- ☆22 · Aug 27, 2023 · Updated 2 years ago
- A structured framework for defining, verifying and certifying AI systems. ☆19 · Mar 11, 2025 · Updated last year
- ☆24 · May 19, 2024 · Updated 2 years ago
- A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of … ☆23 · Oct 1, 2024 · Updated last year
- ☆45 · Oct 13, 2023 · Updated 2 years ago
- Synthetic Hypertext and Homomorphic Catalogue ☆15 · Dec 28, 2024 · Updated last year
- tiny_fnc_engine is a minimal Python library that provides a flexible engine for calling functions extracted from an LLM. ☆38 · Sep 11, 2024 · Updated last year
- An Educational Framework Based on PyTorch for Deep Learning Education and Exploration ☆11 · Dec 24, 2023 · Updated 2 years ago
- GraphRag vs Embeddings ☆16 · Jul 14, 2024 · Updated last year
- Lightweight open-source perplexity ☆62 · May 6, 2024 · Updated 2 years ago
- Set of scripts to finetune LLMs ☆38 · Mar 30, 2024 · Updated 2 years ago
- I use various data science and machine learning techniques to analyze customer data using the STP framework. I preprocessed the data, perform… ☆11 · Apr 26, 2020 · Updated 6 years ago
- ICLR 2025 ☆30 · May 21, 2025 · Updated last year
- A getting-started sample for Clojure and Solr ☆11 · Aug 28, 2015 · Updated 10 years ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ☆269 · Jul 8, 2025 · Updated 11 months ago
- ☆30 · Jul 22, 2024 · Updated last year
- Python library for Evaluation ☆17 · Mar 31, 2026 · Updated 2 months ago
- Causal Inference for Time Series Data (with CausalML Demo) ☆14 · Jun 11, 2023 · Updated 2 years ago
- ☆12 · Dec 13, 2023 · Updated 2 years ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Feb 22, 2024 · Updated 2 years ago
- ☆13 · Apr 5, 2026 · Updated 2 months ago
- This is the public repository of the AAAI 2024 paper "Is a Large Language Model a Good Annotator for Event Extraction" ☆10 · Feb 16, 2024 · Updated 2 years ago
- Generate a strong link to a hypercore seq that contains a root hash of the merkle tree at that time ☆14 · May 27, 2020 · Updated 6 years ago
- Verbosity control for AI agents ☆66 · May 23, 2024 · Updated 2 years ago
- A glowfic to epub converter. ☆14 · Apr 11, 2026 · Updated last month
- Code for Transformed Distribution Matching (TDM) for Missing Value Imputation, ICML 2023 ☆14 · Aug 4, 2023 · Updated 2 years ago
- Slack bot that indexes all messages sent in channels and can provide an interactive semantic search experience for users ☆10 · Jan 1, 2023 · Updated 3 years ago
- Training GPTs to solve interaction nets ☆18 · Aug 14, 2024 · Updated last year
- ☆86 · Jan 15, 2024 · Updated 2 years ago
- Super performant RAG pipeline for AI apps. ☆17 · Mar 10, 2024 · Updated 2 years ago
- ☆11 · Aug 10, 2024 · Updated last year
- batched loras ☆351 · Sep 6, 2023 · Updated 2 years ago
- Small handful of my dotfiles ☆11 · Updated this week