Spawning-Inc / datadiligence

Respect generative AI opt-outs in your ML training pipeline.

☆39

Alternatives and similar repositories for datadiligence

Users who are interested in datadiligence are comparing it to the libraries listed below.

mozilla-ai / byota
Blueprint to Build Your Own Timeline Algorithm
☆62 Updated last month
modal-labs / boombot
☆42 Updated 11 months ago
TheoCoombes / crawlingathome
A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.
☆32 Updated 2 years ago
sanjibnarzary / awesome-llm
Curated list of open source and openly accessible large language models
☆26 Updated 2 years ago
allenai / adapt-demos
Lightweight tools for quick and easy LLM demos
☆28 Updated 10 months ago
rom1504 / any2dataset
Turn any collection of files into a dataset
☆45 Updated 2 years ago
adrienbrault / json-schema-to-gbnf
Converts JSON-Schema to GBNF grammar to use with llama.cpp
☆55 Updated last year
simonw / llm-clip
Generate embeddings for images and text using CLIP with LLM
☆70 Updated last year
Renumics / sliceguard
A library for detecting problematic data segments in structured and unstructured data with few lines of code.
☆64 Updated last year
kyutai-labs / dactory
☆41 Updated 2 months ago
CERC-AAI / Robin
☆63 Updated 10 months ago
rbourgeat / refacto
Refactor your code with local LLM in VSCode
☆13 Updated last year
JacobLinCool / gradio-rs
Gradio Client in Rust.
☆28 Updated 9 months ago
TextGeneratorio / text-generator.io
Run Vision LLMs, TTS and STT APIs. Website and API for https://text-generator.io
☆35 Updated last week
Hellisotherpeople / llm_steer-oobabooga
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆43 Updated last year
huggingface / leaderboards
☆18 Updated 3 months ago
opening-up-chatgpt / opening-up-chatgpt.github.io
Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Track…
☆118 Updated 4 months ago
the-crypt-keeper / ggml-downloader
Simple, fast, parallel Hugging Face GGML model downloader written in Python
☆24 Updated last year
Sourasky-DHLAB / Whisper
Google Colab Notebooks for Transcription with Whisper
☆24 Updated 3 months ago
lucidrains / TPDNE
Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it.
☆90 Updated last year
huggingface / fuego
[WIP] A 🔥 interface for running code in the cloud
☆85 Updated 2 years ago
LAION-AI / laion-dreams
Aim for the moon. If you miss, you may hit a star.
☆165 Updated 2 years ago
SonicCodes / subcloning
Implementation of https://arxiv.org/pdf/2312.09299
☆21 Updated last year
kuprel / minbpe-pytorch
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA
☆38 Updated last year
multimodalart / mindseye
MindsEye beta - ai art pilot
☆81 Updated 3 years ago
Nicolas-BZRD / EuroBERT
Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including…
☆66 Updated 3 weeks ago
simonw / llm-cluster
LLM plugin for clustering embeddings
☆77 Updated last year
KorAP / Tokenizer-Evaluation
Benchmark scripts for comparing different tokenizers and sentence segmenters of German
☆12 Updated 2 years ago
DSMejantel / Ecole_inclusive
☆14 Updated 8 months ago
nateraw / huggingface-sync-action
GitHub action that'll sync files from a GitHub Repo with the Hugging Face Hub 🤗
☆76 Updated 8 months ago