Spawning-Inc / datadiligence
Respect generative AI opt-outs in your ML training pipeline.
☆39 · Updated last year
Alternatives and similar repositories for datadiligence
Users who are interested in datadiligence are comparing it to the libraries listed below.
- ☆42 · Updated last year
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset. ☆31 · Updated 2 years ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Updated last year
- 🦄 An NLP application just for the lols: built with Haystack to get an overview of what a user is posting about on Twitter ☆46 · Updated last year
- Lightweight tools for quick and easy LLM demos ☆28 · Updated last year
- Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Track… ☆119 · Updated 8 months ago
- **ARCHIVED** Filesystem interface to 🤗 Hub ☆58 · Updated 2 years ago
- A library for detecting problematic data segments in structured and unstructured data with few lines of code. ☆64 · Updated last year
- ☆61 · Updated 2 years ago
- Gradio Client in Rust. ☆28 · Updated last year
- ☆51 · Updated 2 years ago
- assign color hues to a collection of text fragments based on embeddings ☆20 · Updated last year
- Efficiently computing & storing token n-grams from large corpora ☆26 · Updated last year
- Pretraining data reconstruction scripts for Apertus ☆103 · Updated 3 weeks ago
- Generate embeddings for images and text using CLIP with LLM ☆74 · Updated last year
- iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh ☆49 · Updated 2 years ago
- Small Python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- ☆43 · Updated last month
- ☆26 · Updated 11 months ago
- Unofficial Python bindings for the Rust llm library. 🐍❤️🦀 ☆76 · Updated 2 years ago
- 🐸Coqui Dialogue Audio Pack contains more than 2000 audio files of synthetic human voices over dialogue created specifically for video ga… ☆42 · Updated 2 years ago
- Training code for Sparse Autoencoders on Embedding models ☆38 · Updated 8 months ago
- [WIP] A 🔥 interface for running code in the cloud ☆86 · Updated 2 years ago
- Code for training & inference with FLAN family of models ☆17 · Updated 2 years ago
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ☆67 · Updated 4 months ago
- BlinkDL's RWKV-v4 running in the browser ☆47 · Updated 2 years ago
- GitHub action that'll sync files from a GitHub Repo with the Hugging Face Hub 🤗 ☆78 · Updated last year
- Drop in replacement for OpenAI, but with Open models. ☆153 · Updated 2 years ago
- llm sampler that only allows words that are in the bible ☆42 · Updated 11 months ago
- ☆171 · Updated 9 months ago