Spawning-Inc / datadiligence
Respect generative AI opt-outs in your ML training pipeline.
☆39 Updated 11 months ago
Alternatives and similar repositories for datadiligence
Users who are interested in datadiligence are comparing it to the libraries listed below.
- Blueprint to Build Your Own Timeline Algorithm ☆66 Updated 3 months ago
- Gradio Client in Rust. ☆28 Updated 11 months ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ☆12 Updated 2 years ago
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset. ☆32 Updated 2 years ago
- 🐸Coqui Dialogue Audio Pack contains more than 2000 audio files of synthetic human voices over dialogue created specifically for video ga… ☆42 Updated 2 years ago
- GitHub action that'll sync files from a GitHub Repo with the Hugging Face Hub 🤗 ☆76 Updated 10 months ago
- Lightweight tools for quick and easy LLM demos ☆28 Updated last year
- ☆42 Updated last year
- assign color hues to a collection of text fragments based on embeddings ☆20 Updated last year
- Converts JSON-Schema to GBNF grammar to use with llama.cpp ☆55 Updated last year
- image-to-text model for PDF.js ☆47 Updated 6 months ago
- RAG for any docs hosted on readthedocs ☆40 Updated last year
- A polite and user-friendly downloader for Common Crawl data ☆57 Updated last month
- Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it. ☆91 Updated 2 years ago
- ☆16 Updated 2 years ago
- Run Vision LLMs, TTS and STT APIs. Website and API for https://text-generator.io ☆37 Updated 3 weeks ago
- ☆51 Updated last year
- A library for detecting problematic data segments in structured and unstructured data with few lines of code. ☆64 Updated last year
- 🦄 An NLP application just for the lols: built with Haystack to get an overview of what a user is posting about on Twitter ☆45 Updated last year
- ☆26 Updated 9 months ago
- [WIP] A 🔥 interface for running code in the cloud ☆85 Updated 2 years ago
- ☆43 Updated 2 weeks ago
- Efficiently computing & storing token n-grams from large corpora ☆26 Updated 11 months ago
- A very simple interactive demo to understand the common LLM samplers. ☆36 Updated last year
- ☆11 Updated last year
- BlinkDL's RWKV-v4 running in the browser ☆47 Updated 2 years ago
- Turn any collection of files into a dataset ☆45 Updated 2 years ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆43 Updated last year
- Drop in replacement for OpenAI, but with Open models. ☆153 Updated 2 years ago
- Browse, search, and visualize ONNX models. ☆34 Updated 4 months ago