Spawning-Inc / datadiligence
Respect generative AI opt-outs in your ML training pipeline.
☆39 Updated 9 months ago
Alternatives and similar repositories for datadiligence
Users who are interested in datadiligence compare it to the libraries listed below.
- Blueprint to Build Your Own Timeline Algorithm ☆62 Updated last month
- ☆42 Updated 11 months ago
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large-scale image-text dataset. ☆32 Updated 2 years ago
- Curated list of open source and openly accessible large language models ☆26 Updated 2 years ago
- Lightweight tools for quick and easy LLM demos ☆28 Updated 10 months ago
- Turn any collection of files into a dataset ☆45 Updated 2 years ago
- Converts JSON-Schema to GBNF grammar for use with llama.cpp ☆55 Updated last year
- Generate embeddings for images and text using CLIP with LLM ☆70 Updated last year
- A library for detecting problematic data segments in structured and unstructured data with a few lines of code. ☆64 Updated last year
- ☆41 Updated 2 months ago
- ☆63 Updated 10 months ago
- Refactor your code with a local LLM in VSCode ☆13 Updated last year
- Gradio Client in Rust. ☆28 Updated 9 months ago
- Run Vision LLMs, TTS and STT APIs. Website and API for https://text-generator.io ☆35 Updated last week
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆43 Updated last year
- ☆18 Updated 3 months ago
- Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Track… ☆118 Updated 4 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in Python ☆24 Updated last year
- Google Colab Notebooks for Transcription with Whisper ☆24 Updated 3 months ago
- Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it. ☆90 Updated last year
- [WIP] A 🔥 interface for running code in the cloud ☆85 Updated 2 years ago
- Aim for the moon. If you miss, you may hit a star. ☆165 Updated 2 years ago
- Implementation of https://arxiv.org/pdf/2312.09299 ☆21 Updated last year
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, with PyTorch/CUDA ☆38 Updated last year
- MindsEye beta - AI art pilot ☆81 Updated 3 years ago
- Optimus is a flexible and scalable framework built to train language models efficiently across diverse hardware configurations, including… ☆66 Updated 3 weeks ago
- LLM plugin for clustering embeddings ☆77 Updated last year
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ☆12 Updated 2 years ago
- ☆14 Updated 8 months ago
- GitHub Action that syncs files from a GitHub repo with the Hugging Face Hub 🤗 ☆76 Updated 8 months ago