Spawning-Inc / datadiligence
Respect generative AI opt-outs in your ML training pipeline.
☆39Updated last year
Alternatives and similar repositories for datadiligence
Users who are interested in datadiligence are comparing it to the libraries listed below.
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.☆32Updated 2 years ago
- Lightweight tools for quick and easy LLM demos☆28Updated last year
- ☆52Updated 2 years ago
- RAG for any docs hosted on readthedocs☆42Updated 2 years ago
- ☆42Updated last year
- Gradio Client in Rust.☆28Updated 2 months ago
- 🤗 Optimum ONNX: Export your model to ONNX and run inference with ONNX Runtime☆114Updated last week
- 🦄 An NLP application just for the lols: built with Haystack to get an overview of what a user is posting about on Twitter☆46Updated 2 years ago
- ☆50Updated 3 months ago
- A library for detecting problematic data segments in structured and unstructured data with few lines of code.☆64Updated 2 years ago
- llm sampler that only allows words that are in the bible☆43Updated last year
- Converts JSON-Schema to GBNF grammar to use with llama.cpp☆55Updated 2 years ago
- assign color hues to a collection of text fragments based on embeddings☆20Updated last year
- Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it.☆91Updated 2 years ago
- BlinkDL's RWKV-v4 running in the browser☆48Updated 2 years ago
- ☆63Updated last year
- [WIP] A 🔥 interface for running code in the cloud☆86Updated 2 years ago
- Completion After Prompt Probability. Make your LLM make a choice☆82Updated last year
- Tracking instruction-tuned LLM openness. Paper: Liesenfeld, Andreas, Alianda Lopez, and Mark Dingemanse. 2023. “Opening up ChatGPT: Track…☆119Updated 11 months ago
- **ARCHIVED** Filesystem interface to 🤗 Hub☆59Updated 2 years ago
- GitHub action that'll sync files from a GitHub Repo with the Hugging Face Hub 🤗☆79Updated last year
- Chrome Extension for exploring Hugging Face datasets 🔎☆48Updated last year
- implementation of https://arxiv.org/pdf/2312.09299☆21Updated last year
- A Python module for retrieving script types of writing systems including alphabets, abjads, abugidas, syllabaries, logographs, featurals …☆15Updated last year
- A polite and user-friendly downloader for Common Crawl data☆67Updated 5 months ago
- Unofficial python bindings for the rust llm library. 🐍❤️🦀☆76Updated 2 years ago
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German☆12Updated 2 years ago
- Sort a folder of images according to their similarity with provided text in your browser (uses a browser-ported version of OpenAI's CLIP …☆193Updated last year
- iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh☆50Updated 2 years ago
- Evalica, your favourite evaluation toolkit☆62Updated this week