Spawning-Inc / datadiligence
Respect generative AI opt-outs in your ML training pipeline.
★39 · Updated 8 months ago
Alternatives and similar repositories for datadiligence
Users who are interested in datadiligence are comparing it to the libraries listed below.
- Lightweight tools for quick and easy LLM demos ★28 · Updated 9 months ago
- Chrome Extension for exploring Hugging Face datasets ★50 · Updated 9 months ago
- Assign color hues to a collection of text fragments based on embeddings ★20 · Updated last year
- Data and information related to the Books3 dataset included as part of The Pile, and used to train Meta's LLaMA among others ★31 · Updated last month
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ★13 · Updated 10 months ago
- ANE accelerated embedding models! ★18 · Updated 6 months ago
- A clone of OpenAI's Tokenizer page for HuggingFace Models ★45 · Updated last year
- Run Vision LLMs, TTS and STT APIs. Website and API for https://text-generator.io ★35 · Updated 2 weeks ago
- ★67 · Updated last year
- Gradio Client in Rust. ★28 · Updated 8 months ago
- LLM sampler that only allows words that are in the Bible ★27 · Updated 6 months ago
- Paste Word, get Markdown ★17 · Updated 11 months ago
- ★22 · Updated last year
- ★38 · Updated last month
- Next-generation Punkt sentence boundary detection with zero dependencies ★17 · Updated 2 months ago
- NLP with Rust for Python ★62 · Updated last month
- Tokun to can tokens ★17 · Updated last week
- Converts JSON-Schema to GBNF grammar to use with llama.cpp ★55 · Updated last year
- Benchmark scripts for comparing different tokenizers and sentence segmenters of German ★11 · Updated 2 years ago
- Access different AI models in one place ★22 · Updated last year
- Nexusflow function call, tool use, and agent benchmarks. ★20 · Updated 6 months ago
- Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search. ★32 · Updated 4 months ago
- Blueprint to Build Your Own Timeline Algorithm ★58 · Updated 3 weeks ago
- [Added T5 support to TRLX] A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ★47 · Updated 2 years ago
- Efficiently computing & storing token n-grams from large corpora ★24 · Updated 8 months ago
- Lightweight package that tracks and summarizes code changes using LLMs (Large Language Models) ★34 · Updated 4 months ago
- ★42 · Updated 10 months ago
- Utilities for loading and running text embeddings with ONNX ★44 · Updated 10 months ago
- A client library for LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset. ★32 · Updated 2 years ago
- Interactive web viewer for exploring large neural networks; powers the graph visualization of Talaria ★62 · Updated 8 months ago