Tooling for exact and MinHash deduplication of large-scale text datasets
☆72 · Feb 19, 2026 · Updated last week
Alternatives and similar repositories for duplodocus
Users who are interested in duplodocus compare it to the libraries listed below
- Data mapping framework for rust stuff ☆46 · Feb 26, 2026 · Updated last week
- ☆48 · Jan 20, 2026 · Updated last month
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆64 · Jan 26, 2026 · Updated last month
- (Pytorch and Tensorflow) Implementation of Weighted Contrastive Loss (Deep Metric Learning by Online Soft Mining and Class-Aware Attentio… ☆21 · Oct 21, 2019 · Updated 6 years ago
- PyTorch building blocks for the OLMo ecosystem ☆839 · Updated this week
- ☆37 · Sep 21, 2025 · Updated 5 months ago
- AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models. ☆37 · Dec 2, 2025 · Updated 3 months ago
- Modifications to initial.pak for general improvements ☆15 · Jan 29, 2026 · Updated last month
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆84 · Oct 29, 2024 · Updated last year
- Creates a CMM script that can be directly executed on Kaggle from an easy merge script ☆14 · Jan 12, 2026 · Updated last month
- E2E-SincNet: Toward fully end-to-end speech recognition ☆30 · Feb 1, 2020 · Updated 6 years ago
- The OSC BSU CSI Driver is a CSI driver for Kubernetes allowing the use of Outscale Block Storage Units (BSU) volumes ☆10 · Updated this week
- Koel Labs innovates open-source speech research, inclusive speech technologies, and real-time pronunciation feedback for language learner… ☆18 · Feb 25, 2026 · Updated last week
- rabitq rust implementation ☆10 · Feb 4, 2026 · Updated last month
- TSDG: An efficient index graph for graph-based nearest neighbor search ☆10 · Jul 14, 2022 · Updated 3 years ago
- A minimalistic deployment software focused on simplicity and clarity. ☆11 · Feb 12, 2022 · Updated 4 years ago
- Some kind of TidalCycles implementation for SuperCollider ☆14 · May 29, 2020 · Updated 5 years ago
- A fairy-tale creation service for children, My AI Fairy-Tale ☆11 · Apr 7, 2023 · Updated 2 years ago
- implementation of Cyclic Boosting machine learning algorithms ☆95 · Sep 2, 2024 · Updated last year
- Losses and decoders for end-to-end ASR and OCR ☆34 · Oct 30, 2020 · Updated 5 years ago
- simplify the prediction process for a finetuned bert model ☆11 · Jun 19, 2019 · Updated 6 years ago
- ⚗️ Aeromancy: A framework for performing reproducible AI and ML ☆11 · Jun 5, 2025 · Updated 9 months ago
- A rust operating system for the ARM V7-A running on a beaglebone black ☆12 · Mar 11, 2021 · Updated 4 years ago
- Get aid from local LLMs right in your PowerShell ☆15 · May 2, 2025 · Updated 10 months ago
- DARPA Cyber Grand Challenge Linux source code ☆17 · Jul 9, 2015 · Updated 10 years ago
- G-EDM is a wire and sinker EDM machine for the DIY community with focus on a mostly 3d printed concept ☆22 · Nov 9, 2025 · Updated 3 months ago
- Repository for Skill Set Optimization ☆14 · Jul 26, 2024 · Updated last year
- BERT score for text generation ☆12 · Jan 15, 2025 · Updated last year
- OpenCV Sample Projects in Rust ☆12 · Nov 27, 2021 · Updated 4 years ago
- Utility commands for Maestro operating system ☆14 · Oct 30, 2025 · Updated 4 months ago
- this is a work about UpliftRec ☆10 · Dec 10, 2024 · Updated last year
- E-book on industrial-grade Chinese speech recognition systems ☆13 · Oct 30, 2020 · Updated 5 years ago
- Tool to apply Legal Matter Specification Standard (LMSS) to documents ☆12 · Aug 15, 2024 · Updated last year
- Control the DOM from Python using Websockets ☆12 · Mar 5, 2018 · Updated 7 years ago
- Unsupervised Word Discovery ☆10 · Jul 26, 2019 · Updated 6 years ago
- ☆16 · Feb 18, 2024 · Updated 2 years ago
- An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024) ☆10 · May 31, 2024 · Updated last year
- A Starlette example for deployment in fastai2 ☆11 · Dec 18, 2020 · Updated 5 years ago
- Seminar: intro to deep learning with tensorflow ☆13 · Jun 27, 2017 · Updated 8 years ago