Tooling for exact and MinHash deduplication of large-scale text datasets
☆85 Mar 24, 2026 Updated 2 months ago
Alternatives and similar repositories for duplodocus
Users that are interested in duplodocus are comparing it to the libraries listed below.
- Data mapping framework for rust stuff ☆54 Mar 25, 2026 Updated 2 months ago
- ☆17 Aug 5, 2025 Updated 10 months ago
- PyTorch building blocks for the OLMo ecosystem ☆1,289 Updated this week
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones ☆68 Jan 26, 2026 Updated 4 months ago
- decontamination ☆33 Mar 4, 2026 Updated 3 months ago
- Official Implementation of wd1 ☆30 Sep 25, 2025 Updated 8 months ago
- It's a cooler way to store simple linear models. ☆26 Jul 15, 2024 Updated last year
- ☆21 Jun 4, 2025 Updated last year
- ☆56 Mar 18, 2026 Updated 3 months ago
- Fluid Language Model Benchmarking ☆30 Sep 16, 2025 Updated 9 months ago
- AJAX Spinners for your Ember.js app ☆10 Oct 19, 2018 Updated 7 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆32 Nov 12, 2024 Updated last year
- Download and preparation tool for free speech corpora. ☆16 Apr 28, 2019 Updated 7 years ago
- WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation ☆154 Jun 11, 2026 Updated last week
- Building the cognitive-core to solve ARC-AGI-2 ☆27 Feb 2, 2025 Updated last year
- ☆88 May 8, 2026 Updated last month
- ☆44 Aug 5, 2025 Updated 10 months ago
- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling ☆152 Mar 31, 2026 Updated 2 months ago
- Single-pass Adaptive Image Tokenization for Minimum Program Search | What's the Kolmogorov Complexity of an Image? ☆43 Jul 26, 2025 Updated 10 months ago
- ☆32 Sep 22, 2025 Updated 8 months ago
- AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models. ☆40 Dec 2, 2025 Updated 6 months ago
- ☆55 Aug 5, 2025 Updated 10 months ago
- (Pytorch and Tensorflow) Implementation of Weighted Contrastive Loss (Deep Metric Learning by Online Soft Mining and Class-Aware Attentio… ☆21 Oct 21, 2019 Updated 6 years ago
- Revamped: Hugo+LoveIt ☆10 May 14, 2026 Updated last month
- Runtime types for OCaml (beta version) ☆27 May 18, 2026 Updated last month
- ☆24 Oct 10, 2025 Updated 8 months ago
- ☆27 Dec 8, 2025 Updated 6 months ago
- Pytorch implementation of Fast Deep Matting for Portrait Animation on Mobile Phone paper ☆25 Apr 26, 2020 Updated 6 years ago
- Latest and fastest EigenPro that scales to billions of examples ☆10 Apr 18, 2026 Updated 2 months ago
- CVPR 2022 Continual Learning in Computer Vision Workshop Challenge ☆27 Dec 15, 2022 Updated 3 years ago
- Extrapolating RLVR to General Domains without Verifiers ☆203 Aug 12, 2025 Updated 10 months ago
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models ☆29 Apr 17, 2025 Updated last year
- [Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆16 Apr 15, 2026 Updated 2 months ago
- A simple n-gram language model. ☆12 Sep 11, 2018 Updated 7 years ago
- [NeurIPS 2025] Official Pytorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L… ☆71 Mar 3, 2026 Updated 3 months ago
- Cloud Native Distributed Nearest Neighbour Search ☆15 Jun 9, 2020 Updated 6 years ago
- ☆14 May 16, 2024 Updated 2 years ago
- ☆130 Updated this week
- CycleQD is a framework for parameter space model merging. ☆49 Feb 1, 2025 Updated last year