Tooling for exact and MinHash deduplication of large-scale text datasets
☆84Mar 24, 2026Updated 2 months ago
Alternatives and similar repositories for duplodocus
Users that are interested in duplodocus are comparing it to the libraries listed below.
- Data mapping framework for rust stuff☆53Mar 25, 2026Updated 2 months ago
- ☆77Apr 20, 2026Updated last month
- ☆17Aug 5, 2025Updated 9 months ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones☆67Jan 26, 2026Updated 3 months ago
- decontamination☆33Mar 4, 2026Updated 2 months ago
- Official Implementation of wd1☆30Sep 25, 2025Updated 8 months ago
- A service to store and provide historical data for K8S clusters using the Yunikorn scheduler☆10Feb 13, 2025Updated last year
- An implementation using pytorch of the models presented in the Multi-View Data Generation Without View Supervision paper.☆13Sep 30, 2019Updated 6 years ago
- It's a cooler way to store simple linear models.☆26Jul 15, 2024Updated last year
- Fork of Flame repo for training of some new stuff in development☆19Apr 24, 2026Updated last month
- ☆59Dec 10, 2025Updated 5 months ago
- Collection of LLM completions for reasoning-gym task datasets☆31Jul 4, 2025Updated 10 months ago
- ☆21Jun 4, 2025Updated 11 months ago
- ☆56Mar 18, 2026Updated 2 months ago
- Fluid Language Model Benchmarking☆29Sep 16, 2025Updated 8 months ago
- AJAX Spinners for your Ember.js app☆10Oct 19, 2018Updated 7 years ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆365May 6, 2026Updated 2 weeks ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- Download and preparation tool for free speech corpora.☆16Apr 28, 2019Updated 7 years ago
- LangGraph Typescript Agents Notebooks: email, human in the loop, memory☆33May 3, 2026Updated 3 weeks ago
- Simple collared option smart contract for ETH/BTC☆19Oct 12, 2016Updated 9 years ago
- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling☆135Mar 31, 2026Updated last month
- ☆83May 8, 2026Updated 2 weeks ago
- ☆23Nov 26, 2024Updated last year
- AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models.☆39Dec 2, 2025Updated 5 months ago
- Library classes for the Twelf Proof System☆23Jun 16, 2020Updated 5 years ago
- ☆23Jun 25, 2021Updated 4 years ago
- Revamped: Hugo+LoveIt☆10May 14, 2026Updated last week
- EAFT(Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting) official repo☆101Jan 15, 2026Updated 4 months ago
- ☆23Oct 10, 2025Updated 7 months ago
- A comprehensive framework for benchmarking single and multi-agent systems across a wide range of tasks—evaluating performance, accuracy, …☆38Nov 11, 2025Updated 6 months ago
- Implementation of the Delta Language☆13Mar 18, 2024Updated 2 years ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆86Oct 29, 2024Updated last year
- ☆27Mar 4, 2025Updated last year
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models☆29Apr 17, 2025Updated last year
- [Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆16Apr 15, 2026Updated last month
- [NeurIPS 2025] Official Pytorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L…☆71Mar 3, 2026Updated 2 months ago
- ☆114Updated this week
- Cloud Native Distributed Nearest Neighbour Search☆15Jun 9, 2020Updated 5 years ago