Tooling for exact and MinHash deduplication of large-scale text datasets
☆81Mar 24, 2026Updated last month
Alternatives and similar repositories for duplodocus
Users that are interested in duplodocus are comparing it to the libraries listed below.
- Data mapping framework for rust stuff☆51Mar 25, 2026Updated last month
- ☆72Apr 20, 2026Updated 2 weeks ago
- FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones☆66Jan 26, 2026Updated 3 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning☆58Oct 16, 2025Updated 6 months ago
- Official Implementation of wd1☆29Sep 25, 2025Updated 7 months ago
- Fork of Flame repo for training of some new stuff in development☆19Apr 24, 2026Updated last week
- ☆21Jun 4, 2025Updated 11 months ago
- AJAX Spinners for your Ember.js app☆10Oct 19, 2018Updated 7 years ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆30Nov 12, 2024Updated last year
- Download and preparation tool for free speech corpora.☆16Apr 28, 2019Updated 7 years ago
- LangGraph Typescript Agents Notebooks: email, human in the loop, memory☆33Apr 23, 2026Updated last week
- ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling☆130Mar 31, 2026Updated last month
- ☆43Aug 5, 2025Updated 9 months ago
- ☆28Sep 22, 2025Updated 7 months ago
- ☆23Nov 26, 2024Updated last year
- AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models.☆39Dec 2, 2025Updated 5 months ago
- Revamped: Hugo+LoveIt☆10Apr 14, 2026Updated 2 weeks ago
- Runtime types for OCaml (beta version)☆26Apr 6, 2026Updated 3 weeks ago
- Serverless AI powered by WebAssembly☆19Feb 6, 2026Updated 2 months ago
- ☆70Apr 24, 2026Updated last week
- Using PyTorch autograd to compute Hessian of Perplexity for Large Language Models☆29Apr 17, 2025Updated last year
- [Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆16Apr 15, 2026Updated 2 weeks ago
- ☆102Updated this week
- [NeurIPS 2025] Official Pytorch Implementation of "The Curse of Depth in Large Language Models" by Wenfang Sun, Xinyuan Song, Pengxiang L…☆70Mar 3, 2026Updated 2 months ago
- Cloud Native Distributed Nearest Neighbour Search☆15Jun 9, 2020Updated 5 years ago
- A queryable GeoJSON API using MongoDB & Flask☆12Apr 5, 2016Updated 10 years ago
- 2nd place solution for RecSys Challenge 2025 by Synerise☆18Jul 9, 2025Updated 9 months ago
- Website for hosting the Open Foundation Models Cheat Sheet.☆271May 7, 2025Updated 11 months ago
- A Wikipedia-based summarization dataset☆14Mar 27, 2023Updated 3 years ago
- Enable RNNLM lattice rescoring with Pytorch [kaldi]☆12Jun 5, 2020Updated 5 years ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel☆133Jun 24, 2025Updated 10 months ago
- A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training☆22Sep 7, 2022Updated 3 years ago
- The evaluation framework for training-free sparse attention in LLMs☆122Jan 27, 2026Updated 3 months ago
- Data and tools for generating and inspecting OLMo pre-training data.☆1,492Nov 5, 2025Updated 5 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax"☆91Sep 12, 2025Updated 7 months ago
- Package vecf32 provides common functions and methods for slices of float32☆13Jun 14, 2023Updated 2 years ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs☆13Nov 27, 2023Updated 2 years ago
- nyc is so back☆21Jun 27, 2025Updated 10 months ago