A powerful, high-performance CLI tool for removing duplicate lines from text files with advanced comparison options and parallel processing capabilities.
☆10 · Apr 16, 2025 · Updated 10 months ago
Alternatives and similar repositories for DupeRemover
Users who are interested in DupeRemover are comparing it to the repositories listed below.
- Continual Memorization of Factoids in Large Language Models ☆12 · Nov 20, 2024 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆52 · Oct 19, 2024 · Updated last year
- [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs ☆141 · Apr 22, 2025 · Updated 10 months ago
- Scaling Data-Constrained Language Models ☆342 · Jun 28, 2025 · Updated 8 months ago
- A project to improve skills of large language models ☆843 · Updated this week
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆642 · Mar 4, 2024 · Updated 2 years ago
- LongBench v2 and LongBench (ACL 25'&24') ☆1,101 · Jan 15, 2025 · Updated last year
- A bibliography and survey of the papers surrounding o1 ☆1,213 · Nov 16, 2024 · Updated last year
- This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models? ☆1,462 · Nov 13, 2025 · Updated 3 months ago
- A reading list on LLM based Synthetic Data Generation 🔥 ☆1,520 · Jun 5, 2025 · Updated 9 months ago
- An Internet-Scale Database. ☆1,907 · Jun 5, 2024 · Updated last year
- A framework for few-shot evaluation of language models. ☆11,540 · Updated this week
- Development repository for the Triton language and compiler ☆18,501 · Updated this week
- ZincSearch. A lightweight alternative to elasticsearch that requires minimal resources, written in Go. ☆17,749 · Jan 23, 2026 · Updated last month
- Markdown for the component era ☆19,281 · Feb 19, 2026 · Updated 2 weeks ago
- The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lak… ☆20,788 · Updated this week
- Python scraper based on AI ☆22,786 · Feb 24, 2026 · Updated last week
- lightweight, idiomatic and composable router for building Go HTTP services ☆21,760 · Feb 19, 2026 · Updated 2 weeks ago
- A realtime distributed messaging platform ☆25,883 · Jul 13, 2025 · Updated 7 months ago
- Code for the paper "Language Models are Unsupervised Multitask Learners" ☆24,648 · Aug 14, 2024 · Updated last year
- Distributed Task Queue (development branch) ☆28,152 · Feb 25, 2026 · Updated last week
- 🕵️‍♂️ All-in-one OSINT tool for analysing any website ☆32,185 · Jan 31, 2026 · Updated last month
- DSPy: The framework for programming—not prompting—language models ☆32,519 · Updated this week
- Scalable datastore for metrics, events, and real-time analytics ☆31,325 · Updated this week
- The fastest HTTP/2 Go Web Framework. New, modern and easy to learn. Fast development with Code you control. Unbeatable cost-performance r… ☆25,623 · Jan 15, 2026 · Updated last month
- CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placemen… ☆31,960 · Updated this week
- Roadmap to becoming an Artificial Intelligence Expert in 2022 ☆30,767 · Sep 12, 2025 · Updated 5 months ago
- Fast, disk space efficient package manager ☆34,175 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. ☆41,516 · Updated this week
- Deep Learning papers reading roadmap for anyone who are eager to learn this amazing tech! ☆39,475 · Nov 27, 2022 · Updated 3 years ago
- GitHub’s official command line tool ☆42,818 · Updated this week
- ClickHouse® is a real-time analytics database management system ☆46,142 · Updated this week
- Distributed reliable key-value store for the most critical data of a distributed system ☆51,590 · Updated this week
- Standard Go Project Layout ☆55,462 · Dec 12, 2025 · Updated 2 months ago
- A curated list of awesome C++ (or C) frameworks, libraries, resources, and shiny things. Inspired by awesome-... stuff. ☆69,981 · Updated this week
- Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one ☆87,780 · Updated this week
- Tensors and Dynamic neural networks in Python with strong GPU acceleration ☆97,870 · Updated this week
- A collection of various awesome lists for hackers, pentesters and security researchers ☆107,451 · Jan 18, 2025 · Updated last year
- Curated coding interview preparation materials for busy software engineers ☆137,947 · Jan 26, 2026 · Updated last month