Pile Deduplication Code
☆18May 15, 2023Updated 3 years ago
Alternatives and similar repositories for pile_dedupe
Users who are interested in pile_dedupe are comparing it to the libraries listed below.
- ☆44Nov 17, 2024Updated last year
- ☆18Jul 31, 2025Updated 9 months ago
- Using queues, tqdm-multiprocess supports multiple worker processes, each with multiple tqdm progress bars, displaying them cleanly throug…☆43Jan 6, 2021Updated 5 years ago
- See the issue board for the current status of active and prospective projects!☆65Feb 12, 2022Updated 4 years ago
- An awesome list of Causality and Machine Learning related papers, books and other resources.☆12Nov 13, 2023Updated 2 years ago
- Some example codes for drawing figures in research paper☆35Mar 3, 2022Updated 4 years ago
- An extensive and commented list of resources on Learned Sparse Retrieval.☆57Apr 27, 2026Updated 3 weeks ago
- Implementation of VQ-VAE with a GPT-style sampler in the JAX and Haiku ecosystem.☆11Nov 23, 2023Updated 2 years ago
- ☆27Mar 21, 2024Updated 2 years ago
- An implementation of DreamerV2 written in JAX, with support for running multiple random seeds of an experiment on a single GPU.☆18Jan 16, 2023Updated 3 years ago
- ☆64Apr 9, 2024Updated 2 years ago
- ☆11Nov 23, 2024Updated last year
- Unit Scaling demo and experimentation code☆16Mar 12, 2024Updated 2 years ago
- A conversational LoRA for OPT 2.7b☆10Apr 28, 2023Updated 3 years ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality☆45Updated this week
- Language models scale reliably with over-training and on downstream tasks☆101Apr 2, 2024Updated 2 years ago
- ☆19Oct 27, 2025Updated 6 months ago
- From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking☆14Oct 25, 2022Updated 3 years ago
- The goal of this project is to develop a program for planetary soft landings using lossless convexification of non convex control bounds.☆12Mar 25, 2022Updated 4 years ago
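- Down-to-earth large-model engineering, aiming to become a practical encyclopedia of large models☆16Oct 16, 2023Updated 2 years ago
- Analytical solutions for logistic regression and single-layer softmax☆12Jul 29, 2021Updated 4 years ago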
- ☆12Mar 22, 2024Updated 2 years ago
- Code for our SIGIR 2022 paper. CONVINSE is a framework for conversational question answering (ConvQA) over heterogeneous information sour…☆12Oct 4, 2023Updated 2 years ago
- A compiler written in Java to compile a subset of instructions called MiniJava.☆10Apr 20, 2015Updated 11 years ago
- Compress conventional Vision-Language Pre-training data☆52Sep 22, 2023Updated 2 years ago
- Minimum Description Length probing for neural network representations☆20Jan 28, 2025Updated last year
- Code for our SIGIR 2023 paper. EXPLAIGNN provides a pipeline for conversational question answering (ConvQA) over heterogeneous sources, a…☆12Jul 15, 2023Updated 2 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Oct 27, 2023Updated 2 years ago
- ☆23Jul 27, 2023Updated 2 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…☆30May 23, 2024Updated 2 years ago
- An unsorted collection of little tools and scripts I've made that don't fit anywhere else☆19Jul 15, 2022Updated 3 years ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…☆31Nov 14, 2023Updated 2 years ago
- ☆82May 2, 2026Updated 3 weeks ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions"☆72Jun 19, 2024Updated last year
- [ACL'21] Dialogue Response Selection with Hierarchical Curriculum Learning☆21Nov 15, 2022Updated 3 years ago
- RAG methods, benchmarks, and toolkits☆19Nov 28, 2024Updated last year
- Zhiyu AI (知予人工智能): from learner to researcher☆13Jan 20, 2025Updated last year
- Code for reproducing our paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"☆33Mar 31, 2025Updated last year
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research☆22Dec 11, 2024Updated last year