Pile Deduplication Code
☆18 · May 15, 2023 · Updated 3 years ago
Alternatives and similar repositories for pile_dedupe
Users that are interested in pile_dedupe are comparing it to the libraries listed below.
- ☆44 · Nov 17, 2024 · Updated last year
- ☆27 · Apr 1, 2026 · Updated 3 months ago
- Efficiently computing & storing token n-grams from large corpora ☆27 · Jun 15, 2026 · Updated 2 weeks ago
- Awesome Reinforcement Learning from Human Feedback, the secret behind ChatGPT XD ☆23 · Dec 13, 2022 · Updated 3 years ago
- ☆19 · Jul 31, 2025 · Updated 11 months ago
- ☆79 · Dec 7, 2023 · Updated 2 years ago
- Some example codes for drawing figures in research paper ☆36 · Mar 3, 2022 · Updated 4 years ago
- An extensive and commented list of resources on Learned Sparse Retrieval. ☆61 · Jun 12, 2026 · Updated 3 weeks ago
- ☆28 · Mar 21, 2024 · Updated 2 years ago
- ☆64 · Apr 9, 2024 · Updated 2 years ago
- ☆11 · Nov 23, 2024 · Updated last year
- Hugging Face and Pyserini interoperability ☆20 · May 18, 2023 · Updated 3 years ago
- Unit Scaling demo and experimentation code ☆16 · Mar 12, 2024 · Updated 2 years ago
- A conversational LoRA for OPT 2.7b ☆10 · Apr 28, 2023 · Updated 3 years ago
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality ☆49 · May 19, 2026 · Updated last month
- Advanced Reasoning Benchmark Dataset for LLMs ☆48 · Nov 19, 2023 · Updated 2 years ago
- Source code of the paper "V-Droid: Advancing Mobile GUI Agent Through Generative Verifiers" ☆33 · Feb 2, 2026 · Updated 5 months ago
- Language models scale reliably with over-training and on downstream tasks ☆101 · Apr 2, 2024 · Updated 2 years ago
- ☆23 · Aug 7, 2023 · Updated 2 years ago
- Scaling Data-Constrained Language Models ☆342 · Jun 28, 2025 · Updated last year
- From Easy to Hard: A Dual Curriculum Learning Framework for Context-Aware Document Ranking ☆14 · Oct 25, 2022 · Updated 3 years ago
- Down-to-earth large-model engineering, aiming to become a practical encyclopedia of large models ☆16 · Oct 16, 2023 · Updated 2 years ago
- ☆12 · Mar 22, 2024 · Updated 2 years ago
- Code for our SIGIR 2022 paper. CONVINSE is a framework for conversational question answering (ConvQA) over heterogeneous information sour… ☆12 · Oct 4, 2023 · Updated 2 years ago
- Code for the paper "Curriculum Learning Strategies for IR: An Empirical Study on Conversation Response Ranking" at ECIR'20 ☆17 · Dec 8, 2022 · Updated 3 years ago
- Compress conventional Vision-Language Pre-training data ☆52 · Sep 22, 2023 · Updated 2 years ago
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 · Oct 27, 2023 · Updated 2 years ago
- ☆23 · Jul 27, 2023 · Updated 2 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆31 · May 23, 2024 · Updated 2 years ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- Official implementation of our paper at ACL 2023: Pre-training Multi-party Dialogue Models with Latent Discourse Inference ☆10 · Jul 10, 2023 · Updated 2 years ago
- ☆86 · May 2, 2026 · Updated 2 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆72 · Jun 19, 2024 · Updated 2 years ago
- [ACL'21] Dialogue Response Selection with Hierarchical Curriculum Learning ☆21 · Nov 15, 2022 · Updated 3 years ago
- Zhiyu AI (知予人工智能): From Learner to Researcher ☆13 · Jan 20, 2025 · Updated last year
- Document Sequence with Subtopic Attention ☆19 · Mar 24, 2023 · Updated 3 years ago
- Official code implementation of the paper: QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmente… ☆42 · Apr 8, 2026 · Updated 2 months ago
- The AI that helps you achieve your goals ☆11 · Feb 4, 2024 · Updated 2 years ago