thoppe / The-Pile-PhilPapers
Download, parse, and filter data from PhilPapers. Data ready for The Pile.
☆12 Updated last year
Related projects
Alternatives and complementary repositories for The-Pile-PhilPapers
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 Updated 9 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆29 Updated this week
- Minimum Description Length probing for neural network representations ☆16 Updated 2 weeks ago
- Few-shot Learning with Auxiliary Data ☆26 Updated 11 months ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 Updated last year
- Embedding Recycling for Language models ☆38 Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆13 Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆41 Updated 9 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆17 Updated 3 weeks ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- Entailment self-training ☆25 Updated last year
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆18 Updated 2 months ago
- Code for Stage-wise Fine-tuning for Graph-to-Text Generation ☆26 Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆44 Updated last year
- [ICML 24 NGSM workshop] Associative Recurrent Memory Transformer implementation and scripts for training and evaluating ☆31 Updated this week
- Byte-sized text games for code generation tasks on virtual environments ☆17 Updated 4 months ago
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) ☆24 Updated 4 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆23 Updated 6 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆32 Updated last year
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆11 Updated 8 months ago
- Adding new tasks to T0 without catastrophic forgetting ☆30 Updated 2 years ago