thoppe / The-Pile-PubMed
Download, parse, and filter data from PubMed, data-ready for The-Pile
☆20Updated 3 years ago
Alternatives and similar repositories for The-Pile-PubMed:
Users who are interested in The-Pile-PubMed are comparing it to the repositories listed below
- A script for collecting the PubMed Central dataset in a language modelling friendly format.☆23Updated 3 years ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.☆25Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆31Updated 8 months ago
- Tasks for describing differences between text distributions.☆16Updated 5 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆29Updated last month
- Supporting code for ReCEval paper☆27Updated 4 months ago
- Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization"☆13Updated last year
- Few-shot Learning with Auxiliary Data☆26Updated last year
- Adding new tasks to T0 without catastrophic forgetting☆32Updated 2 years ago
- [ACL 2024 Findings] This is the code for our paper "Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation wi…☆36Updated 6 months ago
- ✨ Resolving Knowledge Conflicts in Large Language Models, COLM 2024☆15Updated 3 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆32Updated 5 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888☆35Updated 7 months ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models☆43Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023).☆24Updated 4 months ago
- Findings of ACL'2023: Optimizing Test-Time Query Representations for Dense Retrieval☆29Updated last year
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning☆22Updated last year
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆64Updated 2 months ago
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…☆58Updated last year
- PyTorch code for the paper "An Empirical Study of Multimodal Model Merging"☆37Updated last year
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models"☆29Updated last year
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆56Updated 2 years ago