thoppe / The-Pile-PubMed
Download, parse, and filter data from PubMed, data-ready for The-Pile
☆23 Updated 3 years ago
Alternatives and similar repositories for The-Pile-PubMed
Users who are interested in The-Pile-PubMed are comparing it to the libraries listed below.
- A script for collecting the PubMed Central dataset in a language modelling friendly format. ☆24 Updated 4 years ago
- Few-shot Learning with Auxiliary Data ☆31 Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆40 Updated 5 months ago
- Pre-trained Language Model for Scientific Text ☆46 Updated last year
- Official implementation of the ACL 2024: Scientific Inspiration Machines Optimized for Novelty ☆85 Updated last year
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 Updated last year
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆37 Updated last year
- Retrieval as Attention ☆83 Updated 2 years ago
- ☆28 Updated 6 months ago
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆37 Updated 2 years ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models ☆46 Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆48 Updated 7 months ago
- [NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Lea… ☆75 Updated last year
- ☆14 Updated last year
- the instructions and demonstrations for building a formal logical reasoning capable GLM ☆54 Updated last year
- ☆74 Updated last year
- Interpreting the latent space representations of attention head outputs for LLMs ☆34 Updated last year
- This is the official PyTorch repo for "UNIREX: A Unified Learning Framework for Language Model Rationale Extraction" (ICML 2022). ☆26 Updated 2 years ago
- Exploration of automated dataset selection approaches at large scales. ☆47 Updated 6 months ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆99 Updated 2 years ago
- Pretraining Efficiently on S2ORC! ☆166 Updated 10 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆61 Updated 2 years ago
- ☆125 Updated last year
- Pile Deduplication Code ☆19 Updated 2 years ago
- ☆27 Updated 2 years ago
- Data Valuation on In-Context Examples (ACL23) ☆24 Updated 7 months ago
- ☆38 Updated 3 years ago
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆110 Updated 2 years ago
- Google Research ☆45 Updated 2 years ago
- Adding new tasks to T0 without catastrophic forgetting ☆33 Updated 2 years ago