EleutherAI / pile-pubmedcentral
A script for collecting the PubMed Central dataset in a language modelling friendly format.
☆25Updated 4 years ago
Alternatives and similar repositories for pile-pubmedcentral
Users who are interested in pile-pubmedcentral are comparing it to the libraries listed below
- Download, parse, and filter data from PubMed, data-ready for The-Pile☆23Updated 4 years ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.☆47Updated 10 months ago
- Parkar and Kim et al.'s paper on "Can LLMs Select Important Instructions to Annotate?"☆13Updated last year
- Medical reasoning using large language models☆92Updated 2 years ago
- Pre-trained Language Model for Scientific Text☆45Updated last year
- Code for our EMNLP '22 paper "Fixing Model Bugs with Natural Language Patches"☆19Updated 3 years ago
- Official implementation of the ACL 2024 paper: Scientific Inspiration Machines Optimized for Novelty☆93Updated last year
- ☆49Updated 3 years ago
- Google Research☆46Updated 3 years ago
- ☆28Updated 11 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …☆42Updated last year
- Embedding Recycling for Language models☆38Updated 2 years ago
- [NeurIPS 2023 Main Track] This is the repository for the paper titled "Don’t Stop Pretraining? Make Prompt-based Fine-tuning Powerful Lea…☆76Updated last year
- Few-shot Learning with Auxiliary Data☆31Updated 2 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆28Updated last week
- Medical ML Benchmark☆11Updated 2 years ago
- Codes for paper: Mixture-of-Partitions: Infusing Large Biomedical Knowledge Graphs into BERT☆34Updated 3 years ago
- Aioli: A unified optimization framework for language model data mixing☆32Updated last year
- ☆25Updated 2 years ago
- SCREWS: A Modular Framework for Reasoning with Revisions☆27Updated 2 years ago
- Official codes for DNA-GPT (ICLR 2024)☆56Updated last year
- Understanding the correlation between different LLM benchmarks☆29Updated 2 years ago
- Pretraining Efficiently on S2ORC!☆179Updated last year
- Transformers at any scale☆42Updated 2 years ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"☆37Updated 2 years ago
- Adding new tasks to T0 without catastrophic forgetting☆33Updated 3 years ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆26Updated last week
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆49Updated 2 years ago
- ☆26Updated 2 years ago
- In-BoXBART: Get Instructions into Biomedical Multi-task Learning☆14Updated 3 years ago