stelladk / PretrainingBERT
Pre-training BERT masked language models with custom vocabulary
☆32 Updated 2 years ago
Related projects
Alternatives and complementary repositories for PretrainingBERT
- ☆61 Updated last year
- SciFive: a text-text transformer model for biomedical literature ☆90 Updated 5 months ago
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)" ☆96 Updated last year
- Source code for paper "Learning from Noisy Labels for Entity-Centric Information Extraction", EMNLP 2021 ☆55 Updated 2 years ago
- EMNLP'2021: Can Language Models be Biomedical Knowledge Bases? ☆54 Updated last year
- ☆42 Updated 2 years ago
- [Neurips2023] Source code for Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory ☆54 Updated last year
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)" ☆60 Updated last year
- ☆40 Updated 3 years ago
- Language models are open knowledge graphs (non official implementation) ☆168 Updated 4 years ago
- Hierarchical Attention Transformers (HAT) ☆45 Updated 10 months ago
- A Python Commonsense Knowledge Inference Toolkit ☆63 Updated 11 months ago
- The autoregressive information extraction system GenIE (Generative Information Extraction) implemented in PyTorch. ☆99 Updated last year
- Joint Multilingual Knowledge Graph Completion and Alignment (Findings of EMNLP 2022) (Pytorch) ☆34 Updated 2 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆101 Updated last year
- [ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links ☆421 Updated 2 years ago
- GISTEmbed: Guided In-sample Selection of Training Negatives for Text Embeddings ☆37 Updated 8 months ago
- Neighborhood Contrastive Learning for Scientific Document Representations with Citation Embeddings (EMNLP 2022 paper) ☆64 Updated 2 years ago
- Dataset, models, and code for paper "CiteSum: Citation Text-guided Scientific Extreme Summarization and Low-resource Domain Adaptation", … ☆33 Updated 2 years ago
- A set of Python scripts for preprocessing the Wikidata JSON dump and running simple queries in an efficient manner. ☆102 Updated last month
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆149 Updated last year
- Repository for ACL'22 paper: Dynamic Latent Extraction for Abstractive Long-Input Summarization ☆55 Updated last year
- Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers. ☆50 Updated last year
- Pytorch implementation of “Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement” ☆61 Updated 3 years ago
- cRocoDiLe is a dataset extraction tool for Relation Extraction using Wikipedia and Wikidata presented in REBEL (EMNLP 2021). ☆64 Updated last year
- The code and data used for EACL2023 Paper: "Large Language Models are few(1)-shot Table Reasoners" ☆39 Updated 6 months ago
- Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering" ☆80 Updated last year
- ☆57 Updated last year
- CoDEx: A set of knowledge graph Completion Datasets Extracted from Wikidata and Wikipedia ☆153 Updated 3 months ago
- Improving Biomedical Pretrained Language Models with Knowledge [BioNLP 2021] ☆65 Updated last year