Code for pre-training CharacterBERT models (as well as BERT models).
☆34Sep 6, 2021Updated 4 years ago
Alternatives and similar repositories for character-bert-pretraining
Users that are interested in character-bert-pretraining are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters"☆199Oct 3, 2023Updated 2 years ago
- ☆16May 14, 2024Updated last year
- Official code for the paper: "Perception and Semantic Aware Regularization for Sequential Confidence Calibration (CVPR2023)"☆10May 15, 2024Updated last year
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks☆12Nov 9, 2021Updated 4 years ago
- Biomedical and Clinical BERT for Portuguese Language☆65Dec 12, 2024Updated last year
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- Label shift estimation for transfer difficulty with Familiarity.☆10Feb 4, 2025Updated last year
- [Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi…☆12Sep 17, 2024Updated last year
- RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering☆10Nov 27, 2022Updated 3 years ago
- ☆18Nov 25, 2022Updated 3 years ago
- Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022☆14Nov 7, 2022Updated 3 years ago
- A python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer).☆15Jun 4, 2024Updated last year
- Data for the HIPE 2022 shared task.☆21Nov 29, 2023Updated 2 years ago
- Suite of 500 procedurally-generated NLP tasks to study language model adaptability☆21Jul 16, 2022Updated 3 years ago
- pdf2xml convertor based on Xpdf library - modified version☆27Feb 23, 2018Updated 8 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource…☆27Feb 16, 2026Updated last month
- CharBERT: Character-aware Pre-trained Language Model (COLING2020)☆120Jan 28, 2021Updated 5 years ago
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning☆14Jun 3, 2025Updated 9 months ago
- ☆13Dec 17, 2021Updated 4 years ago
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper☆14Aug 9, 2021Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)☆18May 10, 2023Updated 2 years ago
- [ICLR 2023] “ Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…☆24Feb 16, 2023Updated 3 years ago
- Generating graph structures from OWL ontologies☆12Nov 21, 2017Updated 8 years ago
- Scene text rectification using glyph and character alignment properties☆22Jan 21, 2018Updated 8 years ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- CHARacter-awaRE Diffusion: Multilingual Character-Aware Encoders for Font-Aware Diffusers That Can Actually Spell☆14May 28, 2023Updated 2 years ago
- A way to rectify curve text images using spatial transformer by pairs of points.☆40Dec 9, 2020Updated 5 years ago
- ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost☆42Nov 15, 2023Updated 2 years ago
- ☆11Apr 15, 2022Updated 3 years ago
- The Codebase for Causal Distillation for Language Models (NAACL '22)☆26May 1, 2022Updated 3 years ago
- A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- Dataset corresponding to the paper: "Form2Seq : A Framework for Higher-Order Form Structure Extraction"☆10Feb 17, 2021Updated 5 years ago
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER☆22Jul 19, 2023Updated 2 years ago
- This is an unofficial implementation of universal melgan according to https://arxiv.org/abs/2011.09631☆23Aug 15, 2022Updated 3 years ago
- NordVPN Special Discount Offer • AdSave on top-rated NordVPN 1 or 2-year plans with secure browsing, privacy protection, and support for for all major platforms.
- A parallel evaluation data set of SAP software documentation with document structure annotation☆14Jul 30, 2025Updated 7 months ago
- A recurrent neural network model to analyze how travelers expressed their feelings on Twitter☆12Jun 30, 2019Updated 6 years ago
- Code for the Shortformer model, from the ACL 2021 paper by Ofir Press, Noah A. Smith and Mike Lewis.☆147Jul 26, 2021Updated 4 years ago
- Massively Multilingual Transfer for NER☆86Oct 7, 2021Updated 4 years ago
- A Better Way to Attend: Attention with Trees for Video Question Answering☆25Mar 25, 2019Updated 7 years ago
- Python library for converting between BioNLP formats☆22Apr 20, 2023Updated 2 years ago
- Implementation of the BLUE benchmark with Transformers.☆20Feb 16, 2024Updated 2 years ago