Code for pre-training CharacterBERT models (as well as BERT models).
☆34 · Sep 6, 2021 · Updated 4 years ago
Alternatives and similar repositories for character-bert-pretraining
Users that are interested in character-bert-pretraining are comparing it to the libraries listed below.
- Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters" ☆199 · Oct 3, 2023 · Updated 2 years ago
- ☆16 · May 14, 2024 · Updated last year
- 🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026) ☆17 · Feb 27, 2026 · Updated last month
- Official code for the paper: "Perception and Semantic Aware Regularization for Sequential Confidence Calibration (CVPR2023)" ☆10 · May 15, 2024 · Updated last year
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- Notes on papers in Natural Language Processing, Computational Linguistics, and the related sciences ☆14 · Updated this week
- Biomedical and Clinical BERT for Portuguese Language ☆66 · Dec 12, 2024 · Updated last year
- [Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi… ☆12 · Sep 17, 2024 · Updated last year
- RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering ☆10 · Nov 27, 2022 · Updated 3 years ago
- ☆14 · May 26, 2023 · Updated 2 years ago
- Code for SaGe subword tokenizer (EACL 2023) ☆28 · Nov 30, 2024 · Updated last year
- Arabic edition of ALBERT pretrained language models ☆16 · Apr 25, 2021 · Updated 4 years ago
- Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022 ☆14 · Nov 7, 2022 · Updated 3 years ago
- A Python module for evaluating NERC and NEL system performances as defined in the HIPE shared tasks (formerly CLEF-HIPE-2020-scorer). ☆15 · Jun 4, 2024 · Updated last year
- BERT implementation for radiology full-text reports ☆11 · Jul 25, 2020 · Updated 5 years ago
- ☆16 · Dec 14, 2022 · Updated 3 years ago
- Data for the HIPE 2022 shared task. ☆21 · Nov 29, 2023 · Updated 2 years ago
- Suite of 500 procedurally-generated NLP tasks to study language model adaptability ☆21 · Jul 16, 2022 · Updated 3 years ago
- pdf2xml converter based on the Xpdf library (modified version) ☆27 · Feb 23, 2018 · Updated 8 years ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ☆27 · Feb 16, 2026 · Updated 2 months ago
- SEM, a free NLP tool relying on machine learning technologies, especially CRFs. ☆23 · Dec 1, 2021 · Updated 4 years ago
- Embedding Recycling for Language models ☆38 · Jul 11, 2023 · Updated 2 years ago
- Code base for the EMNLP 2021 Findings paper: Cartography Active Learning ☆14 · Jun 3, 2025 · Updated 10 months ago
- ☆13 · Dec 17, 2021 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · May 10, 2023 · Updated 2 years ago
- [ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian… ☆24 · Feb 16, 2023 · Updated 3 years ago
- Generating graph structures from OWL ontologies ☆12 · Nov 21, 2017 · Updated 8 years ago
- Scene text rectification using glyph and character alignment properties ☆22 · Jan 21, 2018 · Updated 8 years ago
- CHARacter-awaRE Diffusion: Multilingual Character-Aware Encoders for Font-Aware Diffusers That Can Actually Spell ☆14 · May 28, 2023 · Updated 2 years ago
- A way to rectify curved text images using a spatial transformer with pairs of points. ☆40 · Dec 9, 2020 · Updated 5 years ago
- [ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) ☆22 · May 24, 2023 · Updated 2 years ago
- ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost ☆42 · Nov 15, 2023 · Updated 2 years ago
- ☆11 · Apr 15, 2022 · Updated 4 years ago
- The Codebase for Causal Distillation for Language Models (NAACL '22) ☆26 · May 1, 2022 · Updated 3 years ago
- [NAACL 2024] A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER ☆22 · Jul 19, 2023 · Updated 2 years ago
- Collection of LaTeX utility packages for scientific documents ☆17 · Sep 13, 2023 · Updated 2 years ago
- An unofficial implementation of Universal MelGAN, following https://arxiv.org/abs/2011.09631 ☆23 · Aug 15, 2022 · Updated 3 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 · Jul 30, 2025 · Updated 8 months ago