helboukkouri/character-bert-pretraining

Code for pre-training CharacterBERT models (as well as BERT models).

☆34

Alternatives and similar repositories for character-bert-pretraining

Users who are interested in character-bert-pretraining compare it to the repositories listed below.

orevaahia / magnet-tokenization
☆11 · Mar 17, 2026 · Updated 4 months ago
husterpzh / PSSR
Official code for the paper: "Perception and Semantic Aware Regularization for Sequential Confidence Calibration (CVPR2023)"
☆10 · May 15, 2024 · Updated 2 years ago
HAILab-PUCPR / BioBERTpt
Biomedical and Clinical BERT for Portuguese Language
☆68 · Dec 12, 2024 · Updated last year
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10 · Nov 27, 2022 · Updated 3 years ago
DFKI-NLP / MobIE
[Konvens21] This repository contains the DFKI MobIE Corpus, a dataset of 3,232 German-language documents that have been annotated with fi…
☆12 · Sep 17, 2024 · Updated last year
Xiaomeng-Yang / STR_benchmark_cleansed
☆14 · May 26, 2023 · Updated 3 years ago
flairNLP / familiarity
Label shift estimation for transfer difficulty with Familiarity.
☆10 · Feb 4, 2025 · Updated last year
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆28 · Nov 30, 2024 · Updated last year
teffland / ner-expected-entity-ratio
Implementation and experiments for Partially Supervised NER via Expected Entity Ratio in TACL 2022
☆14 · Nov 7, 2022 · Updated 3 years ago
OnlpLab / NEMO
Neural Modeling for Named Entities and Morphology (Hebrew NER)
☆34 · Dec 20, 2022 · Updated 3 years ago
MaastrichtU-IDS / federatedQueryKG
This repository is a workplace for COST Action Hackathon event on Federated Query over Knowledge Graphs which will happen on 25-27 April …
☆11 · Jun 14, 2023 · Updated 3 years ago
hipe-eval / HIPE-2022-data
Data for the HIPE 2022 shared task.
☆23 · May 15, 2026 · Updated 2 months ago
kermitt2 / pdf2xml
pdf2xml convertor based on Xpdf library - modified version
☆27 · Feb 23, 2018 · Updated 8 years ago
belindal / TaskBench500
Suite of 500 procedurally-generated NLP tasks to study language model adaptability
☆21 · Jul 16, 2022 · Updated 4 years ago
minimaxir / gpt-2-fanfiction
Experiments with generating GPT-2 fanfiction on specified topics.
☆11 · Jun 2, 2019 · Updated 7 years ago
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆38 · Jul 11, 2023 · Updated 3 years ago
mawentao277 / CharBERT
CharBERT: Character-aware Pre-trained Language Model (COLING2020)
☆122 · Jan 28, 2021 · Updated 5 years ago
jjzha / cartography-al
Code base for the EMNLP 2021 Findings paper: Cartography Active Learning
☆14 · Jun 3, 2025 · Updated last year
VITA-Group / layerGraftedPretraining_ICLR23
[ICLR 2023] “Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Better Representations”, Ziyu Jian…
☆24 · Feb 16, 2023 · Updated 3 years ago
taeho-kil / Scene-Text-Rectification
Scene text rectification using glyph and character alignment properties
☆22 · Jan 21, 2018 · Updated 8 years ago
AI21Labs / pmi-masking
This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper
☆14 · Aug 9, 2021 · Updated 4 years ago
yuweihao / LV-BERT
LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)
☆18 · May 10, 2023 · Updated 3 years ago
jouniluoma / bert-ner-cmv
☆13 · Dec 17, 2021 · Updated 4 years ago
frotms / Curve-Text-Rectification-Using-Pairs-Of-Points
A way to rectify curve text images using spatial transformer by pairs of points.
☆40 · Dec 9, 2020 · Updated 5 years ago
twinkle0331 / Xcompression
[ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036)
☆22 · May 24, 2023 · Updated 3 years ago
andikarachman / RNN-Twitter-Sentiment-Analysis
A recurrent neural network model to analyze how travelers expressed their feelings on Twitter
☆12 · Jun 30, 2019 · Updated 7 years ago
Planet-AI-GmbH / tfaip-hybrid-ctc-s2s
Repository sharing code and the model for the paper "Rescoring Sequence-to-Sequence Models for Text Line Recognition with CTC-Prefixes"
☆17 · Oct 13, 2021 · Updated 4 years ago
tigerchen52 / LOVE
ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
☆41 · Nov 15, 2023 · Updated 2 years ago
rlcmtzc / SICSS-Python-Crash-Course
The Python crash course of the Summer Institute in Computational Social Science 2022!
☆10 · Nov 19, 2022 · Updated 3 years ago
Form2Seq-Data / Dataset
Dataset corresponding to the paper: "Form2Seq : A Framework for Higher-Order Form Structure Extraction"
☆10 · Feb 17, 2021 · Updated 5 years ago
cisnlp / multypo
A Multilingual Keyboard Layout-Based Typo Generator
☆17 · Nov 23, 2025 · Updated 8 months ago
qdrant / miniCOIL
Contextualized per-token embeddings
☆38 · Jul 22, 2026 · Updated last week
mbahri / binary_gnn
Code for our paper "Binary Graph Neural Networks", CVPR 2021
☆38 · Apr 8, 2021 · Updated 5 years ago
lixiuhong / batched_gemm
☆40 · Feb 28, 2020 · Updated 6 years ago
Sreyan88 / ACLM
Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
☆22 · Jul 19, 2023 · Updated 3 years ago
avi33 / universalmelgan
This is an unofficial implementation of universal melgan according to https://arxiv.org/abs/2011.09631
☆23 · Aug 15, 2022 · Updated 3 years ago
SAP / software-documentation-data-set-for-machine-translation
A parallel evaluation data set of SAP software documentation with document structure annotation
☆15 · Jun 12, 2026 · Updated last month
xxcclong / GNN-Computing
Artifact for PPoPP20 "Understanding and Bridging the Gaps in Current GNN Performance Optimizations"
☆42 · Nov 16, 2021 · Updated 4 years ago
Enescigdem / SignLanguageRecognizer
☆16 · Nov 8, 2020 · Updated 5 years ago