A tiny BERT for low-resource monolingual models
☆31 · Dec 24, 2025 · Updated 3 months ago
Alternatives and similar repositories for microbert
Users that are interested in microbert are comparing it to the libraries listed below.
- Analyzing mBERT's multilinguality in a small laboratory setting ☆13 · Jun 12, 2023 · Updated 2 years ago
- “Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition” (EMNLP 2022) ☆16 · Feb 2, 2023 · Updated 3 years ago
- decontamination ☆29 · Mar 4, 2026 · Updated last month
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆97 · Feb 9, 2023 · Updated 3 years ago
- ☆28 · Feb 24, 2025 · Updated last year
- MasakhaNEWS: News Topic Classification for African Languages ☆26 · May 12, 2024 · Updated last year
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ☆13 · Dec 14, 2021 · Updated 4 years ago
- This repository contains the code for applying One-Token Approximation to a pretrained language model using subword-level tokenization. ☆11 · May 7, 2020 · Updated 5 years ago
- Aspect based sentiment analysis for Hindi ☆11 · Aug 31, 2017 · Updated 8 years ago
- [NeurIPS 2022] MorphTE: Injecting Morphology in Tensorized Embeddings ☆17 · Oct 29, 2022 · Updated 3 years ago
- LTG-Bert ☆34 · Jan 8, 2024 · Updated 2 years ago
- Word acquisition in neural language models (TACL 2022). ☆20 · Jan 30, 2025 · Updated last year
- CodeSwitch is a NLP tool, can use for language identification, pos tagging, name entity recognition, sentiment analysis of code mixed dat… ☆37 · Nov 2, 2020 · Updated 5 years ago
- Pre-training BART in Flax on The Pile dataset ☆22 · Jul 24, 2021 · Updated 4 years ago
- A Streamlit app to add structured tags to a dataset card ☆22 · Jun 30, 2022 · Updated 3 years ago
- ☆14 · Jun 24, 2024 · Updated last year
- This repository includes the masking vocabulary used in the ICLR 2021 spotlight PMI-Masking paper ☆14 · Aug 9, 2021 · Updated 4 years ago
- Dock You a Moses: Moses Statistical MT in a container ☆14 · Feb 18, 2020 · Updated 6 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · May 10, 2023 · Updated 2 years ago
- [EMNLP 2022] Adapting a Language Model While Preserving its General Knowledge ☆21 · Feb 12, 2023 · Updated 3 years ago
- Code for the Ask4Help project ☆22 · Nov 24, 2022 · Updated 3 years ago
- ☆12 · Mar 17, 2026 · Updated 3 weeks ago
- Codebase for probing and visualizing multilingual models. ☆49 · May 13, 2020 · Updated 5 years ago
- A zero-config OpenAI client with support for 20+ providers, API key rotation, rate limits, optional LangChain integration and more. ☆19 · Dec 11, 2025 · Updated 3 months ago
- Code for Navigating Connected Memories with a Task-oriented Dialog System ☆17 · Dec 12, 2022 · Updated 3 years ago
- Embedding Recycling for Language models ☆38 · Jul 11, 2023 · Updated 2 years ago
- Workshop that demonstrates using and analyzing text in R. ☆26 · Sep 9, 2018 · Updated 7 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation" ☆30 · Apr 2, 2022 · Updated 4 years ago
- Project of Singing Voice Conversion. ☆16 · Oct 27, 2023 · Updated 2 years ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning ☆98 · Apr 26, 2023 · Updated 2 years ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023) ☆22 · Nov 1, 2023 · Updated 2 years ago
- A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Automatic Detection of Potentially Idiomatic Expressions ☆12 · Feb 19, 2021 · Updated 5 years ago
- The code and data for our paper (EMNLP 2023 findings) "Type-Aware Decomposed Framework for Few-Shot Named Entity Recognition". ☆35 · Jul 17, 2025 · Updated 8 months ago
- Code for ACL 2023 Paper: ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER ☆22 · Jul 19, 2023 · Updated 2 years ago
- Multilingual Open Text ☆25 · May 8, 2025 · Updated 11 months ago
- Official implementation of "GPT or BERT: why not both?" ☆63 · Jul 28, 2025 · Updated 8 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆22 · Jan 25, 2023 · Updated 3 years ago
- Crosslingual Question Answering for African Languages ☆31 · Sep 27, 2024 · Updated last year