A tiny BERT for low-resource monolingual models
☆32Dec 24, 2025Updated 6 months ago
Alternatives and similar repositories for microbert
Users that are interested in microbert are comparing it to the libraries listed below.
- “Style Transfer as Data Augmentation: A Case Study on Named Entity Recognition” (EMNLP 2022)☆16Feb 2, 2023Updated 3 years ago
- Final training script from HuggingFace Whisper Fine tuning event - to get best results on finetuned model.☆12Dec 24, 2022Updated 3 years ago
- Experiments for XLM-V Transformers Integration☆13Feb 8, 2023Updated 3 years ago
- CD20200004 from 01/01/2021 to 31/12/2023 - LIG UGA - Python Notebook and Models for the MT Lab @ ALPS 2022☆13Apr 1, 2024Updated 2 years ago
- Structural Supervision & Human Psycholinguistic Data☆13Apr 16, 2021Updated 5 years ago
- ☆13Oct 27, 2021Updated 4 years ago
- ☆28Apr 19, 2026Updated 2 months ago
- ☆11Nov 27, 2022Updated 3 years ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations"☆13Dec 14, 2021Updated 4 years ago
- This repository contains the code for applying One-Token Approximation to a pretrained language model using subword-level tokenization.☆12May 7, 2020Updated 6 years ago
- Aspect based sentiment analysis for Hindi☆11Aug 31, 2017Updated 8 years ago
- [NeurIPS 2022] MorphTE: Injecting Morphology in Tensorized Embeddings☆17Oct 29, 2022Updated 3 years ago
- ☆24May 4, 2022Updated 4 years ago
- Word acquisition in neural language models (TACL 2022).☆21Jan 30, 2025Updated last year
- LTG-Bert☆34Jan 8, 2024Updated 2 years ago
- CodeSwitch is a NLP tool, can use for language identification, pos tagging, name entity recognition, sentiment analysis of code mixed dat…☆37Nov 2, 2020Updated 5 years ago
- MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)☆57Apr 2, 2023Updated 3 years ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization"☆26Jun 2, 2021Updated 5 years ago
- Pre-training BART in Flax on The Pile dataset☆22Jul 24, 2021Updated 4 years ago
- A Streamlit app to add structured tags to a dataset card☆23Jun 30, 2022Updated 4 years ago
- ☆14Jun 24, 2024Updated 2 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021)☆18May 10, 2023Updated 3 years ago
- Dock You a Moses: Moses Statistical MT in a container☆14Feb 18, 2020Updated 6 years ago
- [EMNLP 2022] Adapting a Language Model While Preserving its General Knowledge☆21Feb 12, 2023Updated 3 years ago
- Datamodels for hugging face tokenizers☆107Jun 18, 2026Updated last week
- Code for the Ask4Help project☆22Nov 24, 2022Updated 3 years ago
- ☆11Mar 17, 2026Updated 3 months ago
- Codebase for probing and visualizing multilingual models.☆48May 13, 2020Updated 6 years ago
- Code for Navigating Connected Memories with a Task-oriented Dialog System☆18Dec 12, 2022Updated 3 years ago
- Workshop that demonstrates using and analyzing text in R.☆26Sep 9, 2018Updated 7 years ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Apr 2, 2022Updated 4 years ago
- Project of Singing Voice Conversion.☆16Oct 27, 2023Updated 2 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models"☆28Oct 3, 2021Updated 4 years ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi…☆32Feb 7, 2025Updated last year
- CSE201 Object-Oriented Programming in C++: Teach an AI to produce pieces of music☆12Jan 23, 2019Updated 7 years ago
- [ICML 2023] Exploring the Benefits of Training Expert Language Models over Instruction Tuning☆99Apr 26, 2023Updated 3 years ago
- [NAACL 2024] A Framework aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining☆18Nov 26, 2023Updated 2 years ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)☆22Nov 1, 2023Updated 2 years ago
- Backup of the sources for my SJPO Teaching Notes☆10Apr 15, 2019Updated 7 years ago