microsoft/GLUECoS

A benchmark for code-switched NLP, ACL 2020

☆76

Alternatives and similar repositories for GLUECoS

Users that are interested in GLUECoS are comparing it to the libraries listed below.

mrinaldhar / en-hi-codemixed-corpus
Repository for the English-Hindi Codemixed to Monolingual English Parallel Corpus
☆13 • Feb 17, 2019 • Updated 7 years ago
microsoft / CodeMixed-Text-Generator
This tool helps automatic generation of grammatically valid synthetic Code-mixed data by utilizing linguistic theories such as Equivalenc…
☆62 • Jul 30, 2024 • Updated last year
gentaiscool / code-switching-papers
A curated list of research papers and resources on code-switching
☆344 • Jan 31, 2026 • Updated 5 months ago
sumanbanerjee1 / Code-Mixed-Dialog
☆33 • Jun 20, 2018 • Updated 8 years ago
irshadbhat / csnli
Language identification and normalisation in code switching data tailored with a three-step decoding process
☆24 • Dec 23, 2019 • Updated 6 years ago
gentaiscool / miners
MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models. (EMNLP 2024 Findings)
☆14 • Oct 3, 2024 • Updated last year
bidishasamantakgp / VACS
Code and data for "A Deep Generative Model for Code-Switched Text" accepted in IJCAI 2019
☆16 • Nov 14, 2019 • Updated 6 years ago
sagorbrur / codeswitch
CodeSwitch is a NLP tool, can use for language identification, pos tagging, name entity recognition, sentiment analysis of code mixed dat…
☆37 • Nov 2, 2020 • Updated 5 years ago
AI4Bharat / indic-bart
Pre-trained, multilingual sequence-to-sequence models for Indian languages
☆51 • Jul 20, 2022 • Updated 4 years ago
microsoft / LID-tool
This code provides word level language identification tool for identifying language for individual words in Code-Mixed text. e.g. The tex…
☆60 • Aug 11, 2020 • Updated 5 years ago
aparnadutta / code-mixed-lid
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
☆10 • Aug 13, 2023 • Updated 2 years ago
Kartikaggarwal98 / Indian_ParallelCorpus
Curated list of publicly available parallel corpus for Indian Languages
☆36 • Jul 15, 2021 • Updated 5 years ago
murali1996 / CodemixedNLP
CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Switching
☆18 • Mar 29, 2021 • Updated 5 years ago
eyalbd2 / RL-based-Language-Modeling
☆13 • Jan 27, 2019 • Updated 7 years ago
piyushmakhija5 / hinglishNorm
A Hindi-English Dataset for Text Normalization
☆18 • Jan 3, 2022 • Updated 4 years ago
steve-wilson / nlpcss201-sm-preprocessing
Materials from the NLPCSS 201 Social Media Preprocessing Tutorial, March 16, 2022
☆13 • Nov 10, 2022 • Updated 3 years ago
SilentFlame / Named-Entity-Recognition
Corpus and a baseline neural network system for Named Entity Recognition in Hindi-English Code-Mixed social media text.
☆46 • Sep 25, 2020 • Updated 5 years ago
cbaziotis / lm-prior-for-nmt
This repository contains source code for the paper "Language Model Prior for Low-Resource Neural Machine Translation"
☆43 • Mar 16, 2021 • Updated 5 years ago
evgeniiaraz / datasets_multiling_dialogue
Multilingual Dialogue Datasets
☆19 • Aug 18, 2022 • Updated 3 years ago
sinaahmadi / PersoArabicLID
PALI: Language identification for Perso-Arabic Scripts
☆11 • Jul 11, 2023 • Updated 3 years ago
muhaochen / bilingual_dictionaries
This repository contains the source code and links to some datasets used in the CoNLL 2019 paper "Learning to Represent Bilingual Diction…
☆12 • Oct 1, 2020 • Updated 5 years ago
Merterm / Modeling-Intensification-for-SLG
Public repo for the paper: "Modeling Intensification for Sign Language Generation: A Computational Approach" by Mert Inan*, Yang Zhong*, …
☆14 • Mar 15, 2022 • Updated 4 years ago
google-research-datasets / dakshina
The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the datase…
☆211 • May 27, 2020 • Updated 6 years ago
ictnlp / PLUVR
Code for ACL 2022 main conference paper "Neural Machine Translation with Phrase-Level Universal Visual Representations".
☆21 • Oct 25, 2023 • Updated 2 years ago
BatsResearch / crosslingual-test-time-scaling
Crosslingual Reasoning through Test-Time Scaling
☆21 • May 13, 2025 • Updated last year
salesforce / adversarial-polyglots
Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021)
☆10 • May 1, 2025 • Updated last year
allenai / unifew
Unifew: Unified Fewshot Learning Model
☆18 • Sep 10, 2021 • Updated 4 years ago
allenai / better-promptability
☆11 • Nov 27, 2022 • Updated 3 years ago
allenai / flex
Few-shot NLP benchmark for unified, rigorous eval
☆93 • Jul 12, 2022 • Updated 4 years ago
ShareChatAI / MACD
☆19 • Feb 22, 2024 • Updated 2 years ago
bootphon / abnet3
Siamese network for unsupervised speech representation learning
☆11 • Oct 12, 2018 • Updated 7 years ago
JasonForJoy / FIRE
EMNLP 2020: Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots
☆12 • Dec 15, 2020 • Updated 5 years ago
suyash / mlt
Multilingual Neural Machine Translation using Transformers with Conditional Normalization.
☆18 • Mar 24, 2023 • Updated 3 years ago
metarank / ltrlib
A Learn-to-Rank algorithm library
☆13 • Aug 15, 2024 • Updated last year
anoopkunchukuttan / indic_nlp_library
Resources and tools for Indian language Natural Language Processing
☆639 • Jun 7, 2024 • Updated 2 years ago
lgessler / microbert
A tiny BERT for low-resource monolingual models
☆32 • Dec 24, 2025 • Updated 6 months ago
AI4Bharat / indicnlp_catalog
A collaborative catalog of NLP resources for Indic languages
☆638 • Dec 14, 2024 • Updated last year
AI4Bharat / Indic-BERT-v1
Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.c…
☆297 • May 11, 2023 • Updated 3 years ago
google-research / xtreme
XTREME is a benchmark for the evaluation of the cross-lingual generalization ability of pre-trained multilingual models that covers 40 ty…
☆651 • Jan 4, 2023 • Updated 3 years ago