sagorbrur / codeswitch
CodeSwitch is an NLP tool that can be used for language identification, POS tagging, named entity recognition, and sentiment analysis of code-mixed data.
☆34Updated 4 years ago
Alternatives and similar repositories for codeswitch:
Users who are interested in codeswitch are comparing it to the libraries listed below.
- Statistics on multilingual datasets☆17Updated 2 years ago
- This repo contains a set of neural transducers, e.g. sequence-to-sequence models, focusing on character-level tasks.☆74Updated last year
- A Benchmark Dataset for Understanding Disfluencies in Question Answering☆62Updated 3 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT☆21Updated last year
- CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Switching☆18Updated 3 years ago
- The Benchmark of Linguistic Minimal Pairs☆149Updated 2 years ago
- M2D2: A Massively Multi-domain Language Modeling Dataset (EMNLP 2022) by Machel Reid, Victor Zhong, Suchin Gururangan, Luke Zettlemoyer☆55Updated 2 years ago
- A guide to building language technology in new languages.☆58Updated 3 years ago
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning☆101Updated 4 years ago
- NTREX -- News Test References for MT Evaluation☆81Updated 9 months ago
- Tooling to play around with multilingual machine translation for Indian Languages.☆22Updated 3 years ago
- Neural network sequence labeling model☆11Updated 5 years ago
- ☆38Updated 4 years ago
- BERT models for many languages created from Wikipedia texts☆33Updated 4 years ago
- ☆21Updated 10 months ago
- This code provides a word-level language identification tool for identifying the language of individual words in code-mixed text. e.g. The tex…☆52Updated 4 years ago
- ☆29Updated last year
- This tool helps with the automatic generation of grammatically valid synthetic code-mixed data by utilizing linguistic theories such as Equivalenc…☆53Updated 7 months ago
- Code for ACL 2022 paper "Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"☆30Updated 2 years ago
- A program to choose transfer languages for cross-lingual learning☆72Updated last year
- LTG-Bert☆30Updated last year
- ☆17Updated last year
- A tiny BERT for low-resource monolingual models☆31Updated 6 months ago
- ☆24Updated 5 years ago
- Pre-trained, multilingual sequence-to-sequence models for Indian languages☆46Updated 2 years ago
- Codebase for probing and visualizing multilingual models.☆47Updated 4 years ago
- A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations☆55Updated 2 years ago
- ☆14Updated 11 months ago
- This repository hosts the code for a tokenizer of tweets.☆12Updated 6 years ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023☆100Updated 11 months ago