Statistics on multilingual datasets
☆17 · Jul 12, 2022 · Updated 3 years ago
Alternatives and similar repositories for multilingual-data-stats
Users that are interested in multilingual-data-stats are comparing it to the libraries listed below.
- CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Switching ☆18 · Mar 29, 2021 · Updated 5 years ago
- [Kauf & Ivanova, ACL 2023] A Better Way to Do Masked Language Model Scoring ☆12 · Dec 1, 2023 · Updated 2 years ago
- Library for fast text representation and classification. ☆31 · Jan 9, 2024 · Updated 2 years ago
- PANiC - PAraphrasing Noun-Compounds ☆15 · Apr 6, 2018 · Updated 8 years ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated 2 years ago
- Code for the paper "Greed is All You Need: An Evaluation of Tokenizer Inference Methods" ☆13 · Nov 26, 2024 · Updated last year
- Source code repo for paper "TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation" ☆10 · Aug 11, 2023 · Updated 2 years ago
- Minimum Description Length Recurrent Neural Networks ☆19 · Jun 9, 2023 · Updated 2 years ago
- Code for the ILNewsDiff Twitter account ☆10 · May 23, 2023 · Updated 3 years ago
- Code for paper "When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data" ☆14 · Feb 16, 2021 · Updated 5 years ago
- Neural Network based models for Aspect-Based Sentiment Analysis ☆23 · Apr 30, 2018 · Updated 8 years ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆22 · Feb 14, 2024 · Updated 2 years ago
- Python API for loading language data from American-English CHILDES database ☆18 · Aug 14, 2022 · Updated 3 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 · Aug 2, 2021 · Updated 4 years ago
- ☆17 · Oct 24, 2020 · Updated 5 years ago
- NTREX -- News Test References for MT Evaluation ☆87 · Jun 5, 2024 · Updated last year
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders ☆14 · Apr 14, 2021 · Updated 5 years ago
- MLE-Guided Parameter Search (AAAI 2021) ☆12 · Sep 16, 2021 · Updated 4 years ago
- Feature Decay Algorithms ☆11 · Mar 5, 2014 · Updated 12 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆25 · Feb 19, 2021 · Updated 5 years ago
- Micro-framework for publishing linked data ☆11 · Aug 1, 2017 · Updated 8 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Aug 27, 2024 · Updated last year
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆39 · Feb 5, 2026 · Updated 3 months ago
- ☆46 · Apr 13, 2022 · Updated 4 years ago
- The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generati… ☆16 · Sep 1, 2021 · Updated 4 years ago
- New York Times Word Innovation Types dataset ☆21 · Dec 1, 2020 · Updated 5 years ago
- Convert ABN Amro CSV bank statements to QIF ☆11 · Jun 8, 2017 · Updated 8 years ago
- An Easy Annotation Tool for Natural Language Processing ☆11 · May 17, 2024 · Updated 2 years ago
- ☆16 · Oct 17, 2024 · Updated last year
- Online form that generates a travel exemption certificate (attestation de déplacement dérogatoire) ☆10 · Mar 18, 2020 · Updated 6 years ago
- Code and experiments for the COLING2020 paper "Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations". ☆11 · Dec 9, 2020 · Updated 5 years ago
- Variational Walkback, NIPS'17 ☆28 · Oct 18, 2017 · Updated 8 years ago
- Code from blog 'Searching by Music: Leveraging Vector Search for Music Information Retrieval' ☆16 · Nov 16, 2023 · Updated 2 years ago
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding" ☆29 · Jun 2, 2021 · Updated 4 years ago
- Machine Comprehension Train on MSMARCO with S-NET Extraction Modification ☆31 · Feb 10, 2023 · Updated 3 years ago
- Code Generator ☆23 · Feb 16, 2023 · Updated 3 years ago
- WProofreader software development kit (SDK) offers multilingual spelling & grammar check API and JavaScript libraries for rich text edito… ☆13 · Apr 30, 2026 · Updated 3 weeks ago
- Builds a WMT18-like corpus for word-level QE with annotations in the source and target words. ☆10 · Sep 19, 2022 · Updated 3 years ago