Statistics on multilingual datasets
☆17 · Jul 12, 2022 · Updated 3 years ago
Alternatives and similar repositories for multilingual-data-stats
Users who are interested in multilingual-data-stats are comparing it to the libraries listed below.
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- CodemixedNLP: An Extensible and Open NLP Toolkit for Code-Switching ☆18 · Mar 29, 2021 · Updated 5 years ago
- Prosody-semantics Interface in Seoul Korean ☆12 · Oct 9, 2020 · Updated 5 years ago
- ☆18 · Feb 4, 2025 · Updated last year
- [Kauf & Ivanova, ACL 2023] A Better Way to Do Masked Language Model Scoring ☆12 · Dec 1, 2023 · Updated 2 years ago
- Library for fast text representation and classification. ☆31 · Jan 9, 2024 · Updated 2 years ago
- PANiC - PAraphrasing Noun-Compounds ☆15 · Apr 6, 2018 · Updated 8 years ago
- Collection of academic works in natural language processing, computational linguistics, and computational cognitive science that study th… ☆22 · Mar 20, 2024 · Updated 2 years ago
- Source code repo for paper "TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation" ☆10 · Aug 11, 2023 · Updated 2 years ago
- Code for the ILNewsDiff Twitter account ☆10 · May 23, 2023 · Updated 2 years ago
- Code for paper "When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data" ☆14 · Feb 16, 2021 · Updated 5 years ago
- Neural Network based models for Aspect-Based Sentiment Analysis ☆23 · Apr 30, 2018 · Updated 7 years ago
- Open Vietnamese NLP Resources ☆19 · May 3, 2021 · Updated 4 years ago
- Finds snippets in iambic pentameter in English-language text and tries to combine them to a rhyming sonnet. ☆13 · Jan 5, 2023 · Updated 3 years ago
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ☆22 · Feb 14, 2024 · Updated 2 years ago
- Companion Repo for the book The Applied ML Field Manual, Prithiviraj Damodaran ☆12 · Jun 22, 2022 · Updated 3 years ago
- Meedan's Open Source Arabic/English Translation Memory ☆33 · Nov 4, 2009 · Updated 16 years ago
- A corpus of diacritized Hebrew texts (טקסט מנוקד) ☆11 · May 4, 2022 · Updated 3 years ago
- As good as new. How to successfully recycle English GPT-2 to make models for other languages (ACL Findings 2021) ☆48 · Aug 2, 2021 · Updated 4 years ago
- NTREX -- News Test References for MT Evaluation ☆87 · Jun 5, 2024 · Updated last year
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 5 months ago
- Three little Python scripts for data preparation: remove commas, add commas, concatenate files ☆16 · Jul 26, 2017 · Updated 8 years ago
- MLE-Guided Parameter Search (AAAI 2021) ☆12 · Sep 16, 2021 · Updated 4 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 · Jul 30, 2025 · Updated 8 months ago
- Code for SaGe subword tokenizer (EACL 2023) ☆28 · Nov 30, 2024 · Updated last year
- Code for EMNLP 2016 paper: Morphological Priors for Probabilistic Word Embeddings ☆53 · Dec 6, 2016 · Updated 9 years ago
- Feature Decay Algorithms ☆11 · Mar 5, 2014 · Updated 12 years ago
- All code and content for my blog. ☆15 · Sep 23, 2018 · Updated 7 years ago
- Pyinfer is a model agnostic tool for ML developers and researchers to benchmark the inference statistics for machine learning models or f… ☆24 · Feb 19, 2021 · Updated 5 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Aug 27, 2024 · Updated last year
- ☆46 · Apr 13, 2022 · Updated 4 years ago
- Convert ABN Amro CSV bank statements to QIF ☆11 · Jun 8, 2017 · Updated 8 years ago
- An Easy Annotation Tool for Natural Language Processing ☆11 · May 17, 2024 · Updated last year
- Online form that generates a travel exemption certificate (attestation de déplacement dérogatoire) ☆10 · Mar 18, 2020 · Updated 6 years ago
- Code and data for "A Systematic Assessment of Syntactic Generalization in Neural Language Models" ☆29 · Jun 18, 2021 · Updated 4 years ago
- BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText ☆10 · Sep 3, 2019 · Updated 6 years ago
- Dataset for the NLPMC @ NAACL 2021 Paper: Assertion Detection in Clinical Notes: Medical Language Models to the Rescue? ☆16 · Sep 28, 2021 · Updated 4 years ago
- Source code for the paper "Multilingual Neural Machine Translation with Soft Decoupled Encoding" ☆29 · Jun 2, 2021 · Updated 4 years ago
- Code Generator ☆23 · Feb 16, 2023 · Updated 3 years ago