A parallel evaluation data set of SAP software documentation with document structure annotation
☆14 · Jul 30, 2025 · Updated 7 months ago
Alternatives and similar repositories for software-documentation-data-set-for-machine-translation
Users who are interested in software-documentation-data-set-for-machine-translation are comparing it to the repositories listed below.
- A High-Quality Multilingual Dataset for Structured Documentation Translation ☆37 · May 1, 2025 · Updated 10 months ago
- This repository contains additional reference translations for the WMT'14 En-De (newstest2014) and WMT'19 En-Ru (newstest2019) test sets … ☆15 · Aug 31, 2021 · Updated 4 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Aug 27, 2024 · Updated last year
- Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation ☆17 · Jan 18, 2021 · Updated 5 years ago
- Deployment scripts for NetApps GenAI toolkit ☆15 · Jul 21, 2025 · Updated 7 months ago
- Decoding platform for machine translation research ☆54 · Aug 24, 2019 · Updated 6 years ago
- Best Practices in Translation Memory Management ☆47 · Dec 14, 2018 · Updated 7 years ago
- ☆70 · Jun 29, 2023 · Updated 2 years ago
- Translation Memory Open-source Purifier ☆35 · Nov 6, 2022 · Updated 3 years ago
- ☆10 · Feb 2, 2021 · Updated 5 years ago
- Japanese Massive Multitask Language Understanding Benchmark ☆38 · Oct 7, 2025 · Updated 4 months ago
- A large parallel corpus of English and Japanese ☆87 · Nov 1, 2017 · Updated 8 years ago
- A framework for evaluating Machine Translation models. ☆12 · May 26, 2025 · Updated 9 months ago
- Modified version of fairseq, including new implementations for criterions using reinforcement learning methods. ☆11 · Aug 14, 2019 · Updated 6 years ago
- COMET for African languages ☆10 · Jan 24, 2025 · Updated last year
- INT260 - Data Classification with Python SDK and SAP AI Business ☆10 · Jun 14, 2022 · Updated 3 years ago
- This plugin is a helper for sending DITA files to translation. ☆10 · May 14, 2025 · Updated 9 months ago
- A tool that locates, downloads, and extracts machine translation corpora ☆162 · Sep 18, 2025 · Updated 5 months ago
- Efficient teacher-student models and scripts to make them ☆54 · Dec 16, 2023 · Updated 2 years ago
- Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks. ☆41 · Dec 19, 2023 · Updated 2 years ago
- A simple semi-supervised approach for creating huggingface data script loaders and upload to the hub. ☆11 · Jun 23, 2024 · Updated last year
- On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation (Findings of EMNLP 2021) ☆13 · Nov 21, 2021 · Updated 4 years ago
- Lars's datasets ☆12 · Jun 16, 2024 · Updated last year
- 🍁 Collection of useful R utilities and snippets ☆10 · Nov 21, 2023 · Updated 2 years ago
- Transformer Implementation for NMT using PyTorch Lightning (Korean to English) ☆10 · Oct 19, 2020 · Updated 5 years ago
- Meedan's Open Source Arabic/English Translation Memory ☆33 · Nov 4, 2009 · Updated 16 years ago
- Feature Decay Algorithms ☆11 · Mar 5, 2014 · Updated 11 years ago
- Rackspace How-To Support Articles ☆12 · Jul 11, 2024 · Updated last year
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆35 · Oct 16, 2025 · Updated 4 months ago
- All code and content for my blog. ☆15 · Sep 23, 2018 · Updated 7 years ago
- ☆93 · Feb 13, 2024 · Updated 2 years ago
- Dockerized NMT frameworks for nmt-wizard ☆39 · Apr 18, 2023 · Updated 2 years ago
- Pre-trained, multilingual sequence-to-sequence models for Indian languages ☆51 · Jul 20, 2022 · Updated 3 years ago
- The FLORES+ Machine Translation Benchmark ☆111 · Nov 12, 2024 · Updated last year
- A library for minimum Bayes risk (MBR) decoding ☆51 · Nov 2, 2025 · Updated 4 months ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 · Feb 6, 2024 · Updated 2 years ago
- ☆11 · Apr 2, 2024 · Updated last year
- Image Matting Using Deep Learning ☆10 · Jan 15, 2018 · Updated 8 years ago
- ParCourE - Parallel Corpus Explorer ☆12 · Dec 27, 2021 · Updated 4 years ago