Multilingual sentence alignment using sentence embeddings
☆146Nov 4, 2024Updated last year
Alternatives and similar repositories for bertalign
Users who are interested in bertalign are comparing it to the libraries listed below
- Improved Sentence Alignment in Linear Time and Space☆192Mar 6, 2023Updated 3 years ago
- ☆37Mar 16, 2026Updated last week
- A 2024 Reading List for Bilingual Lexicon Induction (BLI) / Word Translation. Frequently Updated.☆23Sep 29, 2024Updated last year
- A corpus of short answers written by learners of English and graded with CEFR levels☆12Dec 17, 2021Updated 4 years ago
- A neural word aligner based on multilingual BERT☆374Mar 10, 2022Updated 4 years ago
- Code for our paper in ACL 2017☆13Dec 14, 2017Updated 8 years ago
- An accurate multilingual word aligner based on LaBSE☆24Oct 25, 2023Updated 2 years ago
- A fully-fledged PyTorch package for Morphological Analysis, tailored to morphologically rich and historical languages.☆24Oct 27, 2023Updated 2 years ago
- Obtain Word Alignments using Pretrained Language Models (e.g., mBERT)☆392Nov 7, 2023Updated 2 years ago
- Bicleaner fork that uses neural networks☆40Feb 23, 2026Updated 3 weeks ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.☆160Jun 18, 2024Updated last year
- Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric …☆82Sep 21, 2023Updated 2 years ago
- ☆11Oct 14, 2023Updated 2 years ago
- 🌸De-inflect Japanese words☆15Nov 24, 2025Updated 3 months ago
- ☆26Jul 30, 2024Updated last year
- Improving cross-lingual word embeddings by meeting in the middle☆23Aug 25, 2020Updated 5 years ago
- Adaptive Machine Translation with Large Language Models☆32Jan 4, 2025Updated last year
- ☆13Apr 19, 2022Updated 3 years ago
- Neural CRF Model for Sentence Alignment in Text Simplification☆68Jan 19, 2025Updated last year
- Automated listing of repos in GitHub with XML files containing teiHeader. Find a project using TEI today!☆17Updated this week
- Improving Low-Resource Neural Machine Translation of Related Languages by Transfer Learning☆19Nov 3, 2022Updated 3 years ago
- An open source Translation Memory Engine written in Java☆16Dec 22, 2022Updated 3 years ago
- Reasoning-based Evaluation and Ranking of Translations.☆20Jul 18, 2025Updated 8 months ago
- Implementation of our paper "Exploiting Unsupervised Data for Emotion Recognition in Conversations" in the Findings of EMNLP-2020.☆13Nov 17, 2020Updated 5 years ago
- Bilingual term extractor☆59Nov 19, 2025Updated 4 months ago
- Re-rank n-best lists using additional features.☆29Jun 5, 2018Updated 7 years ago
- Structure-Invariant Testing for Machine Translation [ICSE'20]☆16Dec 17, 2020Updated 5 years ago
- An Interactive Tool for Annotating Discourse Structure and Text Improvement☆16Sep 15, 2021Updated 4 years ago
- pyTorch-text-classification☆16Dec 6, 2017Updated 8 years ago
- Sanskrit Tibetan Parallel Dataset☆11Jul 2, 2025Updated 8 months ago
- Code for the INTERSPEECH 2023 paper "Learning When to Speak: Latency and Quality Trade-offs for Simultaneous Speech-to-Speech Translation…☆32Jan 14, 2025Updated last year
- German Text Embedding Clustering Benchmark☆18Mar 15, 2024Updated 2 years ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT☆22May 24, 2023Updated 2 years ago
- AMR-Visualization Tools, show AMR graph structure☆12Jul 29, 2019Updated 6 years ago
- Evaluation of Sentence Representations in Polish☆23Dec 29, 2022Updated 3 years ago
- ☆10Jul 15, 2016Updated 9 years ago
- data, metadata, tools, and LDA experiments on a corpus of Sanskrit philosophy texts☆12Nov 28, 2021Updated 4 years ago
- Official repository for U-SAM (Interspeech 2025)☆26Jun 3, 2025Updated 9 months ago
- Software to detect text reuse with BLAST.☆13Oct 8, 2019Updated 6 years ago