steinst/SentAlign

☆38

Alternatives and similar repositories for SentAlign

Users that are interested in SentAlign are comparing it to the libraries listed below.

fyvo / WMT-Biomed-Test
☆13 · Aug 23, 2024 · Updated last year
bitextor / bicleaner-ai
Bicleaner fork that uses neural networks
☆40 · Feb 23, 2026 · Updated 4 months ago
microsoft / factored-segmenter
Unsupervised factor-based text tokenizer for natural-language processing applications
☆17 · Jul 24, 2020 · Updated 5 years ago
Samsung / MT-LLM-NLU
Repository for code related to "LLM-Based Machine Translation for Expansion of Spoken Language Understanding Systems to New Languages" pu…
☆16 · Apr 12, 2024 · Updated 2 years ago
noe-eva / NOAH-Corpus
NOAH's Corpus: Part-of-Speech Tagging for Swiss German
☆12 · Jan 6, 2023 · Updated 3 years ago
bfsujason / bertalign
Multilingual sentence alignment using sentence embeddings
☆157 · May 4, 2026 · Updated 2 months ago
swiss-ai / parity-aware-bpe
Parity-Aware Byte-Pair Encoding: Improving Cross-lingual Fairness in Tokenization [ACL 2026]
☆19 · Apr 18, 2026 · Updated 3 months ago
cartesinus / leyzer
Multilingual text corpus designed to study multilingual and cross-lingual natural language understanding (NLU) models and the strategies …
☆13 · Jun 15, 2025 · Updated last year
thompsonb / vecalign
Improved Sentence Alignment in Linear Time and Space
☆200 · Jul 4, 2026 · Updated 2 weeks ago
zouharvi / pearmut
Platform for Evaluating and Reviewing of Multilingual Tasks
☆32 · Updated this week
naist-nlp / mbrs
A library for minimum Bayes risk (MBR) decoding
☆53 · Nov 2, 2025 · Updated 8 months ago
MichalRyszardWojcik / transformer-language-model
A clean no-jargon mathematical definition of transformer language model with a Python implementation that focuses on clarity rather than…
☆11 · Jul 23, 2022 · Updated 3 years ago
ictnlp / SiLLM
SiLLM is a Simultaneous Machine Translation (SiMT) Framework. It utilizes a Large Language model as the translation model and employs a t…
☆18 · Feb 22, 2024 · Updated 2 years ago
czcorpus / InterText_editor
Editor for aligned parallel texts (personal desktop application).
☆20 · Jan 15, 2026 · Updated 6 months ago
amittai / cynical
Cynical data selection
☆20 · Jan 16, 2021 · Updated 5 years ago
wmt-conference / wmt22-news-systems
☆21 · Feb 13, 2023 · Updated 3 years ago
ZurichNLP / multilingual-instruction-tuning
Code and data for the paper "Turning English-centric LLMs Into Polyglots: How Much Multilinguality Is Needed?"
☆26 · Jun 3, 2025 · Updated last year
stephantul / unitoken
Tokenization across languages. Useful as preprocessing for subword tokenization.
☆19 · Feb 7, 2023 · Updated 3 years ago
sheffieldnlp / mlqe-pe
Multilingual Quality Estimation and Automatic Post-editing Dataset
☆44 · Mar 24, 2022 · Updated 4 years ago
google / wmt-mqm-human-evaluation
☆100 · Sep 25, 2025 · Updated 9 months ago
salavi / Clever_Hans_or_N-ToM
☆12 · May 6, 2024 · Updated 2 years ago
DCSaunders / gender-debias
Adaptation datasets and scripts for the paper "Reducing gender bias in Neural Machine Translation as a domain adaptation problem" (ACL 20…
☆13 · Mar 18, 2021 · Updated 5 years ago
gautierdag / tokenizer-bench
Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation"
☆22 · Feb 14, 2024 · Updated 2 years ago
economia / DHondt
Utility to compute the number of mandates based on election results, using the D'Hondt method
☆11 · Sep 6, 2013 · Updated 12 years ago
facebookresearch / bitext-lexind
Bilingual lexicons map words in one language to their translations in another, and are typically induced by learning linear project…
☆19 · Jun 1, 2021 · Updated 5 years ago
kbatsuren / wiktra
Wiktra - Python tool of Wiktionary Transliteration modules for 514 languages and its 102 different scripts (orthographies)
☆37 · Jun 29, 2025 · Updated last year
urchade / graph-neural-nets
Graph neural networks tutorial in PyTorch (GCN, GAT, Node2vec, GraphSAGE, ClusterGCN, ...)
☆26 · Jan 20, 2022 · Updated 4 years ago
neonbadger / DestinationUnknown
Hackbright Capstone Project
☆11 · Apr 14, 2016 · Updated 10 years ago
compling-potsdam / misc-courses
☆19 · Apr 22, 2026 · Updated 2 months ago
henchc / Rediscovering-Text-as-Data
L&S 88-5 Connector Course to Data 8
☆15 · Apr 12, 2018 · Updated 8 years ago
koaning / sentence-models
A different, but useful, textcat approach.
☆18 · Jul 15, 2024 · Updated 2 years ago
AlvaroCavalcante / hand-face-detector
Hand and Face Detection for Sign Language
☆17 · Jan 15, 2026 · Updated 6 months ago
sravanareddy / rhymediscovery
Discovery of Rhyme Schemes in Poetry
☆17 · Nov 22, 2011 · Updated 14 years ago
N-Almarwani / DCT_Sentence_Embedding
Efficient-Sentence-Embedding-using-Discrete-Cosine-Transform
☆17 · Jul 2, 2020 · Updated 6 years ago
sveinbjornt / ochre
Use built-in macOS optical character recognition (OCR) via the command line
☆19 · Nov 17, 2025 · Updated 8 months ago
zjpbinary / CSCBLI
Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"
☆14 · May 30, 2021 · Updated 5 years ago
thammegowda / mtdata
A tool that locates, downloads, and extracts machine translation corpora
☆165 · Apr 13, 2026 · Updated 3 months ago
JuliaText / CoreNLP.jl
[Deprecated] A simple Julia interface to the Stanford CoreNLP toolkit.
☆18 · Feb 8, 2020 · Updated 6 years ago
Helsinki-NLP / MuCoW
Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation
☆18 · Jan 18, 2021 · Updated 5 years ago