openlanguagedata/seed

Seed Machine Translation Data

☆34

Alternatives and similar repositories for seed

Users interested in seed are comparing it to the repositories listed below.

openlanguagedata / flores
The FLORES+ Machine Translation Benchmark
☆112 · Nov 12, 2024 · Updated last year
alirezamshi / small100
Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E…
☆30 · Feb 8, 2023 · Updated 3 years ago
Helsinki-NLP / OpusTools
☆83 · Jun 24, 2026 · Updated last month
AUT-NLP / PQuAD
☆13 · Mar 2, 2023 · Updated 3 years ago
Jimin9401 / avocado
AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain
☆23 · May 31, 2022 · Updated 4 years ago
luismsgomes / mosestokenizer
☆20 · Oct 22, 2021 · Updated 4 years ago
cisnlp / GlotWeb
[WWW 2026] 🕸 GlotWeb: Web Indexing for Minority Languages
☆17 · Apr 14, 2026 · Updated 3 months ago
pln-fing-udelar / jojajovai
Jojajovai Guarani-Spanish Parallel Corpus
☆20 · Jul 5, 2022 · Updated 4 years ago
thino-rma / fts5_mecab
sqlite3 fts5 mecab
☆23 · Aug 9, 2019 · Updated 6 years ago
jjzha / skill-extraction-weak-supervision
Partial code for "Skill Extraction from Job Postings using Weak Supervision" at RecSysHR 2022.
☆13 · May 19, 2023 · Updated 3 years ago
honnibal / py-clearnlp-converter
A simple Python wrapper for the ClearNLP constituents-to-dependencies converter
☆11 · Nov 2, 2015 · Updated 10 years ago
microsoft / Computational-Use-of-Data-Agreement
Computational Use of Data Agreement - Removing Barriers to Data Innovation
☆21 · Jun 12, 2023 · Updated 3 years ago
facebookresearch / stopes
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te…
☆309 · Updated this week
CLUEbenchmark / SuperCLUE-Code3
A native-Chinese, graded benchmark for code ability
☆15 · Apr 11, 2024 · Updated 2 years ago
Helsinki-NLP / OPUS
The Open Parallel Corpus
☆89 · Jul 20, 2026 · Updated last week
shenxiangzhuang / bleuscore
BLEU Score in Rust
☆13 · Updated this week
skywalker023 / thought-tracing
🚲 Code and benchmark for our COLM 2025 paper - "Thought Tracing: Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models"
☆15 · Aug 8, 2025 · Updated 11 months ago
konstantinjdobler / focus
[EMNLP'23] Official Code for "FOCUS: Effective Embedding Initialization for Monolingual Specialization of Multilingual Models"
☆37 · Jun 7, 2025 · Updated last year
casszhao / PruneHall
Codebase, data and models for hallucination of pruned models
☆16 · Jan 11, 2025 · Updated last year
facebookresearch / flores
Facebook Low Resource (FLoRes) MT Benchmark
☆771 · Nov 20, 2023 · Updated 2 years ago
ghchen18 / acl22-sixtp
Code for ACL 2022 paper 'Towards Making the Most of Cross-Lingual Transfer for Zero-Shot Neural Machine Translation'
☆12 · Jun 7, 2024 · Updated 2 years ago
NextWordDev / psychoevals
Repository for PsychoEvals - a framework for LLM security, psychoanalysis, and moderation.
☆18 · Apr 16, 2023 · Updated 3 years ago
argosopentech / argos-train
Training scripts for Argos Translate
☆158 · Jun 26, 2026 · Updated last month
untitaker / rust-vobject
VObject parser and generator for Rust
☆17 · Apr 1, 2026 · Updated 3 months ago
drawshield / Blazon-Parser
A Flex/Bison Parser for Blazonry - A Mediaeval Graphical Description Language
☆14 · Apr 23, 2021 · Updated 5 years ago
Narabzad / t3
☆25 · May 1, 2026 · Updated 2 months ago
INK-USC / XCSR
Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"
☆23 · Oct 26, 2021 · Updated 4 years ago
LEL-A / GerAlpacaDataCleaned
German Alpaca Dataset (Cleaned + Translated)
☆26 · Apr 6, 2023 · Updated 3 years ago
MicrosoftTranslator / NTREX
NTREX -- News Test References for MT Evaluation
☆87 · Jun 5, 2024 · Updated 2 years ago
UniversalDependencies / UD_Persian-Seraji
UD_Persian
☆31 · May 6, 2026 · Updated 2 months ago
cisnlp / GlotLID
[EMNLP 2023] 💬 Language Identification with Support for More Than 2000 Labels
☆213 · Apr 15, 2026 · Updated 3 months ago
malteos / clp-transfer
Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning
☆30 · Jan 25, 2023 · Updated 3 years ago
GoFigure-LANL / VisHash
Visual Hash for matching copies of visually similar images.
☆16 · Mar 17, 2025 · Updated last year
bitextor / bicleaner
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
☆160 · Jun 18, 2024 · Updated 2 years ago
hitachi-nlp / FLD
☆61 · Dec 6, 2024 · Updated last year
Lythenas / rust-orgmode
A parser for org files written in Rust.
☆17 · Jan 1, 2019 · Updated 7 years ago
dataiku / PolYamoR
PolYamoR is the first forward-reverse automated translation system between Python and R
☆16 · Mar 31, 2017 · Updated 9 years ago
karim23657 / Persian-tts-coqui
Persian/Farsi text-to-speech (TTS) training using Coqui TTS
☆214 · Feb 15, 2025 · Updated last year
dashayushman / neural-language-model
A tutorial on how to build your own Neural Language Model
☆10 · Dec 8, 2022 · Updated 3 years ago