MarsPanther/crawl-for-parallel-corpora

Simple bs4-based web crawler for collecting a corpus for statistical machine translation

☆13

Alternatives and similar repositories for crawl-for-parallel-corpora

Users who are interested in crawl-for-parallel-corpora are comparing it to the repositories listed below.

MarsPanther / Amharic-English-Machine-Translation-Corpus
Amharic English Machine Translation Corpus prepared through website crawling and custom preprocessing.
☆49 • Aug 2, 2018 • Updated 7 years ago
EthioNLP / Ethiopian-Language-Survey
Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities
☆17 • Jun 4, 2025 • Updated last year
maobedkova / AmharicCorpus
The set of files used for the development of the Amharic Corpus.
☆11 • Jun 4, 2017 • Updated 9 years ago
admasethiopia / dictionaries
Amharic/Tigrinya/Oromo Dictionaries
☆39 • Updated this week
hltdi / HornMorpho
Morphological processing for languages of the Horn of Africa
☆61 • Jun 27, 2026 • Updated last month
AI-Lab-Makerere / Data4Good
This repository contains publicly available speech and text data in Luganda.
☆12 • Sep 4, 2020 • Updated 5 years ago
liulalemx / felig-toolkit
A toolset for Amharic language pre-processing. Includes an Amharic Stemmer, Transliterator, Stopword remover, Lexical analyzer, Corpus i…
☆37 • May 27, 2023 • Updated 3 years ago
picnicml / doddle-model-examples
doddle-model code examples
☆19 • Sep 23, 2019 • Updated 6 years ago
jonsafari / habeas-corpus
Command-line corpus tools
☆12 • May 15, 2017 • Updated 9 years ago
Eroica / greedy-ocr
An OCR engine that works by finding pre-known letters in a word's image
☆12 • Jul 29, 2019 • Updated 7 years ago
icflorescu / postgresql-tsearch-utils
A collection of files and patterns to improve PostgreSQL text search
☆11 • Aug 26, 2016 • Updated 9 years ago
danigb / music-chord
Music chords made easy
☆15 • Oct 21, 2015 • Updated 10 years ago
voikko / libreoffice-voikko
Language checker and hyphenator extension for LibreOffice
☆12 • Jan 27, 2020 • Updated 6 years ago
kkumarcodes / Shopmost
Node.js and React, PostgreSQL based eCommerce platform
☆27 • Mar 29, 2024 • Updated 2 years ago
geezorg / data
Lexical Data of Ge'ez Languages
☆56 • Sep 14, 2022 • Updated 3 years ago
eklem / stopword-trainer
A module for creating stopword lists for any language, based on a set of documents.
☆15 • Apr 12, 2026 • Updated 3 months ago
kashimAstro / ofxGoXtreme
Hacked Video Camera GoXtreme Wifi Control PTP/RTSP/FTP
☆17 • Apr 5, 2015 • Updated 11 years ago
shamilcm / m2scorer
Scorer for grammatical error correction systems.
☆14 • Feb 24, 2016 • Updated 10 years ago
farshadjafari / parallel_corpus_generator
Python application that generates parallel corpora for any language pair; can be used for training NMT (Neural Machine Translation) systems
☆12 • Dec 8, 2022 • Updated 3 years ago
BitMari / varimi
A platform for agriculture smart contracts based on the NEO blockchain.
☆33 • Nov 26, 2019 • Updated 6 years ago
krasing / multilabel-ULMFiT
Multi-label aviation safety narratives classification
☆15 • Jan 29, 2023 • Updated 3 years ago
rasyosef / amharic-news-category-classification
notebooks to finetune `bert-small-amharic`, `bert-mini-amharic`, and `xlm-roberta-base` models using an Amharic text classification datas…
☆11 • May 10, 2024 • Updated 2 years ago
abdiu34567 / Bible-Api
Oromo, Amharic, English KJV bible API.
☆15 • Jul 31, 2023 • Updated 2 years ago
AkankshaShrimal / React-Native-ChatBot
A simple chatty bot using react-native and Dialogflow
☆10 • May 25, 2018 • Updated 8 years ago
FarMcKon / how_to_programmer
How To Be a Programmer, edited
☆12 • May 21, 2012 • Updated 14 years ago
sekmet / vegefoods-colorlib-gatsby-shopify
🏪 GatsbyJS + Shopify + Netlify CMS Starter + Vegefoods theme by COLORLIB
☆10 • Jan 11, 2023 • Updated 3 years ago
ufal / korektor
Statistical spell- and (occasional) grammar-checker.
☆19 • Jul 22, 2026 • Updated last week
ramonpoca / LanguageToolNSServer
An NSSpellServer that forwards requests to LanguageTool for grammar checking
☆21 • Jan 12, 2014 • Updated 12 years ago
ctlong12 / TrafficControllerFuzzyLogic
The purpose of this project is to address the design and implementation of an intelligent traffic light system based on fuzzy logic techn…
☆24 • Jan 13, 2020 • Updated 6 years ago
kunci115 / siPintar
Indonesian Chatbot built by Multi Layer Perceptron (Neural Network)
☆42 • May 22, 2022 • Updated 4 years ago
MontrealCorpusTools / speechcorpustools
Easier analysis of large speech corpora
☆24 • Jun 22, 2021 • Updated 5 years ago
surafelml / Afro-NMT
LOW-RESOURCE NEURAL MACHINE TRANSLATION: A BENCHMARK FOR FIVE AFRICAN LANGUAGES
☆16 • Jul 27, 2020 • Updated 6 years ago
AAUThematic4LT / Parallel-Corpora-for-Ethiopian-Languages
☆16 • Dec 11, 2019 • Updated 6 years ago
analyticsvidhya / wns-analytics-wizard-2018
Winners solutions for [WNS Analytics Wizard 2018](https://datahack.analyticsvidhya.com/contest/wns-analytics-hackathon-2018/)
☆25 • Dec 13, 2018 • Updated 7 years ago
BatsResearch / LexC-Gen
Generate synthetic labeled data for extremely low-resource languages using bilingual lexicons.
☆20 • Oct 3, 2024 • Updated last year
Devilla / influence.io
Social proof analytics using AI for next generation social media
☆10 • Aug 16, 2018 • Updated 7 years ago
SirajulMostafa / demo-pms-v2
Pharmacy Management System
☆13 • Aug 5, 2017 • Updated 8 years ago
FADHLOUN-Y / 1ST-PLACE-South-African-COVID-19-Vulnerability-Map-Hackathon
This Challenge aims to infer important COVID-19 public health risk factors from outdated data in South Africa
☆20 • Dec 8, 2022 • Updated 3 years ago
nogy / jsoundmodem
This project is aimed at providing an extendable soundmodem backend to various java applications. It is now in its initial phase providin…
☆19 • Oct 9, 2011 • Updated 14 years ago