transducens/linguacrawl

Crawling engine that crawls a set of top-level domains looking for documents in a list of languages

☆11

Alternatives and similar repositories for linguacrawl

Users who are interested in linguacrawl are comparing it to the libraries listed below.

MrNemo64 / better-inputs
A Java API to easily get input from users
☆10 · Jun 14, 2024 · Updated 2 years ago
ilinguistics / common_crawl_corpus
Scripts for building a geo-located web corpus using Common Crawl data
☆11 · Jan 18, 2026 · Updated 6 months ago
paracrawl / keops
Tool for manual evaluation of parallel sentences.
☆15 · Jan 26, 2026 · Updated 5 months ago
zifeishan / cs224s-deepSpeech
CS224S Course Project
☆14 · Jun 9, 2014 · Updated 12 years ago
jerinphilip / ilmulti
Tooling to play around with multilingual machine translation for Indian Languages.
☆22 · Mar 5, 2022 · Updated 4 years ago
ymoslem / MT-LM
Domain-Specific Text Generation for Machine Translation (with LLMs) - scripts and config files for the paper
☆18 · Aug 19, 2023 · Updated 2 years ago
browsermt / students
Efficient teacher-student models and scripts to make them
☆57 · Dec 16, 2023 · Updated 2 years ago
OpenNMT / nmt-wizard-docker
Dockerized NMT frameworks for nmt-wizard
☆39 · Apr 18, 2023 · Updated 3 years ago
aalto-speech / flatcat
Morfessor FlatCat
☆13 · Aug 20, 2019 · Updated 6 years ago
drfinkus / gpt-2-simple
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
☆18 · Mar 15, 2021 · Updated 5 years ago
modernmt / DataCollection
Data collection, alignment and TAUS repository
☆24 · Nov 30, 2017 · Updated 8 years ago
raymondhs / constrained-levt
Lexically Constrained Neural Machine Translation with Levenshtein Transformer
☆40 · Jul 14, 2020 · Updated 6 years ago
Helsinki-NLP / MuCoW
Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation
☆18 · Jan 18, 2021 · Updated 5 years ago
transducens / LASERtrain
☆22 · Dec 20, 2019 · Updated 6 years ago
nutcrtnk / DHGNet
Code for paper "Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph", EMNLP 2021 - findings.
☆13 · Dec 14, 2021 · Updated 4 years ago
jkkummerfeld / neural-tagger-tutorial
Exploring implementing a simple tagger using neural network frameworks
☆20 · Oct 24, 2022 · Updated 3 years ago
browsermt / marian-dev
Fast Neural Machine Translation in C++ - development repository
☆23 · May 12, 2024 · Updated 2 years ago
sachink1729 / SQL-Agents-Using-RAG-DSPy-Groq
Exploring advanced prompting tools to query SQL database with multiple tables in natural language using LLMs
☆16 · Aug 23, 2024 · Updated last year
Kawaeee / butt_or_bread
Corgi butt or loaf of bread classifier (PyTorch + Streamlit)
☆12 · Jun 11, 2026 · Updated last month
GoogleCloudPlatform / bq-mirroring-cdc
☆13 · Oct 12, 2020 · Updated 5 years ago
TechWiz-3 / who-unfollowed-me
😡 Python CLI tool that shows you who has unfollowed you on GitHub. PRs welcome!
☆11 · Dec 1, 2022 · Updated 3 years ago
google-research / nisaba
Finite-state script normalization and processing utilities
☆52 · Jun 24, 2026 · Updated last month
pmichel31415 / mtnt
Code for the collection and analysis of the MTNT dataset
☆56 · Apr 2, 2019 · Updated 7 years ago
THUlawtech / LEEC
☆15 · Jul 25, 2025 · Updated 11 months ago
itsgorain / 100daysofnetworks
2023 edition of #100daysofnetworks
☆21 · Updated this week
lucidrains / nim-tokenizer
Implementation of a simple BPE tokenizer, but in Nim
☆22 · Jul 2, 2023 · Updated 3 years ago
harish-kamath / rqae
Residual Quantization Autoencoder, used for interpreting LLMs
☆14 · Jan 1, 2025 · Updated last year
binbard / ems
A neat GUI based Employee Management System in python supported by csv, backed by tkinter
☆10 · Sep 30, 2023 · Updated 2 years ago
fpdetective / modCrawler
Crawler based on a modified browser to detect online tracking.
☆11 · Jul 19, 2023 · Updated 3 years ago
yobibyte / iclr-viewer
Go through the list of accepted papers for ICLR in terminal and add them to your reading list.
☆13 · Jan 30, 2021 · Updated 5 years ago
BayesForDays / nontology
Matrix tools for building and inspecting latent spaces
☆26 · Aug 19, 2018 · Updated 7 years ago
hplt-project / OpusCleaner
OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.
☆58 · Feb 3, 2026 · Updated 5 months ago
sglebs / kibana-software-metrics
Utilities to gather software metrics from tools (SONAR, etc) and store them into ElasticSearch for later display using Kibana.
☆11 · Dec 31, 2017 · Updated 8 years ago
iloha-openlab / omegat-textra-plugin
OmegaT plugin to use TexTra(R) powered by NICT
☆28 · Jul 16, 2026 · Updated last week
jdvala / lazytext
LazyText is inspired by the idea of lazypredict, a library which helps build lot of basic models without much code. LazyText is for text …
☆18 · Feb 19, 2022 · Updated 4 years ago
mhagiwara / nanigonet
NanigoNet: Language detector for code-mixed input supporting 150+19 human+programming languages using deep neural networks
☆71 · May 22, 2023 · Updated 3 years ago
222464 / MiniNeoRL
Simple, small, fully-connected Python version of NeoRL
☆11 · Jan 29, 2016 · Updated 10 years ago
deep-spin / OpenNMT-entmax
☆15 · May 14, 2019 · Updated 7 years ago
bitextor / bitextor
Bitextor generates translation memories from multilingual websites
☆299 · Nov 11, 2024 · Updated last year