Crawling engine that crawls a set of top-level domains looking for documents in a list of languages
☆11 · Feb 6, 2024 · Updated 2 years ago
Alternatives and similar repositories for linguacrawl
Users that are interested in linguacrawl are comparing it to the libraries listed below.
- A Java API to easily get input from users ☆10 · Jun 14, 2024 · Updated last year
- Scripts for building a geo-located web corpus using Common Crawl data ☆11 · Jan 18, 2026 · Updated 2 months ago
- Tool for manual evaluation of parallel sentences. ☆15 · Jan 26, 2026 · Updated last month
- Tooling to play around with multilingual machine translation for Indian Languages. ☆22 · Mar 5, 2022 · Updated 4 years ago
- CS224S Course Project ☆14 · Jun 9, 2014 · Updated 11 years ago
- Domain-Specific Text Generation for Machine Translation (with LLMs) - scripts and config files for the paper ☆18 · Aug 19, 2023 · Updated 2 years ago
- Efficient teacher-student models and scripts to make them ☆55 · Dec 16, 2023 · Updated 2 years ago
- Exploring advanced prompting tools to query SQL database with multiple tables in natural language using LLMs ☆16 · Aug 23, 2024 · Updated last year
- Dockerized NMT frameworks for nmt-wizard ☆39 · Apr 18, 2023 · Updated 2 years ago
- Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts ☆18 · Mar 15, 2021 · Updated 5 years ago
- Morfessor FlatCat ☆13 · Aug 20, 2019 · Updated 6 years ago
- Data collection, alignment and TAUS repository ☆23 · Nov 30, 2017 · Updated 8 years ago
- Lexically Constrained Neural Machine Translation with Levenshtein Transformer ☆40 · Jul 14, 2020 · Updated 5 years ago
- ☆13 · Jul 25, 2025 · Updated 8 months ago
- ☆22 · Dec 20, 2019 · Updated 6 years ago
- Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation ☆17 · Jan 18, 2021 · Updated 5 years ago
- Exploring implementing a simple tagger using neural network frameworks ☆20 · Oct 24, 2022 · Updated 3 years ago
- Fast Neural Machine Translation in C++ - development repository ☆23 · May 12, 2024 · Updated last year
- Code for paper "Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph", EMNLP 2021 - findings. ☆13 · Dec 14, 2021 · Updated 4 years ago
- Corgi butt or loaf of bread classifier (PyTorch + Streamlit) ☆11 · Updated this week
- ☆13 · Oct 12, 2020 · Updated 5 years ago
- A lightweight library with a wide range of utilities and tools for faster and more efficient plugin development. ☆43 · Mar 5, 2026 · Updated 2 weeks ago
- Code for the collection and analysis of the MTNT dataset ☆56 · Apr 2, 2019 · Updated 6 years ago
- 2023 edition of #100daysofnetworks ☆22 · Updated this week
- Possibly the best kana training page there is, with various practice modes ☆26 · Nov 2, 2025 · Updated 4 months ago
- Finite-state script normalization and processing utilities ☆46 · Mar 9, 2026 · Updated 2 weeks ago
- Crawler based on a modified browser to detect online tracking. ☆11 · Jul 19, 2023 · Updated 2 years ago
- Go through the list of accepted papers for ICLR in terminal and add them to your reading list. ☆13 · Jan 30, 2021 · Updated 5 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs ☆14 · Jan 1, 2025 · Updated last year
- 😡 Python CLI tool that shows you who has unfollowed you on GitHub. PRs welcome! ☆11 · Dec 1, 2022 · Updated 3 years ago
- Matrix tools for building and inspecting latent spaces ☆27 · Aug 19, 2018 · Updated 7 years ago
- Utilities to gather software metrics from tools (SONAR, etc) and store them into ElasticSearch for later display using Kibana. ☆11 · Dec 31, 2017 · Updated 8 years ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆58 · Feb 3, 2026 · Updated last month
- OmegaT plugin to use TexTra(R) powered by NICT ☆28 · Mar 10, 2026 · Updated 2 weeks ago
- LazyText is inspired by the idea of lazypredict, a library which helps build lot of basic models without much code. LazyText is for text … ☆18 · Feb 19, 2022 · Updated 4 years ago
- NanigoNet: Language detector for code-mixed input supporting 150+19 human+programming languages using deep neural networks ☆71 · May 22, 2023 · Updated 2 years ago
- Bitextor generates translation memories from multilingual websites ☆301 · Nov 11, 2024 · Updated last year
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks ☆12 · Nov 9, 2021 · Updated 4 years ago
- ☆14 · May 14, 2019 · Updated 6 years ago