Crawling engine that crawls a set of top-level domains looking for documents in a list of languages
☆11 Feb 6, 2024 Updated 2 years ago
Alternatives and similar repositories for linguacrawl
Users that are interested in linguacrawl are comparing it to the libraries listed below
- Scripts for building a geo-located web corpus using Common Crawl data ☆11 Jan 18, 2026 Updated last month
- Tool for manual evaluation of parallel sentences. ☆15 Jan 26, 2026 Updated last month
- CS224S Course Project ☆14 Jun 9, 2014 Updated 11 years ago
- Morfessor FlatCat ☆13 Aug 20, 2019 Updated 6 years ago
- Automatically harvested multilingual contrastive word sense disambiguation test sets for machine translation ☆17 Jan 18, 2021 Updated 5 years ago
- Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts ☆18 Mar 15, 2021 Updated 4 years ago
- Lexically Constrained Neural Machine Translation with Levenshtein Transformer ☆40 Jul 14, 2020 Updated 5 years ago
- Efficient teacher-student models and scripts to make them ☆54 Dec 16, 2023 Updated 2 years ago
- Exploring implementing a simple tagger using neural network frameworks ☆20 Oct 24, 2022 Updated 3 years ago
- Dockerized NMT frameworks for nmt-wizard ☆39 Apr 18, 2023 Updated 2 years ago
- Tooling to play around with multilingual machine translation for Indian Languages. ☆22 Mar 5, 2022 Updated 3 years ago
- ☆22 Dec 20, 2019 Updated 6 years ago
- Practical Natural Language Processing Tools for Humans is built on top of Senna Natural Language Processing (NLP) predictions: part-… ☆22 Jun 11, 2021 Updated 4 years ago
- Data collection, alignment and TAUS repository ☆23 Nov 30, 2017 Updated 8 years ago
- Code for the collection and analysis of the MTNT dataset ☆56 Apr 2, 2019 Updated 6 years ago
- ☆34 Feb 1, 2026 Updated last month
- Examples from my book "Scripting Intelligence: Web 3.0 Information Gathering and Processing" ☆45 Oct 13, 2025 Updated 4 months ago
- ☆32 Apr 18, 2021 Updated 4 years ago
- A tool for taxonomy construction using Graph Neural Networks (GNN). ☆30 Feb 11, 2026 Updated 3 weeks ago
- A Python interface to PISA ☆37 Sep 23, 2025 Updated 5 months ago
- Matrix tools for building and inspecting latent spaces ☆27 Aug 19, 2018 Updated 7 years ago
- Finite-state script normalization and processing utilities ☆46 Feb 25, 2026 Updated last week
- NanigoNet: Language detector for code-mixed input supporting 150+19 human+programming languages using deep neural networks ☆71 May 22, 2023 Updated 2 years ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 Jul 30, 2025 Updated 7 months ago
- ☆10 Feb 2, 2021 Updated 5 years ago
- Tool for sentiment analysis annotation ☆13 Mar 26, 2025 Updated 11 months ago
- mReasoner is a unified computational implementation of the model theory of thinking and reasoning ☆13 Aug 17, 2023 Updated 2 years ago
- ☆34 Feb 17, 2021 Updated 5 years ago
- Links to data used in Sproat & Jaitly (https://arxiv.org/abs/1611.00068) experiments. ☆77 Jul 9, 2021 Updated 4 years ago
- Seahorse is a dataset for multilingual, multi-faceted summarization evaluation. It consists of 96K summaries with human ratings along 6 q… ☆89 Feb 27, 2024 Updated 2 years ago
- A High-Quality Multilingual Dataset for Structured Documentation Translation ☆37 May 1, 2025 Updated 10 months ago
- Corpus preprocessing ☆100 Mar 16, 2024 Updated last year
- ☆14 May 14, 2019 Updated 6 years ago
- Crawler based on a modified browser to detect online tracking. ☆11 Jul 19, 2023 Updated 2 years ago
- Code for "Imitation Attacks and Defenses for Black-box Machine Translations Systems" ☆35 May 1, 2020 Updated 5 years ago
- Curated list of awesome datasets for various table understanding tasks ☆18 Sep 5, 2025 Updated 6 months ago
- Fake NEWS detector using LIAR dataset. ☆11 Aug 19, 2019 Updated 6 years ago
- Data and code for Kang et al., EMNLP 2019's paper titled "(Male, Bachelor) and (Female, Ph.D) have different connotations: Parallelly Ann… ☆30 Mar 17, 2020 Updated 5 years ago
- Super simple, zero config options, <2kb declarative tooltip library with no dependencies. ☆17 Jun 2, 2023 Updated 2 years ago