Tools for compiling corpora from Common Crawl
☆14Nov 24, 2024Updated last year
Alternatives and similar repositories for cc_corpus
Users who are interested in cc_corpus are comparing it to the libraries listed below
- The home repository of the NerKor corpus, a Hungarian gold standard named entity annotated corpus containing 1 million tokens.☆16Sep 20, 2023Updated 2 years ago
- A python library for easily querying morphological inflection models trained on Unimorph☆13Oct 23, 2022Updated 3 years ago
- e-magyar text processing system -- inter-module communication via tsv + REST API☆31Aug 23, 2025Updated 6 months ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection☆15Aug 20, 2021Updated 4 years ago
- Notes on papers in Natural Language Processing, Computational Linguistics, and the related sciences☆14Feb 25, 2026Updated last week
- Ubiflux Vigor ventilation system RS485 Modbus communications with Python☆11Feb 20, 2026Updated 2 weeks ago
- Here are all of the PowerPoint presentations that I have ever created and presented.☆12Dec 28, 2020Updated 5 years ago
- NLP & FM Lecture Slides☆43Feb 27, 2026Updated last week
- Hungarian tokenizer.☆14Mar 15, 2022Updated 3 years ago
- A curated list of NLP resources for Hungarian☆272Jan 22, 2026Updated last month
- Python wrapper around the Mac TIS functions to convert between chars and keycodes☆16Dec 1, 2015Updated 10 years ago
- Combining encoder-based language models☆11Nov 11, 2021Updated 4 years ago
- Convert an imscc file to a folder with all the content with proper structure☆10Jul 4, 2016Updated 9 years ago
- Home Assistant custom component for Pollen Information in Hungary☆15Jul 17, 2024Updated last year
- Use Python to Automate the PowerPoint Update☆15May 28, 2023Updated 2 years ago
- The toolkit called magyarlanc aims at the basic linguistic processing of Hungarian texts. The toolkit consists of only JAVA modules (the…☆14Jun 21, 2016Updated 9 years ago
- MDLText☆12Jul 13, 2017Updated 8 years ago
- Resources from the Question Generation Shared Task & Evaluation Challenge 2010☆12Dec 21, 2010Updated 15 years ago
- This project is an implementation of the Li-Roth paper "Learning Question Classifiers" on the TREC dataset☆12Mar 7, 2017Updated 8 years ago
- ☆13Aug 6, 2019Updated 6 years ago
- An elegant and simple way to upload iGEM Wikis.☆14Oct 20, 2021Updated 4 years ago
- ACL Rolling Review website☆11Feb 24, 2026Updated last week
- PDF table extraction☆10Dec 14, 2021Updated 4 years ago
- Automatically exported from code.google.com/p/hunpos☆12Apr 9, 2018Updated 7 years ago
- Further developed as SyntaxDot: https://github.com/tensordot/syntaxdot☆13Dec 18, 2020Updated 5 years ago
- Install python dependencies automatically at runtime☆13Feb 16, 2016Updated 10 years ago
- Repository for creating models, vocabulary and other necessities for Dutch in spaCy☆11Dec 15, 2016Updated 9 years ago
- CogCompTime☆11Apr 19, 2022Updated 3 years ago
- Tools for performing hyperparameter search with Scikit-Learn and Dask http://dask-searchcv.readthedocs.io☆11Nov 16, 2017Updated 8 years ago
- Let LLMs play Counter-Strike 1.6☆16May 15, 2025Updated 9 months ago
- GzipReader for reading multiple files☆13May 26, 2015Updated 10 years ago
- Repository for Watsonian Vice County Boundary layers☆11Aug 4, 2023Updated 2 years ago
- Scripts for building a geo-located web corpus using Common Crawl data☆11Jan 18, 2026Updated last month
- Contains code and data for reproducing the manuscript, "Satellites can reveal global extent of forced labor in the world’s fishing fleet"☆10Nov 30, 2020Updated 5 years ago
- PYBOSSA command line client☆11Dec 26, 2019Updated 6 years ago
- Python package to compute metrics on an NLU intent parsing pipeline☆13Mar 10, 2020Updated 5 years ago
- Joint multi-task emotion deep neural model for emotion classification in multigenre.☆14May 10, 2024Updated last year
- ☆18Oct 20, 2017Updated 8 years ago
- Implementation of a simple frame identification approach (SimpleFrameId) described in the paper "Out-of-domain FrameNet Semantic Role Lab…☆15Apr 3, 2017Updated 8 years ago