Tools for compiling corpora from Common Crawl
☆14 · Nov 24, 2024 · Updated last year
Alternatives and similar repositories for cc_corpus
Users that are interested in cc_corpus are comparing it to the libraries listed below.
- The home repository of the NerKor corpus, a Hungarian gold-standard, named-entity-annotated corpus containing 1 million tokens. ☆16 · Sep 20, 2023 · Updated 2 years ago
- e-magyar text processing system: inter-module communication via TSV + REST API ☆31 · Aug 23, 2025 · Updated 7 months ago
- A Python library for easily querying morphological inflection models trained on UniMorph ☆13 · Oct 23, 2022 · Updated 3 years ago
- Let LLMs play Counter-Strike 1.6 ☆16 · May 15, 2025 · Updated 10 months ago
- Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection ☆15 · Aug 20, 2021 · Updated 4 years ago
- Notes on papers in Natural Language Processing, Computational Linguistics, and the related sciences ☆14 · Mar 11, 2026 · Updated 2 weeks ago
- A curated list of NLP resources for Hungarian ☆276 · Jan 22, 2026 · Updated 2 months ago
- Benchmark Large Language Models Reliably On Your Data ☆18 · Dec 27, 2025 · Updated 2 months ago
- Some useful scripts to run ipptool commands against printers ☆12 · Feb 8, 2017 · Updated 9 years ago
- Automatically exported from code.google.com/p/hunpos ☆12 · Apr 9, 2018 · Updated 7 years ago
- Here are all of the PowerPoint presentations that I have ever created and presented. ☆12 · Dec 28, 2020 · Updated 5 years ago
- A package for handy processing of semantic graphs such as AMR, with a special focus on standardized evaluation ☆26 · May 1, 2025 · Updated 10 months ago
- Convert an imscc file to a folder with all the content with proper structure ☆10 · Jul 4, 2016 · Updated 9 years ago
- Python wrapper around the Mac TIS functions to convert between chars and keycodes ☆16 · Dec 1, 2015 · Updated 10 years ago
- NLP & FM Lecture Slides ☆43 · Mar 19, 2026 · Updated last week
- A microservice that compiles *TeX files via HTTP ☆13 · Mar 11, 2018 · Updated 8 years ago
- Use Python to Automate the PowerPoint Update ☆15 · May 28, 2023 · Updated 2 years ago
- Postgres date column parser ☆17 · Feb 8, 2026 · Updated last month
- Allows manually adding and editing time-tracking entries ☆21 · May 18, 2021 · Updated 4 years ago
- Subdomain list based on Common Crawl data, sorted by popularity ☆17 · Nov 19, 2019 · Updated 6 years ago
- Speed testing for a data munging task ☆47 · Feb 23, 2013 · Updated 13 years ago
- Convert PowerPoint (pptx) files into raw text, Org, or LaTeX files ☆15 · Aug 28, 2018 · Updated 7 years ago
- Scan using a network scanner with the eSCL protocol (e.g. Canon PIXMA) ☆16 · Mar 26, 2020 · Updated 6 years ago
- The majority of large language models, summarized in a table, from the original Transformer to ChatGPT and beyond. ☆14 · Jan 20, 2023 · Updated 3 years ago
- Hungarian morphological generator ☆16 · Dec 12, 2025 · Updated 3 months ago
- epsilon is a scanner generator ☆29 · Jun 12, 2022 · Updated 3 years ago
- Demo for an upcoming blog post ☆16 · Mar 10, 2016 · Updated 10 years ago
- ☆14 · Apr 28, 2023 · Updated 2 years ago
- Multiple style transfer via variational autoencoder ☆28 · Feb 10, 2022 · Updated 4 years ago
- JavaScript port of the curses library using Emscripten ☆22 · Feb 5, 2016 · Updated 10 years ago
- Extract images from PowerPoint files ☆17 · Dec 1, 2011 · Updated 14 years ago
- Automated generation of PowerPoint slides for fun and profit ☆13 · Oct 18, 2017 · Updated 8 years ago
- Scripts for building a geo-located web corpus using Common Crawl data ☆11 · Jan 18, 2026 · Updated 2 months ago
- GzipReader for reading multiple files ☆13 · May 26, 2015 · Updated 10 years ago
- ☆13 · Feb 5, 2020 · Updated 6 years ago
- Shibboleth authentication mechanisms (PAM module, JAAS, and Python) for authenticating against SAML IdPs ☆32 · Oct 17, 2018 · Updated 7 years ago
- Introduction to Python and Natural Language Technologies (ENVIAUAV35) course ☆13 · Jun 24, 2021 · Updated 4 years ago
- This is a community fork of https://github.com/xdspacelab/openvslam ☆11 · Dec 7, 2022 · Updated 3 years ago
- A trend viewer written in Python/JavaScript ☆21 · Nov 15, 2024 · Updated last year