DavidNemeskey/cc_corpus

Tools for compiling corpora from Common Crawl

☆14

Alternatives and similar repositories for cc_corpus

Users who are interested in cc_corpus compare it to the repositories listed below.

nytud / NYTK-NerKor
The home repository of the NerKor corpus, a Hungarian gold standard named entity annotated corpus containing 1 million tokens.
☆16 · Sep 20, 2023 · Updated 2 years ago

nytud / emtsv
e-magyar text processing system -- inter-module communication via tsv + REST API
☆32 · Aug 23, 2025 · Updated 11 months ago

antonisa / unimorph_inflect
A python library for easily querying morphological inflection models trained on Unimorph
☆13 · Oct 23, 2022 · Updated 3 years ago

codogogo / towerparse
Tower Parse: Low-Resource Dependency Parsing via Hierarchical Source Selection
☆15 · Aug 20, 2021 · Updated 4 years ago

makrai / notes
Notes on papers in Natural Language Processing, Computational Linguistics, and the related sciences
☆14 · Updated this week

Rigos0 / counter-strike-in-e2b
Let LLMs play Counter-Strike 1.6
☆16 · May 15, 2025 · Updated last year

oroszgy / awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
☆281 · Apr 14, 2026 · Updated 3 months ago

sumukshashidhar / yourbench
Benchmark Large Language Models Reliably On Your Data
☆18 · Dec 27, 2025 · Updated 6 months ago

jrodriguezg / IPPtool
Some useful scripts to run ipptool commands against printers
☆12 · Feb 8, 2017 · Updated 9 years ago

mivoq / hunpos
Automatically exported from code.google.com/p/hunpos
☆12 · Apr 9, 2018 · Updated 8 years ago

MitchMilam / PowerPoints
Here are all of the PowerPoint presentations that I have ever created and presented.
☆12 · Dec 28, 2020 · Updated 5 years ago

ilinguistics / common_crawl_corpus
Scripts for building a geo-located web corpus using Common Crawl data
☆11 · Jan 18, 2026 · Updated 6 months ago

diwakargrandhi / imscc-file-converter
Convert an imscc file to a folder with all the content with proper structure
☆11 · Jul 4, 2016 · Updated 10 years ago

elte-nlp / elte-nlp-course
NLP & FM Lecture Slides
☆43 · Jun 25, 2026 · Updated last month

flipz357 / smatchpp
A package for handy processing of semantic graphs and meaning representations (e.g. AMR), with a special focus on standardized evaluation
☆27 · May 1, 2025 · Updated last year

digilist / docker-latex-microservice
A microservice that compiles *TeX files via HTTP
☆13 · Mar 11, 2018 · Updated 8 years ago

cydalytics / Python_PowerPoint_Automation
Use Python to Automate the PowerPoint Update
☆15 · May 28, 2023 · Updated 3 years ago

bendrucker / postgres-date
Postgres date column parser
☆18 · Feb 8, 2026 · Updated 5 months ago

stinnux / kanboard-Timetrackingeditor
Allows manual adding and editing of Timetracking Entries
☆21 · May 18, 2021 · Updated 5 years ago

carlbordum / common-crawl-subdomains
Subdomain list based on Common Crawl data, sorted by popularity
☆18 · Nov 19, 2019 · Updated 6 years ago

maxrousseau / pynoter
Convert PowerPoint (pptx) files into raw text, org, or LaTeX files
☆15 · Aug 28, 2018 · Updated 7 years ago

azygadlo / LLM-catalog
Majority of the Large Language Models summarized in a table. From the original Transformer to ChatGPT and beyond.
☆14 · Jan 20, 2023 · Updated 3 years ago

kno10 / python-scan-eSCL
Scan using a network scanner with eSCL protocol (e.g. Canon PIXMA)
☆16 · Mar 26, 2020 · Updated 6 years ago

violapeter / crumb
Hungarian morphological generator
☆16 · Dec 12, 2025 · Updated 7 months ago

MichaelPaddon / epsilon
epsilon is a scanner generator
☆29 · Jun 12, 2022 · Updated 4 years ago

dodeeric / omeka-s-docker
Omeka-S in Docker containers.
☆20 · Jan 18, 2022 · Updated 4 years ago

wartortell / Trollette
Automated generation of PowerPoint slides for fun and profit
☆13 · Oct 18, 2017 · Updated 8 years ago

eric-guerin / powerpoint-progressbar
Automation of the creation of a progress bar in PowerPoint, and an overview of the sections on each slide
☆13 · Nov 14, 2017 · Updated 8 years ago

vigetlabs / canvas-instagram-filters
Demo for an upcoming blog post
☆15 · Mar 10, 2016 · Updated 10 years ago

Holmes-Alan / ST-VAE
Multiple style transfer via variational autoencoder
☆28 · Feb 10, 2022 · Updated 4 years ago

JacksonKearl / solunar
☆13 · Apr 28, 2023 · Updated 3 years ago

petersand / particle-video-lib
☆14 · Feb 5, 2020 · Updated 6 years ago

bmeaut / python_nlp_2021_spring
Introduction to Python and Natural Language Technologies (ENVIAUAV35) course
☆13 · Jun 24, 2021 · Updated 5 years ago

ericjang / pptx-export-notes
Exports plaintext speaker notes from Microsoft PowerPoint .pptx files
☆20 · Feb 28, 2018 · Updated 8 years ago

NationalLibraryOfNorway / NB-N-gram
A trend viewer written in Python/JavaScript
☆21 · Nov 15, 2024 · Updated last year

Smerity / cs205_ga
How deep does Google Analytics go? Efficiently tackling Common Crawl using AWS & MapReduce
☆17 · Feb 5, 2014 · Updated 12 years ago

manosmaroulis / openvslam-1
This is a community fork of https://github.com/xdspacelab/openvslam
☆11 · Dec 7, 2022 · Updated 3 years ago

epeake / ModifiedKneserNey
Interpolated Kneser-Ney smoothing with an out-of-vocabulary correction and discount estimated from training data
☆13 · Dec 11, 2020 · Updated 5 years ago

shdnx / ELTE-LaTeX-Thesis-Base
LaTeX base for writing an ELTE thesis.
☆14 · May 27, 2015 · Updated 11 years ago