Useful tools to extract Malayalam text from the Common Crawl datasets
☆28Dec 11, 2024Updated last year
Alternatives and similar repositories for common-crawl-malayalam
Users who are interested in common-crawl-malayalam are comparing it to the libraries listed below
- Index Common Crawl archives in tabular format☆125Feb 19, 2026Updated 2 weeks ago
- ☆16Dec 31, 2019Updated 6 years ago
- Russian words synonyms and antonyms☆11Dec 7, 2021Updated 4 years ago
- Process Common Crawl data with Python and Spark☆452Jan 20, 2026Updated last month
- ☆10Jun 4, 2020Updated 5 years ago
- Reproduce analyses in Harmony Manuscript☆11Feb 21, 2020Updated 6 years ago
- A notebook for the talk about Modern Pandas☆11Mar 16, 2020Updated 5 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs☆14Jan 1, 2025Updated last year
- Simple method used to load configuration variables from different sources.☆10Jun 20, 2018Updated 7 years ago
- Data profiling tools for Big Data☆11Nov 17, 2025Updated 3 months ago
- A Los Angeles Times analysis of helicopter accident rates☆11Dec 21, 2020Updated 5 years ago
- API-First approach to make Machine Learning solution usable☆13Jan 26, 2019Updated 7 years ago
- Official repository for "DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Opinion Text Generation"☆10May 20, 2022Updated 3 years ago
- Analyzing the most strategic words to guess on Wordle, based on letter frequency distributions☆11Feb 20, 2022Updated 4 years ago
- From the Medium article about Customer Retention☆11Nov 20, 2019Updated 6 years ago
- generate spot-it cards☆10Jun 13, 2015Updated 10 years ago
- Yet another tool to search through your (exported) ChatGPT conversations☆13Dec 24, 2025Updated 2 months ago
- Internet Article Spell-Checker☆11Jun 5, 2017Updated 8 years ago
- ☆12Aug 15, 2023Updated 2 years ago
- EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and E…☆42Jun 21, 2022Updated 3 years ago
- A hunspell dictionary that supports both enUS (American English) and deDE (German Standard German).☆11Oct 14, 2022Updated 3 years ago
- a fast and customizable CUDA int4 tensor core gemm☆15Aug 2, 2024Updated last year
- Spellchecker service based on hunspell for 90 languages☆10Oct 26, 2020Updated 5 years ago
- Diving into the data behind signs on Illinois highways that say "957 TRAFFIC DEATHS IN 2012." #peoplenotdata☆16Jul 8, 2021Updated 4 years ago
- ☆13Jun 28, 2015Updated 10 years ago
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks☆12Sep 1, 2023Updated 2 years ago
- An example repo that demonstrates how to properly test Python code that interfaces with Elasticsearch.☆12Aug 26, 2020Updated 5 years ago
- It's about time that Python got a console.log☆10Nov 6, 2018Updated 7 years ago
- Alternative to Python's module `cgitb` with template inspired by http://nette.org/ and https://www.djangoproject.com/☆14Mar 4, 2017Updated 9 years ago
- ☆10Nov 23, 2020Updated 5 years ago
- Indian Language Tagger and Chunker (Hindi, Telugu, Tamil, Marathi, Punjabi, Kannada, Malayalam, Urdu, Bengali)☆42Feb 2, 2023Updated 3 years ago
- BachDuet enables a human performer to improvise a duet counterpoint with a computer agent in real time.☆14Aug 8, 2022Updated 3 years ago
- incremental symbol learning for natural language understanding☆10Jun 12, 2023Updated 2 years ago
- ☆10Jun 3, 2019Updated 6 years ago
- An implementation of Compositional Attention: Disentangling Search and Retrieval by MILA☆14Jun 1, 2022Updated 3 years ago
- ☆20Jun 25, 2013Updated 12 years ago
- ☆15Aug 19, 2024Updated last year
- Indian Language Computing Project☆20Oct 6, 2012Updated 13 years ago
- A customised superset image☆11May 17, 2022Updated 3 years ago