qburst/common-crawl-malayalam

Useful tools to extract Malayalam text from the Common Crawl datasets

☆28

Alternatives and similar repositories for common-crawl-malayalam

Users who are interested in common-crawl-malayalam compare it to the libraries listed below.

commoncrawl / cc-pyspark
Process Common Crawl data with Python and Spark
☆457 · Mar 26, 2026 · Updated 3 months ago
mrcabbage972 / simple-toolformer
A Python implementation of Toolformer using Hugging Face Transformers
☆14 · Mar 20, 2023 · Updated 3 years ago
commoncrawl / cc-citations
Scientific articles using or citing Common Crawl data
☆29 · Jul 8, 2026 · Updated last week
juandes / tensorflow-go-models
A collection of models for TensorFlow Go
☆12 · May 29, 2022 · Updated 4 years ago
SachinKalsi / html_tag_annotator
A machine learning tool for creating training datasets quickly and easily with a smart Chrome extension
☆14 · Feb 11, 2023 · Updated 3 years ago
santhoshtr / nupuram
Nupuram (നൂപുരം) font: https://smc.org.in/fonts/nupuram
☆15 · Sep 29, 2023 · Updated 2 years ago
aws-samples / aws-cdk-deep-learning-image-vector-embeddings-at-scale-using-aws-batch
AWS blog post code for running feature extraction on images using AWS Batch and the AWS Cloud Development Kit (CDK).
☆21 · Oct 28, 2022 · Updated 3 years ago
khle08 / epidemix
Ordinary differential equation solver and network simulator.
☆12 · Feb 28, 2023 · Updated 3 years ago
CPFL / robosense
ROS driver for RS-LiDAR-16 and RS-LiDAR-32
☆11 · Mar 25, 2019 · Updated 7 years ago
ikreymer / cc-index-server
Deployment of pywb as a Common Crawl index server
☆22 · Oct 6, 2017 · Updated 8 years ago
arontier / A_Prot_Paper
Materials related to the A Prot paper
☆11 · Sep 5, 2022 · Updated 3 years ago
ecohealthalliance / EpiTator
EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and E…
☆43 · Jun 21, 2022 · Updated 4 years ago
bracesproul / dramatron-template
☆21 · Mar 12, 2024 · Updated 2 years ago
Nyceane / clean_water_ai
Clean Water AI
☆13 · Aug 15, 2018 · Updated 7 years ago
milosgajdos / embeviz
A simple app for visualising text embeddings
☆23 · Jul 20, 2025 · Updated last year
Engineero / honeybee
An artificial bee colony implementation in Python
☆11 · Oct 7, 2020 · Updated 5 years ago
AashiDutt / RAG
Self-made projects and learning material, drawn from various resources, on using local LLMs and RAG
☆14 · May 26, 2025 · Updated last year
smc / swanalekha
Swanalekha input method
☆22 · Jun 18, 2024 · Updated 2 years ago
LarremoreLab / SpringRank
☆11 · Aug 20, 2025 · Updated 11 months ago
MichaelWehar / Public-Domain-Word-Lists
A collection of public-domain word lists in plain-text or CSV format.
☆21 · Jun 10, 2017 · Updated 9 years ago
jeremy886 / crossword_helmig
From Helmig: http://bryanhelmig.com/python-crossword-puzzle-generator/
☆16 · Nov 23, 2013 · Updated 12 years ago
deep-reinforcement-learning-book / Chapter13-Learning-to-Run
Chapter 13, "Learning to Run," of the book Deep Reinforcement Learning: code example for solving the NIPS 2017 Learning to Run challenge with paralle…
☆13 · Jul 4, 2021 · Updated 5 years ago
themaximalist / embeddings.js
Simple text embeddings library for Node.js (OpenAI, Mistral, local)
☆31 · Jun 30, 2025 · Updated last year
Makhber / makhber
Makhber is a free application for visualization and analysis of scientific data
☆21 · Jun 18, 2025 · Updated last year
jwmeindertsma / Social-Force-Model-Crowd-Simulation
Crowd simulation using the social force model
☆14 · May 14, 2020 · Updated 6 years ago
gioannides / SRNN-Brain-Modelling-Toolbox
Spatiotemporal dynamics in recurrent spiking neural networks using optimization-based modelling for EEG signals
☆16 · Mar 31, 2022 · Updated 4 years ago
foliant-docs / swagger2markdown
Converter from Swagger JSON to Markdown
☆12 · May 11, 2019 · Updated 7 years ago
statOmics / tradeSeqPaper
Scripts to reproduce the analyses in the tradeSeq paper.
☆16 · Feb 5, 2020 · Updated 6 years ago
ewmstaley / diffusion_policies
Implementation of Diffusion Policy
☆14 · Dec 13, 2024 · Updated last year
shriaithal / Cloudbread
Food waste management with predictive analysis for restaurants
☆17 · Oct 17, 2018 · Updated 7 years ago
ternaus / clip2onnx
Converts CLIP models to ONNX
☆11 · Jan 17, 2023 · Updated 3 years ago
lhqing / mouse_brain_2020
Jupyter notebooks for reproducing the analysis in "DNA Methylation Landscape of the Mouse Brain at Single-Cell Resolution"
☆14 · Apr 1, 2020 · Updated 6 years ago
theislab / scanpy-in-R
A guide to using the Python scRNA-seq analysis package Scanpy from R
☆15 · Apr 27, 2020 · Updated 6 years ago
huwenboshi / s-ldxr
Stratified squared trans-ethnic genetic correlation
☆14 · May 12, 2022 · Updated 4 years ago
prasoongoyal / PixL2R
☆17 · Dec 21, 2020 · Updated 5 years ago
abhijithneilabraham / Covid-QA
COVID question-answering datasets and fine-tuned models
☆18 · Apr 27, 2021 · Updated 5 years ago
GP2code / GenoTools
A suite of tools for processing genotype data, including calling genotypes from .idat to plink (ped), sample/case-control variant QC steps…
☆15 · Feb 5, 2026 · Updated 5 months ago
codingforentrepreneurs / Smarter-Web-Scraping-with-Python
Leverage modern open-source tools to create better web scraping workflows.
☆31 · Feb 29, 2024 · Updated 2 years ago
J-Que / RL-GA
Genetic algorithm tuned through reinforcement learning
☆17 · Jul 2, 2021 · Updated 5 years ago