qburst / common-crawl-malayalam
Useful tools to extract Malayalam text from the Common Crawl datasets
☆27 Updated 4 months ago
Alternatives and similar repositories for common-crawl-malayalam:
Users who are interested in common-crawl-malayalam are comparing it to the libraries listed below.
- Semantically distinct key phrase extraction using Hilbert hashes. ☆48 Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 Updated last year
- ☆28 Updated 4 years ago
- Topic Inference with Zeroshot models ☆61 Updated last year
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 Updated 3 years ago
- Tooling to play around with multilingual machine translation for Indian Languages. ☆22 Updated 3 years ago
- Language detection using spaCy and fastText ☆55 Updated last year
- State of the art open-source translation for Indic languages. ☆5 Updated 4 years ago
- An ongoing series of notebooks aimed at helping fellow NLP enthusiasts think about applying new tools and techniques to practical tasks. ☆18 Updated 4 years ago
- Natural Language Generation for Gramex applications. ☆24 Updated 2 years ago
- ☆43 Updated last year
- ☆11 Updated 4 years ago
- ☆13 Updated 3 years ago
- Documentation effort for the BookCorpus dataset ☆34 Updated 3 years ago
- No Teacher BART distillation experiment for NLI tasks ☆27 Updated 4 years ago
- Build Semantic Search with S-BERT and fine-tune your model in an unsupervised way ☆58 Updated 2 years ago
- A Directory of Online Newspaper Sources for 70+ Languages ☆33 Updated 3 years ago
- Shoonya - Platform to Annotate and label data at scale. ☆54 Updated 7 months ago
- spaCy match and replace, maintaining conjugation ☆35 Updated 2 years ago
- NeatText, a simple NLP package for cleaning textual data and text preprocessing ☆71 Updated last year
- ☆11 Updated 5 years ago
- ☆57 Updated 2 years ago
- Question Generation - Question Answering for Automatic Flashcards ☆64 Updated 3 years ago
- With embedders, you can easily convert your texts into sentence- or token-level embeddings within a few lines of code. Use cases for this… ☆21 Updated last year
- This repository contains an easy and intuitive approach to using SetFit in combination with spaCy. ☆79 Updated last year
- ☆12 Updated 2 years ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆57 Updated 3 months ago
- Training a model without a dataset for natural language inference (NLI) ☆25 Updated 4 years ago
- A Machine Learning tool to create the training dataset very quickly & easily by using a smart Chrome extension ☆14 Updated 2 years ago
- Fast and accurate spell correction library ☆81 Updated 3 years ago