Generate large textual corpora for almost any language by crawling the web
☆13 · Feb 17, 2024 · Updated 2 years ago
Alternatives and similar repositories for webcorpus
Users that are interested in webcorpus are comparing it to the libraries listed below.
- ☆18 · Apr 28, 2021 · Updated 4 years ago
- Agile reading group that works ☆13 · Feb 2, 2022 · Updated 4 years ago
- ☆45 · Dec 15, 2022 · Updated 3 years ago
- FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists ☆31 · Aug 14, 2025 · Updated 7 months ago
- Synthetically generate random text document images with ground-truth ☆12 · Jul 20, 2021 · Updated 4 years ago
- Detection of malicious data exfiltration over DNS using machine learning techniques ☆13 · Jul 8, 2020 · Updated 5 years ago
- Parse Searchable Electoral Rolls ☆11 · Apr 20, 2025 · Updated 11 months ago
- Official code for "Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Reso… ☆18 · Oct 9, 2025 · Updated 5 months ago
- Pretraining, fine-tuning and evaluation scripts for Indic-Wav2Vec2 ☆110 · Aug 28, 2025 · Updated 6 months ago
- Integration between Rocket.Chat and the RASA Chatbot platform ☆17 · Jul 31, 2023 · Updated 2 years ago
- Code for EMNLP 2022 Paper: On the Calibration of Massively Multilingual Language Models ☆15 · Jun 12, 2023 · Updated 2 years ago
- Code repository for the paper "Improving End-to-End SLU performance with Prosodic Attention and Distillation" accepted at Interspeech 202… ☆27 · May 17, 2023 · Updated 2 years ago
- Python library for converting numbers to words for all Indian languages. ☆36 · May 23, 2025 · Updated 10 months ago
- A simple, consistent and extendable toolkit for IndicTrans2 (PyPI: https://pypi.org/project/indictranstoolkit) ☆38 · Jul 24, 2025 · Updated 8 months ago
- ☆23 · May 5, 2022 · Updated 3 years ago
- Text-to-Speech for languages of India ☆345 · Nov 8, 2024 · Updated last year
- Face recognition based attendance system for classroom environments. Developed a Python API which recognizes the people in a picture (of a … ☆14 · Dec 8, 2022 · Updated 3 years ago
- ☆15 · Apr 26, 2025 · Updated 10 months ago
- This software is a demonstration of Audio Signal Processing and Machine Learning using Python and TensorFlow. The software contains a GU… ☆11 · Dec 7, 2023 · Updated 2 years ago
- Analytics on Apache Projects for Diversity ☆18 · Jun 18, 2019 · Updated 6 years ago
- Text to Speech for Indic languages ☆52 · Mar 23, 2022 · Updated 4 years ago
- ☆12 · Feb 6, 2023 · Updated 3 years ago
- This will hold the data pipeline to convert raw audio data to speech, which will act as the input dataset for the speech-to-text pipeline ☆32 · Feb 15, 2023 · Updated 3 years ago
- ☆18 · Jan 15, 2021 · Updated 5 years ago
- ☆18 · Mar 4, 2025 · Updated last year
- Tool to fix bitexts and tag near-duplicates for removal ☆34 · Sep 4, 2025 · Updated 6 months ago
- Vaksanca introduces a free Sanskrit speech corpus with vowel segmentation. ☆16 · Jul 22, 2021 · Updated 4 years ago
- Hands-on Python 3.x GUI Programming, published by Packt ☆13 · Jan 18, 2021 · Updated 5 years ago
- Code for the paper "Modelling Latent Translations for Cross-Lingual Transfer" ☆17 · Nov 22, 2021 · Updated 4 years ago
- ☆11 · Mar 19, 2023 · Updated 3 years ago
- ☆39 · Feb 8, 2026 · Updated last month
- An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset ☆12 · Mar 29, 2023 · Updated 2 years ago
- ☆41 · Jul 14, 2022 · Updated 3 years ago
- Align, a general text alignment function ☆15 · Dec 7, 2023 · Updated 2 years ago
- GPT-3 attempts to predict & balance chemical reactions ☆13 · Aug 2, 2020 · Updated 5 years ago
- Custom Named Entity Recognition with Spacy3 ☆31 · Dec 30, 2021 · Updated 4 years ago
- A Docusaurus theme to add support for MDX v2 ☆28 · Jul 20, 2022 · Updated 3 years ago
- CycloNet is a Deep Learning based web-app for Cyclone intensity computation using INSAT-3D Cyclone Imagery ☆13 · Sep 17, 2023 · Updated 2 years ago
- DEPRECATED version of SoundFile ☆14 · May 26, 2020 · Updated 5 years ago