Data Collection System For NLP/Speech Recognition
☆25Apr 20, 2021Updated 4 years ago
Alternatives and similar repositories for Babler
Users who are interested in Babler are comparing it to the repositories listed below
- Brave is a simple visualisation library for NLP information extraction, built on top of embedded BRAT.☆15Dec 25, 2019Updated 6 years ago
- Yet Another (natural language) Parser☆43May 15, 2019Updated 6 years ago
- 🕸 GlotWeb: Web Indexing for Minority Languages (WWW 2026)☆17Updated this week
- Hebrew Universal Dependencies Treebank☆14Nov 12, 2025Updated 3 months ago
- Python wrapper for ONLP YAP https://github.com/OnlpLab/yap☆16Jan 27, 2023Updated 3 years ago
- 🖋 Resource and Tool for Writing System Identification (Unicode 17.0) -- LREC 2024☆21Feb 17, 2026Updated 2 weeks ago
- ☆17Mar 28, 2017Updated 8 years ago
- Python Speex☆23Aug 10, 2017Updated 8 years ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages.☆35Feb 5, 2026Updated 3 weeks ago
- ☆10Jun 24, 2020Updated 5 years ago
- Gazetteer of the Ancient Near East Data☆10Aug 1, 2013Updated 12 years ago
- These are lists for a variety of languages containing words that are distinctive to each language.☆41Apr 5, 2022Updated 3 years ago
- Creating super-parallel corpora of 1500+ unique languages for NLP research☆34Dec 8, 2022Updated 3 years ago
- The Luhn algorithm is a simple checksum formula used to validate a variety of identification numbers, such as credit card numbers, IMEI n…☆10Dec 4, 2017Updated 8 years ago
- An sbt plugin for adding sounds to task completions☆28May 5, 2018Updated 7 years ago
- ☆10May 5, 2017Updated 8 years ago
- scripts to align a given wave to its transcription using trained models by Kaldi☆36Aug 15, 2019Updated 6 years ago
- Crawler based on a modified browser to detect online tracking.☆11Jul 19, 2023Updated 2 years ago
- Residual Quantization Autoencoder, used for interpreting LLMs☆14Jan 1, 2025Updated last year
- ☆10Feb 21, 2020Updated 6 years ago
- ☆12Dec 8, 2022Updated 3 years ago
- Utilities to gather software metrics from tools (SONAR, etc) and store them into ElasticSearch for later display using Kibana.☆11Dec 31, 2017Updated 8 years ago
- A Ruby gem that calculates Gematria☆10Mar 15, 2013Updated 12 years ago
- Fast Double Metaphone in C++11☆21Aug 26, 2014Updated 11 years ago
- ☆12Nov 9, 2018Updated 7 years ago
- The grobidmonkey package is an open-source package designed for postprocessing GROBID outputs.☆12Mar 27, 2024Updated last year
- 粵文語料篩選器 Cantonese text filter☆41Feb 4, 2026Updated last month
- Persian Datasets including: Wikipedia, Twitter, Hamshahri, Hellokish, NSURL'19, Peyma, Text_mining.ir☆11Oct 6, 2023Updated 2 years ago
- Incredibly user-friendly seq2seq API and CLI app with beam search, bidirectional, attention, bucket in just one single file☆12Sep 16, 2018Updated 7 years ago
- This is the home directory of the speaker diarization module being developed for Heterogeneous News data in RedHen Labs as a GSOC Project☆10Sep 11, 2015Updated 10 years ago
- KnowMAN: Weakly Supervised Multinomial Adversarial Networks☆12Nov 9, 2021Updated 4 years ago
- No-nonsense simple transliteration between writing systems, mostly of Semitic origin☆13Jun 29, 2025Updated 8 months ago
- An eXample Programming Language☆11Dec 20, 2018Updated 7 years ago
- Unbounded cache model for online language modeling with open vocabulary☆11Feb 15, 2019Updated 7 years ago
- semantic tokenizer for speech and music☆21Jul 6, 2025Updated 7 months ago
- Human Activity Segmentation Challenge 2023 @ ECML/PKDD☆12Nov 2, 2023Updated 2 years ago
- a fast implementation of BM25☆10Sep 15, 2022Updated 3 years ago
- Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.☆10Aug 13, 2023Updated 2 years ago
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference☆36Oct 29, 2025Updated 4 months ago