codegram / calbert
Catalan ALBERT (A Lite BERT for self-supervised learning of language representations)
☆14 · updated 4 years ago
Alternatives and similar repositories for calbert:
Users interested in calbert are comparing it to the libraries listed below:
- Pre-production releases for spaCy in Catalan · ☆14 · updated 3 years ago
- The RadioTalk dataset of talk radio transcripts · ☆59 · updated 4 years ago
- Official source for Catalan language models and resources made within the Aina project · ☆24 · updated last year
- Forced alignments for Common Voice · ☆31 · updated 4 years ago
- An asynchronous concurrent pipeline for classifying Common Crawl, based on fastText's pipeline · ☆86 · updated 4 years ago
- A Raspberry Pi 64-bit image with spaCy and neuralcoref pre-installed · ☆21 · updated 5 years ago
- A web interface to understand language-specific BERT models · ☆17 · updated last year
- Experiments with Hugging Face 🔬 🤗 · ☆44 · updated 8 months ago
- fastlangid, the only language identification package that supports Cantonese (zh-yue), Simplified (zh-hans) and Traditional Chinese (zh-ha… · ☆39 · updated 2 years ago
- A simple neural truecaser written in PyTorch and AllenNLP · ☆33 · updated 10 months ago
- Gentle and praatio scripts for easy forced alignment · ☆18 · updated 2 years ago
- Open-source AI benchmarking toolkit for speech-to-text services · ☆55 · updated last year
- Running Mozilla's implementation of Baidu DeepSpeech on Google Colaboratory · ☆16 · updated 6 years ago
- spaCy match and replace, maintaining conjugation · ☆35 · updated 2 years ago
- Dataset of podcasts and episodes · ☆14 · updated 7 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler · ☆23 · updated 4 years ago
- Visualize large text collections with WebGL · ☆25 · updated 7 months ago
- Docker for HF wav2vec2-sprint · ☆13 · updated 4 years ago
- Markdown template for Datasheets for Datasets · ☆62 · updated 2 years ago
- Morfessor EM+Prune · ☆10 · updated 4 years ago
- BERT models for many languages created from Wikipedia texts · ☆33 · updated 4 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode · ☆111 · updated 2 years ago
- Experiments with generating GPT-2 fanfiction on specified topics · ☆11 · updated 5 years ago
- Gamma Agreement in Python · ☆43 · updated last year
- MaSS - Multilingual corpus of Sentence-aligned Spoken utterances · ☆49 · updated 7 months ago
- Using YouTube to prepare a speech recognition dataset for any language · ☆10 · updated 4 years ago
- Infrastructure for creating natural language processing systems based on transformer networks · ☆11 · updated 5 years ago