Library for fast text representation and classification.
☆31 · Jan 9, 2024 · Updated 2 years ago
Alternatives and similar repositories for fasterText
Users that are interested in fasterText are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆76 · Apr 1, 2025 · Updated last year
- Statistics on multilingual datasets ☆17 · Jul 12, 2022 · Updated 3 years ago
- Bicleaner fork that uses neural networks ☆40 · Feb 23, 2026 · Updated 2 months ago
- A library for data streaming and augmentation ☆21 · May 5, 2025 · Updated last year
- ☆39 · Apr 17, 2024 · Updated 2 years ago
- A collection of Zsh functions to augment Git ☆19 · Dec 11, 2025 · Updated 5 months ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Apr 16, 2026 · Updated last month
- ☆34 · Nov 22, 2021 · Updated 4 years ago
- An OpenAI API compatible LLM inference server based on ExLlamaV2. ☆25 · Feb 9, 2024 · Updated 2 years ago
- Proceedings of the annual intercalary robot dance party in celebration of workshop on symposium about 2^6th birthdays; in particular, tha… ☆20 · May 10, 2026 · Updated last week
- ☆29 · Feb 11, 2026 · Updated 3 months ago
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks) ☆10 · Jan 11, 2024 · Updated 2 years ago
- collaborative web tool to enrich content ☆12 · Nov 13, 2011 · Updated 14 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 · Jun 18, 2024 · Updated last year
- data related codebase for polyglot project ☆19 · Mar 30, 2023 · Updated 3 years ago
- Fast Neural Machine Translation in C++ - development repository ☆23 · May 12, 2024 · Updated 2 years ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 7 months ago
- ☆142 · Apr 8, 2026 · Updated last month
- A polite and user-friendly downloader for Common Crawl data ☆80 · May 4, 2026 · Updated 2 weeks ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆15 · Jul 30, 2025 · Updated 9 months ago
- Programs written in BASIC ☆16 · Mar 20, 2021 · Updated 5 years ago
- Probe TemporaryExposureKeys and Files of Exposure Notifications System in Japan a.k.a. "COCOA". ☆10 · Sep 14, 2021 · Updated 4 years ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- A simple semi-supervised approach for creating huggingface data script loaders and upload to the hub. ☆11 · Jun 23, 2024 · Updated last year
- ☆37 · Mar 16, 2026 · Updated 2 months ago
- Jig for the Open-Source IR Replicability Challenge (OSIRRC) ☆13 · Dec 8, 2022 · Updated 3 years ago
- The pipeline for the OSCAR corpus ☆177 · Nov 9, 2025 · Updated 6 months ago
- Micro-framework for publishing linked data ☆11 · Aug 1, 2017 · Updated 8 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Aug 27, 2024 · Updated last year
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆39 · Feb 5, 2026 · Updated 3 months ago
- IAI Style Guide ☆11 · Jun 27, 2025 · Updated 10 months ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆52 · Apr 22, 2025 · Updated last year
- Medusa combo files, Hashcat rules and dictionaries, JRT rules ☆14 · Oct 20, 2022 · Updated 3 years ago
- Shinobigami session support bot ☆12 · Dec 19, 2025 · Updated 5 months ago
- An Easy Annotation Tool for Natural Language Processing ☆11 · May 17, 2024 · Updated 2 years ago
- Python binding for the G'MIC Image Processing Framework ☆11 · Nov 14, 2025 · Updated 6 months ago
- ☆16 · Oct 17, 2024 · Updated last year
- Online form that generates a travel exemption certificate (attestation de déplacement dérogatoire) ☆10 · Mar 18, 2020 · Updated 6 years ago
- Efficient Low-Memory Aligner ☆148 · Jan 15, 2025 · Updated last year