Library for fast text representation and classification.
☆31 · Jan 9, 2024 · Updated 2 years ago
Alternatives and similar repositories for fasterText
Users that are interested in fasterText are comparing it to the libraries listed below.
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆76 · Apr 1, 2025 · Updated last year
- Targeted language identifier, based on FastText and Hunspell. ☆38 · Sep 4, 2025 · Updated 9 months ago
- Statistics on multilingual datasets ☆17 · Jul 12, 2022 · Updated 3 years ago
- Bicleaner fork that uses neural networks ☆40 · Feb 23, 2026 · Updated 3 months ago
- A Python utility for indexing file lines. Best demo honourable mention at ECIR 2024. ☆23 · Nov 9, 2025 · Updated 7 months ago
- A library for data streaming and augmentation ☆21 · May 5, 2025 · Updated last year
- Efficient teacher-student models and scripts to make them ☆57 · Dec 16, 2023 · Updated 2 years ago
- Transform TMX to text ☆27 · Nov 23, 2022 · Updated 3 years ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Apr 16, 2026 · Updated last month
- Meedan's Open Source Arabic/English Translation Memory ☆33 · Nov 4, 2009 · Updated 16 years ago
- collaborative web tool to enrich content ☆11 · Nov 13, 2011 · Updated 14 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 · Jun 18, 2024 · Updated last year
- data related codebase for polyglot project ☆19 · Mar 30, 2023 · Updated 3 years ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 7 months ago
- ☆143 · Apr 8, 2026 · Updated 2 months ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆15 · Jul 30, 2025 · Updated 10 months ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- A simple semi-supervised approach for creating Hugging Face data script loaders and uploading them to the hub. ☆11 · Jun 23, 2024 · Updated last year
- ☆32 · Mar 30, 2023 · Updated 3 years ago
- ☆38 · Mar 16, 2026 · Updated 2 months ago
- Jig for the Open-Source IR Replicability Challenge (OSIRRC) ☆13 · Dec 8, 2022 · Updated 3 years ago
- The pipeline for the OSCAR corpus ☆178 · Nov 9, 2025 · Updated 7 months ago
- Micro-framework for publishing linked data ☆11 · Aug 1, 2017 · Updated 8 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation ☆15 · Aug 27, 2024 · Updated last year
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆39 · Feb 5, 2026 · Updated 4 months ago
- Library and command line utility to do approximate string matching of a source against a bitext index and get matched source and target. ☆52 · Apr 22, 2025 · Updated last year
- An Easy Annotation Tool for Natural Language Processing ☆11 · May 17, 2024 · Updated 2 years ago
- Python binding for the G'MIC Image Processing Framework ☆11 · Nov 14, 2025 · Updated 6 months ago
- PyTorch implementation of NAACL 2021 paper "Multi-view Subword Regularization" ☆26 · Jun 2, 2021 · Updated 5 years ago
- ☆16 · Oct 17, 2024 · Updated last year
- Efficient Low-Memory Aligner ☆147 · Jan 15, 2025 · Updated last year
- Code and experiments for the COLING2020 paper "Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations". ☆11 · Dec 9, 2020 · Updated 5 years ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆48 · Jul 25, 2023 · Updated 2 years ago
- ☆14 · Apr 18, 2020 · Updated 6 years ago
- BabelNet (and WordNet) sense embedding trained with Word2Vec and FastText ☆10 · Sep 3, 2019 · Updated 6 years ago
- Code from blog 'Searching by Music: Leveraging Vector Search for Music Information Retrieval' ☆16 · Nov 16, 2023 · Updated 2 years ago
- ☆32 · Dec 29, 2023 · Updated 2 years ago
- Datalog engine based on DuckDB ☆10 · Mar 8, 2023 · Updated 3 years ago
- A Workbench for Autograding Retrieve/Generate Systems ☆15 · Jun 30, 2025 · Updated 11 months ago