Library for fast text representation and classification.
☆31 · Jan 9, 2024 · Updated 2 years ago
Alternatives and similar repositories for fasterText
Users that are interested in fasterText are comparing it to the libraries listed below.
- ☆13 · Aug 23, 2024 · Updated last year
- Targeted language identifier based on FastText and Hunspell. ☆38 · Sep 4, 2025 · Updated 7 months ago
- Bicleaner fork that uses neural networks ☆40 · Feb 23, 2026 · Updated last month
- A Python utility for indexing file lines. Best demo honourable mention at ECIR 2024. ☆23 · Nov 9, 2025 · Updated 5 months ago
- A library for data streaming and augmentation ☆21 · May 5, 2025 · Updated 11 months ago
- ☆38 · Apr 17, 2024 · Updated last year
- A collection of Zsh functions to augment Git ☆19 · Dec 11, 2025 · Updated 3 months ago
- Extracts plain text, language identification and more metadata from WARC records ☆23 · Oct 1, 2025 · Updated 6 months ago
- ☆34 · Nov 22, 2021 · Updated 4 years ago
- An OpenAI API-compatible LLM inference server based on ExLlamaV2. ☆25 · Feb 9, 2024 · Updated 2 years ago
- Meedan's Open Source Arabic/English Translation Memory ☆33 · Nov 4, 2009 · Updated 16 years ago
- collaborative web tool to enrich content ☆12 · Nov 13, 2011 · Updated 14 years ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆160 · Jun 18, 2024 · Updated last year
- Data-related codebase for the polyglot project ☆19 · Mar 30, 2023 · Updated 3 years ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba ☆38 · Oct 16, 2025 · Updated 5 months ago
- ☆138 · Jan 22, 2026 · Updated 2 months ago
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆14 · Jul 30, 2025 · Updated 8 months ago
- COMET for African languages ☆11 · Jan 24, 2025 · Updated last year
- A simple semi-supervised approach for creating huggingface data script loaders and uploading them to the hub. ☆11 · Jun 23, 2024 · Updated last year
- AfroLID, a powerful neural toolkit for African language identification that covers 517 African languages. ☆38 · Feb 5, 2026 · Updated 2 months ago
- Jig for the Open-Source IR Replicability Challenge (OSIRRC) ☆13 · Dec 8, 2022 · Updated 3 years ago
- ☆37 · Mar 16, 2026 · Updated 3 weeks ago
- The pipeline for the OSCAR corpus ☆176 · Nov 9, 2025 · Updated 5 months ago
- IAI Style Guide ☆11 · Jun 27, 2025 · Updated 9 months ago
- An Easy Annotation Tool for Natural Language Processing ☆11 · May 17, 2024 · Updated last year
- Code and experiments for the COLING2020 paper "Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations". ☆11 · Dec 9, 2020 · Updated 5 years ago
- Summaries and notes on CounterFactual Machine Learning papers ☆19 · Dec 13, 2018 · Updated 7 years ago
- SPRINT Toolkit helps you evaluate diverse neural sparse models easily using a single click on any IR dataset. ☆47 · Jul 25, 2023 · Updated 2 years ago
- Code from blog 'Searching by Music: Leveraging Vector Search for Music Information Retrieval' ☆16 · Nov 16, 2023 · Updated 2 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 · Jun 12, 2020 · Updated 5 years ago
- Datalog engine based on DuckDB ☆10 · Mar 8, 2023 · Updated 3 years ago
- A Workbench for Autograding Retrieve/Generate Systems ☆15 · Jun 30, 2025 · Updated 9 months ago
- [EMNLP 2023 Demo] fabricator - annotating and generating datasets with large language models. ☆110 · May 16, 2024 · Updated last year
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆58 · Feb 3, 2026 · Updated 2 months ago
- ☆39 · Oct 3, 2022 · Updated 3 years ago
- WProofreader software development kit (SDK) offers multilingual spelling & grammar check API and JavaScript libraries for rich text edito… ☆13 · Mar 30, 2026 · Updated last week
- Rule-based Kurdish Transliterator ☆10 · May 3, 2024 · Updated last year
- Tools for managing datasets for governance and training. ☆90 · Mar 16, 2026 · Updated 3 weeks ago
- A list of multi-vector retrieval resources ☆18 · May 29, 2024 · Updated last year