LlmKira / fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
☆196 · Updated last month
Alternatives and similar repositories for fast-langdetect
Users who are interested in fast-langdetect are comparing it to the libraries listed below.
- 80x faster and 95% accurate language identification with Fasttext ☆153 · Updated last year
- ✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux ☆54 · Updated 2 months ago
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python. ☆100 · Updated 7 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆221 · Updated 11 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation ☆114 · Updated last year
- Extract structured text from pdfs quickly ☆475 · Updated 2 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆302 · Updated last month
- Simple package to extract text with coordinates from programmatic PDFs ☆122 · Updated last month
- Open Source Text Embedding Models with OpenAI Compatible API ☆153 · Updated 10 months ago
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping". ☆225 · Updated 8 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆792 · Updated 5 months ago
- 🔧 Repair JSON! Solution for JSON Anomalies from LLMs. ☆244 · Updated 10 months ago
- Unattended Lightweight Text Classifiers with LLM Embeddings ☆185 · Updated 8 months ago
- Lightweight, performant, deep table extraction ☆459 · Updated 2 weeks ago
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts ☆44 · Updated 4 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆424 · Updated last month
- TF-ID: Table/Figure IDentifier for academic papers ☆232 · Updated 10 months ago
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆204 · Updated last week
- Open source inference code for Rev's model ☆401 · Updated 3 weeks ago
- ☆222 · Updated 5 months ago
- A Python library to define and validate data types in Docling. ☆134 · Updated this week
- Evaluation for AI apps and agents ☆41 · Updated last year
- ☆171 · Updated 9 months ago
- https://no-ocr.com/about ☆128 · Updated 3 months ago
- To try textin document parsing, click https://cc.co/16YSIy ☆93 · Updated 6 months ago
- Chinese and English layout analysis ☆208 · Updated last month
- Speech Diarization for scrum automation ☆104 · Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆238 · Updated 5 months ago
- ☆50 · Updated last month
- ☆180 · Updated last month