LlmKira / fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
☆167 Updated this week
Alternatives and similar repositories for fast-langdetect:
Users who are interested in fast-langdetect are comparing it to the libraries listed below.
- ✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and budoux ☆46 Updated last week
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python. ☆90 Updated 5 months ago
- 80x faster and 95% accurate language identification with Fasttext ☆147 Updated last year
- Speech Diarization for scrum automation ☆101 Updated last year
- A Comprehensive Benchmark for Document Parsing and Evaluation ☆261 Updated last week
- TEaR framework for paper "TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement" ☆45 Updated 6 months ago
- Evaluation for AI apps and agents ☆36 Updated last year
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping". ☆224 Updated 6 months ago
- Conversational Retrieval Evaluation Dataset ☆100 Updated this week
- ☆50 Updated 2 months ago
- A lightweight script for converting HTML pages to Markdown format, with support for code blocks ☆79 Updated 10 months ago
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts ☆41 Updated last month
- 🔧 Repair JSON! Solution for JSON anomalies from LLMs. ☆220 Updated 7 months ago
- A faster LayoutReader model based on LayoutLMv3 that sorts OCR bboxes into reading order. ☆175 Updated 9 months ago
- A unified interface for multiple Text-to-Speech (TTS) providers. ☆256 Updated last month
- LLM steganography with minimum-entropy coupling - Hiding encrypted messages in natural language. ☆83 Updated 5 months ago
- ☆388 Updated 3 months ago
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆56 Updated 5 months ago
- Enhancing Translation with RAG-Powered Large Language Models ☆76 Updated last month
- To try textin document parsing, visit https://cc.co/16YSIy ☆22 Updated 7 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆251 Updated 2 weeks ago
- BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages ☆221 Updated last year
- A Python Package to Access World-Class Generative Models ☆126 Updated 8 months ago
- A prompting library ☆155 Updated 5 months ago
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought ☆208 Updated 2 months ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆146 Updated 7 months ago
- Turn any OCR models into online inference API endpoint 🚀 🌖 ☆53 Updated 2 weeks ago
- A gradio webui for Andrewyng translation-agent ☆28 Updated 3 months ago
- Open source inference code for Rev's model ☆381 Updated last month
- To try textin document parsing, visit https://cc.co/16YSIy ☆74 Updated 3 months ago