LlmKira / fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
☆152Updated this week
Alternatives and similar repositories for fast-langdetect:
Users who are interested in fast-langdetect are comparing it to the libraries listed below:
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python.☆86Updated 4 months ago
- Speech Diarization for scrum automation☆101Updated last year
- Conversational Retrieval Evaluation Dataset☆94Updated 4 months ago
- TEaR framework for paper "TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement"☆45Updated 5 months ago
- Evaluation for AI apps and agents☆36Updated last year
- A lightweight script for converting HTML pages to Markdown format, with support for code blocks☆78Updated 9 months ago
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts☆43Updated 2 weeks ago
- DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought☆201Updated last month
- ✨ Split text by languages (e.g. 你喜欢看アニメ吗 -> 你喜欢看 | アニメ | 吗) for NLP tasks (e.g. parse, TTS). Powered by fasttext and langua☆41Updated 2 months ago
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping".☆221Updated 5 months ago
- The simplest open-source implementation of perplexity.ai☆286Updated this week
- A prompting library☆154Updated 4 months ago
- Repo for NAACL 2025 Paper "Unfolding the Headline: Iterative Self-Questioning for News Retrieval and Timeline Summarization"☆175Updated last week
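- 📝 Extracts content from document images and outputs it one-to-one to Word or Txt for further use or processing. Planned future support: PDF/image input with output in JSON, Txt, Word, and Markdown formats.☆178Updated 2 months ago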
- A Comprehensive Benchmark for Document Parsing and Evaluation☆212Updated 2 weeks ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆161Updated 8 months ago
- Bambo is a new proxy framework. Compared with mainstream frameworks, it is more lightweight and flexible and can handle various load task…☆34Updated last month
- ☆172Updated 5 months ago
- ☆50Updated last month
- A lightweight end-to-end text-to-speech model☆100Updated last month
- ☆367Updated 2 months ago
- Lightweight, performant, deep table extraction☆394Updated last month
- 80x faster and 95% accurate language identification with Fasttext☆145Updated last year
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆134Updated 4 months ago
- ☆60Updated last month
- Extract structured text from pdfs quickly☆393Updated this week
- ☆152Updated 2 months ago
- An enterprise-grade AI retriever designed to streamline AI integration into your applications, ensuring cutting-edge accuracy.☆279Updated this week
- To try textin document parsing, visit https://cc.co/16YSIy ☆23Updated 6 months ago
- Formatron empowers everyone to control the format of language models' output with minimal overhead.☆177Updated 3 weeks ago