LlmKira / fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
☆205 Updated 2 months ago
Alternatives and similar repositories for fast-langdetect
Users who are interested in fast-langdetect are comparing it to the libraries listed below.
- 80x faster and 95% accurate language identification with Fasttext ☆155 Updated last year
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python. ☆100 Updated 8 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆228 Updated last year
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆179 Updated 9 months ago
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping". ☆225 Updated 9 months ago
- 🔧 Repair JSON! Solution for JSON Anomalies from LLMs. ☆256 Updated 10 months ago
- Simple package to extract text with coordinates from programmatic PDFs ☆126 Updated last month
- FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes ☆206 Updated 3 weeks ago
- Evaluation for AI apps and agents ☆41 Updated last year
- Extract structured text from PDFs quickly ☆485 Updated this week
- This code sets up a simple yet robust server using FastAPI for handling asynchronous requests for embedding generation and reranking task… ☆69 Updated last year
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆468 Updated 3 weeks ago
- Open Source Text Embedding Models with OpenAI Compatible API ☆153 Updated 10 months ago
- Unattended Lightweight Text Classifiers with LLM Embeddings ☆185 Updated 9 months ago
- A lightweight script for processing HTML pages to markdown format with support for code blocks ☆79 Updated last year
- Code for explaining and evaluating late chunking (chunked pooling) ☆396 Updated 5 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆315 Updated 2 months ago
- Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro… ☆809 Updated 6 months ago
- ☆183 Updated this week
- Conversational Retrieval Evaluation Dataset ☆100 Updated 3 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation ☆117 Updated last year
- This repository presents the original implementation of LumberChunker: Long-Form Narrative Document Segmentation by André V. Duarte, João… ☆68 Updated 8 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆136 Updated 2 weeks ago
- ☆50 Updated last month
- https://no-ocr.com/about ☆131 Updated 4 months ago
- Deploy a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embedding models. ☆42 Updated 10 months ago
- Using APPL to reimplement popular algorithms for Large Language Models (LLMs) and prompts ☆45 Updated 4 months ago
- Sentence Transformers API: An OpenAI compatible embedding API server ☆59 Updated 9 months ago
- YOLO models trained on DocLayNet - power your Document Intelligence with Layout Analysis ☆112 Updated 2 months ago
- To try textin document parsing, visit https://cc.co/16YSIy ☆99 Updated 6 months ago