LlmKira / fast-langdetect
⚡️ 80x faster Fasttext language detection out of the box | Split text by language
☆218 Updated 3 months ago
Alternatives and similar repositories for fast-langdetect
Users that are interested in fast-langdetect are comparing it to the libraries listed below
- 80x faster and 95% accurate language identification with Fasttext ☆158 Updated last year
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python. ☆102 Updated 9 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆257 Updated last month
- [ACL 2024] This is the code repo for our ACL’24 paper "Cleaner Pretraining Corpus Curation with Neural Web Scraping". ☆226 Updated 10 months ago
- Evaluation for AI apps and agent ☆42 Updated last year
- 🔧 Repair JSON! Solution for JSON Anomalies from LLMs. ☆268 Updated last month
- Open Source Text Embedding Models with OpenAI Compatible API ☆155 Updated last year
- Speech Diarization for scrum automation ☆108 Updated last year
- Unattended Lightweight Text Classifiers with LLM Embeddings ☆185 Updated 10 months ago
- Extract structured text from pdfs quickly ☆512 Updated last month
- TF-ID: Table/Figure IDentifier for academic papers ☆238 Updated last year
- Sentence Transformers API: An OpenAI compatible embedding API server ☆63 Updated 10 months ago
- Python Implementation of MUVERA (Multi-Vector Retrieval via Fixed Dimensional Encodings) ☆185 Updated last week
- A lightweight script for processing HTML page to markdown format with support for code blocks ☆79 Updated last year
- Enable tool-use ability for any LLM model (DeepSeek V3/R1, etc.) ☆52 Updated last month
- ☆187 Updated 2 weeks ago
- Conversational Retrieval Evaluation Dataset ☆101 Updated 4 months ago
- A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks. ☆343 Updated last month
- Turn any OCR models into online inference API endpoint 🚀 🌖 ☆56 Updated 3 months ago
- This code sets up a simple yet robust server using FastAPI for handling asynchronous requests for embedding generation and reranking task… ☆69 Updated last year
- Deep Reasoning Translation (DRT) Project ☆226 Updated last month
- Code for explaining and evaluating late chunking (chunked pooling) ☆419 Updated 6 months ago
- Whisper realtime streaming for long speech-to-text transcription and translation ☆120 Updated last year
- Formatron empowers everyone to control the format of language models' output with minimal overhead. ☆217 Updated last month
- ☆477 Updated 4 months ago
- Benchmarking PDF libraries ☆296 Updated 2 weeks ago
- A very simple news crawler with a funny name ☆390 Updated this week
- Scrape the webpage, convert it into Markdown, and enhance AI search applications. ☆251 Updated last year
- A simple tool that lets you explore different possible paths that an LLM might sample. ☆175 Updated 2 months ago
- xllamacpp - a Python wrapper of llama.cpp ☆45 Updated this week