isLinXu / regex-tokenizer
Converted the Jina Tokenizer regex pattern to Python.
☆26Updated 6 months ago
Alternatives and similar repositories for regex-tokenizer:
Users who are interested in regex-tokenizer are comparing it to the libraries listed below.
- Chinese-native retrieval-augmented generation (RAG) evaluation benchmark☆111Updated 10 months ago
- As the name suggests: a hand-built RAG☆120Updated last year
- A Python Package to Access World-Class Generative Models☆126Updated 8 months ago
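- unify-easy-llm (ULM) aims to be a simple, one-click training tool for large models, supporting different hardware such as Nvidia GPUs and Ascend NPUs as well as commonly used large models.☆55Updated 7 months ago
- Uses free large-model APIs together with your private-domain data to generate SFT training data (entirely free of charge); supports the training data formats of tools such as llamafactory. Synthetic data☆139Updated 3 months ago
- To try textin document parsing, visit https://cc.co/16YSIy☆74Updated 3 months ago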
- The Level-Navi Agent, a framework that requires no training and utilizes large language models for deep query understanding and precise s…☆12Updated 2 months ago
- ☆62Updated 5 months ago
- Agentica: Effortlessly Build Intelligent, Reflective, and Collaborative Multimodal AI Agents!☆132Updated 2 months ago
- Python3 package for Chinese/English OCR, with paddleocr-v4 onnx model (~14MB). Inference based on the ppocr-v4-onnx model enables accurate, millisecond-level OCR prediction on CPU; Chinese/English OCR in general scenarios reaches open-source SO…☆61Updated last month
- General information extraction (entity/relation/event extraction) based on the Qwen2 model☆30Updated 7 months ago
- SearchGPT: Building a quick conversation-based search engine with LLMs.☆45Updated last month
- Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory☆28Updated 9 months ago
- dify's rag patch module☆153Updated last month
- ☆105Updated last year
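- General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser☆46Updated 8 months ago
- Analysis of Chinese and English layouts☆173Updated last week
- gpt_server is an open-source framework for production-grade deployment of LLMs or Embedding models.☆155Updated this week
- Uses an LLM plus a sensitive-word lexicon to automatically determine whether text contains sensitive words.☆112Updated last year
- microsoft/graphrag with Chinese language support 🇨🇳🇨🇳🇨🇳☆36Updated last month
- Deploy your own OpenAI API 🤩, based on flask and transformers (uses the Baichuan2-13B-Chat-4bits model, which can run on a single Tesla T4 GPU); implements the OpenAI Chat, Models, and Completions endpoints, including streaming…☆89Updated last year
- An introduction to the approach behind a 60-point baseline for the SMP 2023 ChatGLM Financial Large Model Challenge☆184Updated last year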
- bisheng-unstructured library☆42Updated 3 months ago
- Baidu QA dataset with 1 million entries☆47Updated last year
- Based on RapidOCR, extract the PDF content.☆146Updated 6 months ago
- Here is a demo for PDF parser (Including OCR, object detection tools)☆33Updated 4 months ago
- Retrieval-augmented generation (RAG) examples based on large language models☆126Updated 2 months ago
- TianMu: A modern AI tool with multi-platform support, markdown support, multimodal, continuous conversation, and customizable commands. …☆83Updated last year
- ☆107Updated 6 months ago
- Exploring the potential applications of LLMs in the legal industry☆81Updated 2 months ago