isLinXu / regex-tokenizer

A Python conversion of the Jina Tokenizer regex pattern.

☆26

Alternatives and similar repositories for regex-tokenizer

Users who are interested in regex-tokenizer are comparing it to the libraries listed below.

riddle911 / SuperInsights
☆66 Updated 9 months ago
wangyuxinwhy / generate
A Python Package to Access World-Class Generative Models
☆127 Updated last year
liguodongiot / unify-easy-llm
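unify-easy-llm (ULM) aims to be a simple, one-click training tool for large models, supporting different hardware such as Nvidia GPUs and Ascend NPUs as well as commonly used large models.
☆55 Updated 11 months ago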
CLUEbenchmark / SuperCLUE-RAG
A native Chinese benchmark for evaluating retrieval-augmented generation (RAG)
☆118 Updated last year
SmartFlowAI / Hand-on-RAG
As the name suggests: RAG built by hand
☆124 Updated last year
yangjianxin1 / unsloth
Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory
☆27 Updated last year
li-xiu-qi / x-pdf2md
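Built on the PaddlePaddle platform, this project sets up an innovative multi-model collaborative system for efficient, accurate conversion of PDF files to Markdown files.
☆16 Updated 3 months ago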
infinigence / InfiniWebSearch
A demo built on Megrez-3B-Instruct, integrating a web search tool to enhance the model's question-and-answer capabilities.
☆38 Updated 6 months ago
zjrwtx / SFT-data-builder
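Uses free LLM APIs together with your private-domain data to generate SFT training data (entirely free of charge); supports the training data formats of tools such as llamafactory. Synthetic data.
☆167 Updated 7 months ago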
gameofdimension / vllm-cn
Demonstrates the remarkable effect of vllm on Chinese large language models
☆31 Updated last year
shibing624 / SearchGPT
SearchGPT: Building a quick conversation-based search engine with LLMs.
☆46 Updated 5 months ago
OKC13 / General-Documents-Layout-parser
General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser
☆46 Updated last year
intsig-textin / markdown_tester
To try textin document parsing, visit https://cc.co/16YSIy
☆101 Updated 7 months ago
MetaGLM / LawGLM
Exploring the potential of LLM applications in the legal industry
☆90 Updated 6 months ago
kv1830 / fast_pdf_trans
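Fast pdf translate is PDF translation software: it uses MinerU to convert the PDF to Markdown, splits the Markdown, sends the pieces to a large model for translation, then assembles the translated results and generates the output PDF with pypandoc.
☆23 Updated 3 months ago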
billvsme / my_openai_api
Deploy your own OpenAI API 🤩, based on flask and transformers (uses the Baichuan2-13B-Chat-4bits model and can run on a single Tesla T4 GPU); implements OpenAI's Chat, Models, and Completions endpoints, including streaming…
☆93 Updated last year
dataelement / bisheng-unstructured
bisheng-unstructured library
☆51 Updated last month
shibing624 / agentica
Agentica: Effortlessly Build Intelligent, Reflective, and Collaborative Multimodal AI Agents!
☆175 Updated this week
RapidAI / RapidLayout
Analysis of Chinese and English layouts
☆218 Updated last week
taishan1994 / Qwen2-UIE
Universal information extraction based on the Qwen2 model [entity/relation/event extraction]
☆31 Updated 11 months ago
hyperai / vllm-cn
vLLM documentation in Simplified Chinese
☆80 Updated last month
MetaGLM / OpenLM
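This project aims to give beginners in the large-model field a comprehensive body of knowledge, covering both basic and advanced material, so that developers can quickly master the large-model technology stack and gain a thorough understanding of the field.
☆61 Updated 5 months ago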
t6am3 / law_glm_baseline
☆15 Updated last year
hzauzxb / guidance-ocr
In visual information extraction tasks, uses OCR recognition results to constrain the answers of multimodal large models
☆35 Updated 5 months ago
jasonkylelol / graphrag-chinese
microsoft/graphrag with Chinese support 🇨🇳🇨🇳🇨🇳
☆47 Updated 2 months ago
shell-nlp / gpt_server
gpt_server is an open-source framework for production-grade deployment of LLMs, Embedding, Reranker, ASR, and TTS.
☆194 Updated this week
zzlgreat / smart_agent
☆105 Updated last year
lutongyv / Textin_Tester
To try textin document parsing, visit https://cc.co/16YSIy
☆22 Updated 11 months ago
shuyhere / all-about-llm
A survey of large language model training and serving
☆37 Updated last year
jiangnanboy / llm_corpus_quality
Cleaning and quality assessment of Chinese corpora for large-model pre-training
☆65 Updated 11 months ago