Converted the Jina Tokenizer regex pattern to Python.
☆26Aug 26, 2024Updated last year
Alternatives and similar repositories for regex-tokenizer
Users that are interested in regex-tokenizer are comparing it to the libraries listed below
- 🔨🔨🔨Tool for building model training datasets☆20Nov 1, 2024Updated last year
- paper-read-notes☆13Sep 26, 2024Updated last year
- ☆28May 19, 2024Updated last year
- something for paper agent☆11Dec 18, 2024Updated last year
- Azure Machine Learning - MLOps Python SDKv2☆10Jul 24, 2023Updated 2 years ago
- New word discovery for the novel 斗破苍穹 (Battle Through the Heavens)☆13May 12, 2022Updated 3 years ago
- Modern normalizing flows in Python. Simple to use and easily extensible.☆12Feb 11, 2026Updated 2 weeks ago
- Twinkle✨: Training workbench to make your model glow.☆45Updated this week
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models☆29Feb 4, 2026Updated 3 weeks ago
- 🔨🔨🔨(mmplot) Used to plot multiple metrics, such as algorithm accuracy and speed, across multiple deep learning models.☆86Aug 22, 2024Updated last year
- 🚀 Simple and efficient use for Ultralytics yolov5🚀☆32Jan 17, 2023Updated 3 years ago
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- Color detection, Contour mapping, Detecting holes, Motion detection☆10Mar 20, 2014Updated 11 years ago
- A scalable data preprocessing framework built on PySpark for LLM training☆22Dec 9, 2025Updated 2 months ago
- Image Tokenizer Needs Post-Training☆24Oct 4, 2025Updated 4 months ago
- Converts OCR and other models from Paddle to ONNX format, then tries loading these ONNX models for inference with DJL, the Java deep learning framework.☆13Nov 15, 2022Updated 3 years ago
- Creating Your Divine Agent 😇☆10Jan 26, 2026Updated last month
- Python solutions to coding questions in Leetcode☆13Sep 12, 2020Updated 5 years ago
- Chinese event extraction☆11Feb 26, 2021Updated 5 years ago
- A lightweight, production-ready C++ library for LLM tokenization, fully compatible with HuggingFace tokenizer.json.☆24Jan 4, 2026Updated last month
- AI application service platform☆28Nov 12, 2025Updated 3 months ago
- Dataset of metadata on 3 million public Github repositories☆15Jan 16, 2025Updated last year
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models☆17Nov 4, 2025Updated 3 months ago
- ✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM☆11Jun 16, 2025Updated 8 months ago
- OpenGraph is an open-source graph processing benchmarking suite written in pure C/OpenMP.☆12Apr 27, 2024Updated last year
- Paddle Automatically Diff Precision Toolkits.☆53Dec 5, 2025Updated 2 months ago
- This sample shows how to use the oneAPI Video Processing Library (oneVPL) to perform a single and multi-source video decode and preproces…☆15Jun 15, 2023Updated 2 years ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆49Jan 30, 2026Updated last month
- A distributed in-memory store for temporal knowledge graphs☆10Mar 20, 2024Updated last year
- BERT TensorRT model acceleration and deployment☆10Apr 1, 2022Updated 3 years ago
- Transfer Learning on Dogs vs Cats dataset using PyTorch C++ API☆12Aug 16, 2019Updated 6 years ago
- ☆15Jun 18, 2021Updated 4 years ago
- The source code for BUTTERFLY COUNTING IN BIPARTITE NETWORKS☆12May 29, 2019Updated 6 years ago
- A full stack typescript SAAS boilerplate with Chat, Auth (Langgraph, supabase), Payments (stripe), and AI Credits☆17May 23, 2025Updated 9 months ago
- Just a template for quickly creating a python library.☆10Feb 8, 2026Updated 3 weeks ago
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆29Dec 24, 2025Updated 2 months ago
- Open Platform Robot☆14Jul 9, 2022Updated 3 years ago
- A mini soft renderer.☆13Dec 17, 2023Updated 2 years ago
- Quickly and easily deploy TF2 Image Object Detection models from TensorFlow Hub trained on COCO 2017 dataset.☆12Nov 11, 2020Updated 5 years ago