ccprocessor / llm-webkit-mirror
☆18Updated this week
Alternatives and similar repositories for llm-webkit-mirror
Users that are interested in llm-webkit-mirror are comparing it to the libraries listed below
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆22Updated 5 months ago
- SDK of OpenDataLab - https://opendatalab.org.cn☆57Updated last year
- Dataset and Code for our ACL 2024 paper: "Multimodal Table Understanding". We propose the first large-scale Multimodal IFT and Pre-Train …☆202Updated 2 weeks ago
- [ACM'MM 2024 Oral] Official code for "OneChart: Purify the Chart Structural Extraction via One Auxiliary Token"☆224Updated last month
- vLLM-accelerated implementation of GOT, combined with MinerU for PDF parsing in RAG☆56Updated 6 months ago
- AAAI 2024: Visual Instruction Generation and Correction☆93Updated last year
- WanJuan 1.0 multimodal corpus☆560Updated last year
- Data Set Description Language Specification (DSDL, a next-generation AI dataset description language)☆47Updated 11 months ago
- ☆168Updated last year
- [ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models☆347Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆238Updated 5 months ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆281Updated 8 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation☆424Updated last month
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆245Updated 3 weeks ago
- datasets resource☆113Updated 3 weeks ago
- Document Artificial Intelligence☆164Updated 3 weeks ago
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆146Updated 11 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊☆268Updated 3 months ago
- ☆134Updated last year
- Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k, including English/Chinese)☆82Updated 7 months ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆326Updated last month
- ☆280Updated 9 months ago
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step☆270Updated last year
- A simple tool to batch process messages using OpenAI's GPT models. `GPTBatcher` allows for efficient handling of multiple requests simult…☆40Updated last week
- ☆126Updated last week
- The WordScape repository contains code for the WordScape pipeline to create datasets to train document understanding models.☆35Updated last year
- MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval☆167Updated this week
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆151Updated 8 months ago
- ☆226Updated last year
- [EMNLP 2023 Demo] "CLEVA: Chinese Language Models EVAluation Platform" [ACL 2025 Findings] "CLEVA: Toward Comprehensive and Contamination…☆62Updated this week