magicpdf / Magic-Doc
Converts documents (pdf/html/doc/docx/ppt/pptx) to Markdown
☆44Updated 11 months ago
Alternatives and similar repositories for Magic-Doc
Users that are interested in Magic-Doc are comparing it to the libraries listed below
- To try TextIn document parsing, visit https://cc.co/16YSIy ☆101Updated 7 months ago
- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute☆290Updated 9 months ago
- TianGong-AI-Unstructure☆67Updated last week
- ☆27Updated 8 months ago
- A demo built on Megrez-3B-Instruct, integrating a web search tool to enhance the model's question-and-answer capabilities.☆38Updated 6 months ago
- General layout analysis | Chinese document parsing | Document Layout Analysis | layout parser☆46Updated last year
- Analysis of Chinese and English layouts☆218Updated this week
- Chinese-native retrieval-augmented generation (RAG) evaluation benchmark☆118Updated last year
- ☆61Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆135Updated 6 months ago
- ☆66Updated 9 months ago
- Repo for "MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability"☆122Updated 3 weeks ago
- Imitate OpenAI with Local Models☆87Updated 9 months ago
- MPB (Miner-PDF-Benchmark) is an end-to-end PDF document comprehension evaluation suite designed for large-scale model data scenarios.☆23Updated 6 months ago
- Python implementation of AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, w…☆45Updated 3 months ago
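- This project trains and runs inference with layoutlmv3 on Chinese images. It mainly addresses three problems: 1. standardizing data into a usable training dataset format 2. modifying the layoutlmv3-base-chinese tokenizer 3. splitting text longer than 512 and applying a sliding window☆48Updated 9 months ago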
- Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception☆195Updated 2 weeks ago
- code for piccolo embedding model from SenseTime☆129Updated last year
- Reading order, Layoutreader☆15Updated last month
- Agentica: Effortlessly Build Intelligent, Reflective, and Collaborative Multimodal AI Agents!☆175Updated this week
- Document orientation classification☆219Updated 7 months ago
- Inference library for sequence-based table recognition algorithms, integrating table recognition algorithms such as PP-Structure and modelscope.☆314Updated this week
- GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation☆205Updated last week
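- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".☆67Updated 11 months ago
- Survey of large language model training and serving☆37Updated last year
- Uses langchain for task planning and builds conversation-scenario resources for each subtask; an MCTS task executor lets each subtask use the resources in its context and explore through self-reflection to reach its own best answer to the question. This approach relies on the model's alignment preferences; for each preference we designed an engineering framework that implements a sampling strategy based on self-assigned rewards for different answers☆29Updated last month
- Scripts related to bge inference optimization☆28Updated last year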
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆151Updated last year
- This repository provides an implementation of the paper "A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Co…☆70Updated 3 months ago
- [NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence☆144Updated 9 months ago