jackfsuia / LLM-Data-Cleaner
Use LLMs to batch-process data: generate or clean data for academic use. Currently supports OCR with a range of large models, including Qwen (通义千问), Moonshot (月之暗面), Baidu PaddleOCR, OpenAI, and LLaVA.
☆9Updated last month
Related projects
Alternatives and complementary repositories for LLM-Data-Cleaner
- Exploring the potential of LLMs in the legal industry☆24Updated this week
- ☆13Updated 4 months ago
- ☆67Updated 6 months ago
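- Training a LLaVA model with better Chinese support, with open-sourced training code and data.☆38Updated 2 months ago
- Zhenghe (Zh-LLM), a Chinese maritime large language model☆13Updated 10 months ago
- Story scenario generation, emotional scenario guidance, plot summarization, and personality analysis, implemented with langchain☆14Updated 5 months ago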
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆51Updated last week
- A simple MLLM that surpasses QwenVL-Max using only open-source data, with a 14B LLM.☆36Updated 2 months ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.☆68Updated 2 months ago
- ☆77Updated 6 months ago
- Search, organize, discover anything!☆47Updated 6 months ago
- A minimalist benchmarking tool designed to test the routine-generation capabilities of LLMs.☆15Updated last week
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding"☆127Updated 5 months ago
- Copies the MLP of llama3 8 times as 8 experts, creates a router with random initialization, and adds a load balancing loss to construct an 8x8b Mo…☆25Updated 4 months ago
- 🔥🔥 First-ever hour-scale video understanding models☆156Updated 2 weeks ago
- DPO training for Qwen (通义千问)☆27Updated last month
- The first character role-play large model that is fully available for commercial use.☆35Updated 3 months ago
- ☆21Updated 3 weeks ago
- A Toolkit for Running On-device Large Language Models (LLMs) in APP☆57Updated 4 months ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner☆109Updated last week
- ☆30Updated 5 months ago
- Best practices for retrieval-augmented generation (RAG) with large models.☆42Updated 2 months ago
- The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation☆65Updated last month
- Repo for the paper "AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction".☆50Updated 3 months ago
- A Training-free Iterative Framework for Long Story Visualization☆59Updated last month
- NLP project record archive☆42Updated 2 weeks ago
- A demo of a PDF parser (including OCR and object detection tools)☆31Updated 3 weeks ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆26Updated last month
- Visual Instruction Tuning for Qwen2 Base Model☆19Updated 4 months ago
- ☆72Updated 10 months ago