jackfsuia / LLM-Data-Cleaner
Use LLMs to batch-process data: generate or clean data for academic use. Currently supports OCR with a range of models, including Qwen (Tongyi Qianwen), Moonshot, Baidu PaddleOCR, OpenAI, and LLaVA.
☆9 Updated 2 months ago
Related projects
Alternatives and complementary repositories for LLM-Data-Cleaner
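- Exploring the potential of LLM applications in the legal industry ☆27 Updated this week
- ☆13 Updated 5 months ago
- Best practices for retrieval-augmented generation with large models. ☆46 Updated 2 months ago
- Zheng He (Zh-LLM), a Chinese maritime large language model ☆13 Updated 11 months ago
- DPO training for Qwen (Tongyi Qianwen) ☆27 Updated 2 months ago
- Train a LLaVA model with better Chinese support, and open-source the training code and data. ☆38 Updated 2 months ago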
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆87 Updated 2 weeks ago
- Here is a demo for PDF parser (Including OCR, object detection tools) ☆30 Updated last month
- A simple way to synthesize LLM training data. (under construction⚠) ☆10 Updated this week
- ☆22 Updated last month
- ☆51 Updated 8 months ago
- ☆13 Updated this week
- Story scenario generation, emotional scenario guidance, plot summarization, and personality analysis, implemented with langchain ☆14 Updated 5 months ago
- A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents ☆31 Updated this week
- official code for "Fox: Focus Anywhere for Fine-grained Multi-page Document Understanding" ☆128 Updated 5 months ago
- Just for debug ☆56 Updated 9 months ago
- ☆15 Updated 2 months ago
- ☆120 Updated 2 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and … ☆18 Updated 7 months ago
- ☆15 Updated 4 months ago
- The newest version of Llama 3, with the source code explained line by line in Chinese ☆22 Updated 7 months ago
- TianGong-AI-Unstructure ☆51 Updated this week
- ☆26 Updated 3 weeks ago
- Personal project repository: some applications of large language models and multimodal models ☆123 Updated 2 weeks ago
- SUS-Chat: Instruction tuning done right ☆47 Updated 10 months ago
- Imitate OpenAI with Local Models ☆85 Updated 2 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆51 Updated 3 weeks ago
- Search, organize, discover anything! ☆47 Updated 7 months ago
- MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training) ☆127 Updated 5 months ago
- ☆74 Updated 11 months ago