jackfsuia / LLM-Data-Cleaner
Use large language models to batch-process data, and to generate or clean data for academic use. Currently supports OCR with a range of models: Qwen (Tongyi Qianwen), Moonshot AI, Baidu PaddleOCR, OpenAI, and LLaVA.
☆11 · Updated 4 months ago
Alternatives and similar repositories for LLM-Data-Cleaner:
Users who are interested in LLM-Data-Cleaner are comparing it to the libraries listed below.
- ☆20 · Updated 3 months ago
- Copies the MLP of llama3 8 times as 8 experts, creates a router with random initialization, and adds a load-balancing loss to construct an 8x8B Mo… ☆26 · Updated 6 months ago
- Trains a LLaVA model with better Chinese-language support, and open-sources the training code and data. ☆44 · Updated 4 months ago
- ☆78 · Updated 8 months ago
- ☆36 · Updated 3 months ago
- ☆14 · Updated 6 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆56 · Updated 2 months ago
- Delta-CoMe can achieve near-lossless 1-bit compression; accepted at NeurIPS 2024 ☆52 · Updated 2 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆29 · Updated 7 months ago
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and … ☆18 · Updated 9 months ago
- GLM Series Edge Models ☆124 · Updated 2 weeks ago
- simpsybot: a large conversational model for mental health in the Chinese-language domain ☆26 · Updated last month
- A Toolkit for Running On-device Large Language Models (LLMs) in APP ☆59 · Updated 6 months ago
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step ☆75 · Updated last week
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆107 · Updated 2 months ago
- DPO training for Tongyi Qianwen (Qwen) ☆30 · Updated 3 months ago
- Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data. ☆181 · Updated this week
- "We are the first fully commercially usable role-play large model." ☆38 · Updated 5 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆109 · Updated 2 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆28 · Updated 3 months ago
- SUS-Chat: Instruction tuning done right ☆48 · Updated last year
- The newest version of llama3, with the source code explained line by line in Chinese ☆22 · Updated 9 months ago
- Just for debug ☆56 · Updated 11 months ago
- TianGong-AI-Unstructure ☆56 · Updated 2 weeks ago
- Best practices for retrieval-augmented generation (RAG) with large models. ☆54 · Updated 4 months ago
- ☆44 · Updated 7 months ago
- ☆10 · Updated 4 months ago
- SearchGPT: Building a quick conversation-based search engine with LLMs. ☆43 · Updated 2 weeks ago
- ☆11 · Updated 11 months ago
- ☆162 · Updated last month