jackfsuia / LLM-Data-Cleaner

Batch-process data with large language models. OCR with a range of large models is now supported, including Qwen (Tongyi Qianwen), Moonshot, Baidu PaddleOCR, OpenAI, and LLaVA. Use LLMs to generate or clean data for academic use.

☆14

Alternatives and similar repositories for LLM-Data-Cleaner

Users who are interested in LLM-Data-Cleaner are comparing it to the libraries listed below.

AI-Study-Han / Zero-Qwen-VL
Trains a LLaVA model with better Chinese support, and open-sources the training code and data.
☆61 Updated 9 months ago
wux-labs / OpenXLab-IntelligentSalesAssistant
☆19 Updated last year
360AILABNLP / 360LayoutAnalysis
☆27 Updated 8 months ago
yujunhuics / Reyes
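A multimodal large model implemented from scratch and named Reyes (睿视; R for 睿, eyes for 眼). Reyes has 8B parameters, uses InternViT-300M-448px-V2_5 as its vision encoder and Qwen2.5-7B-Instruct on the language-model side, and also uses a two-layer MLP projection layer to connect…
☆14 Updated 4 months ago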
LLM-Red-Team / emo-visual-data
😜 A meme (sticker) visual dataset, annotated using the image-parsing capabilities of glm-4v and step-1v.
☆122 Updated last year
hzauzxb / guidance-ocr
In visual information extraction tasks, uses OCR recognition results to constrain the answers of multimodal large models.
☆35 Updated 5 months ago
owenliang / qwen-dpo
DPO training for Qwen (Tongyi Qianwen).
☆49 Updated 9 months ago
WalkerMitty / PDFparser
A demo PDF parser (including OCR and object detection tools).
☆35 Updated 8 months ago
ITRECLab / Zh-MT-LLM
Zhenghe (Zh-LLM), a Chinese maritime large model.
☆17 Updated last year
linancn / TianGong-AI-Unstructure
TianGong-AI-Unstructure
☆68 Updated 2 weeks ago
seanzhang-zhichen / Qwen-WisdomVast
Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …
☆18 Updated last year
VovyH / MultiAgent-Search
[2025 Shanghai AI Laboratory Intern (书生) Training Camp: Top 10 and Outstanding Project]
☆30 Updated last month
li-xiu-qi / SmartlmageFinder
An image search engine and management system built on multimodal vector models and visual multimodal models, providing accurate text-to-text, text-to-image, and image-to-image retrieval.
☆42 Updated last week
shibing624 / SearchGPT
SearchGPT: Building a quick conversation-based search engine with LLMs.
☆46 Updated 5 months ago
cwxndl / LLM
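Large language model applications: RAG, NL2SQL, chatbots, pre-training, MoE (mixture-of-experts) models, fine-tuning, reinforcement learning, and Tianchi data competitions.
☆62 Updated 4 months ago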
ClosedCharacter / Peach
We are the first character (role-play) large model that is fully available for commercial use.
☆40 Updated 10 months ago
zhaochenyang20 / Prompt2Model-Self-Guide
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper
☆32 Updated last year
Liuziyu77 / Soda
Search, organize, discover anything!
☆49 Updated last year
glide-the / InterpretationoDreams
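Uses langchain for task planning and builds conversation-scenario resources for each subtask. An MCTS task executor then lets each subtask draw on the resources in its context and explore through self-reflection to reach its own best answer to the problem. This approach relies on the model's alignment preferences; for each preference we designed an engineering framework that applies a sampling strategy to the rewards the model assigns to its different answers.
☆29 Updated last month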
yangjianxin1 / LongQLoRA
LongQLoRA: Extend Context Length of LLMs Efficiently
☆166 Updated last year
MetaGLM / LawGLM
Exploring the potential of LLMs in the legal industry.
☆90 Updated 6 months ago
thunlp / Delta-CoMe
Delta-CoMe achieves near-lossless 1-bit compression; the work was accepted at NeurIPS 2024.
☆57 Updated 7 months ago
zhanghx0905 / qwen-tools-openai-server
An OpenAI API-compatible middleware for Qwen OpenAI API, implementing (stream) tool calling functionality
☆9 Updated 10 months ago
Alannikos / FunGPT
In this fast-paced world, we all need a little something to spice up life. Whether you need a glass of sweet talk to lift your spirits or…
☆58 Updated 3 weeks ago
aliyun / qwen-dianjin
Qwen DianJin: LLMs for the Financial Industry by Alibaba Cloud
☆113 Updated last month
percent4 / llm_math_solver
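This project covers dataset synthesis, model training, and evaluation for the math problem-solving ability of large models, along with related write-ups.
☆91 Updated 9 months ago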
BryanMurkyChan / Project_Miao
Come raise an AI cat with its own dedicated memory!
☆10 Updated 8 months ago
AI-Study-Han / Mini-Llama2-Chinese
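An effort to train a Chinese mini large language model from scratch that can hold basic conversations; the model size depends on the machine you have on hand.
☆60 Updated 10 months ago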
reilxlx / llava-Qwen2-7B-Instruct-Chinese-CLIP
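The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves Chinese text recognition and the ability to interpret the meaning behind memes, approaching the recognition level of gpt4o and claude-3.5-sonnet!
☆23 Updated 11 months ago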
Alibaba-NLP / CoFE-RAG
☆36 Updated 2 months ago