Use LLMs to batch-generate or clean data for academic use. Supports OCR with Qwen (Tongyi Qianwen), Moonshot, PaddleOCR, OpenAI, and LLaVA.
☆16 · Sep 15, 2024 · Updated last year
Alternatives and similar repositories for LLM-Data-Cleaner
Users that are interested in LLM-Data-Cleaner are comparing it to the libraries listed below.
- ☆10 · Apr 30, 2025 · Updated 10 months ago
- A minimal LLM sales agent framework for sales agent fast deployment and benchmark. Support OpenAI models, Claude, HuggingFace models, Gem… ☆19 · Sep 6, 2024 · Updated last year
- MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster] ☆23 · Dec 10, 2025 · Updated 3 months ago
- Learn how to create impactful AI Agents using Agno AI Python Package ☆13 · Jul 31, 2025 · Updated 7 months ago
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere… ☆21 · May 9, 2025 · Updated 10 months ago
- Finetune and Inference Qwen3-0.6B. ☆28 · May 5, 2025 · Updated 10 months ago
- ☆10 · Dec 8, 2022 · Updated 3 years ago
- Web app to remove images background ☆13 · Aug 6, 2024 · Updated last year
- Accelerating GOT-OCRv2 with VLLM ☆11 · Nov 15, 2024 · Updated last year
- Yet Another Papers With Code ☆37 · Sep 7, 2025 · Updated 6 months ago
- CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter ☆23 · May 28, 2025 · Updated 9 months ago
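- A hybrid-inference extension plugin for vllm that supports multi-NUMA hybrid inference; single-GPU inference of the Qwen3-Next model can reach 1000+ prefill ☆30 · Nov 7, 2025 · Updated 4 months ago
- ☆17 · Dec 29, 2023 · Updated 2 years ago
- PyTorch implementation of conditional random fields (CRF) ☆10 · Mar 7, 2021 · Updated 5 years ago
- XGEN-MM(BLIP3) Autocaptioning Tools ☆17 · Jun 20, 2024 · Updated last year
- Knowledge-graph entity and relation extraction, knowledge-graph construction, and a retrieval-augmented generation demo for building-code text data ☆37 · Aug 7, 2024 · Updated last year
- Baseline for the 2nd Shandong Province Data Application Innovation and Entrepreneurship Competition, main track: test report recognition ☆13 · Jan 15, 2021 · Updated 5 years ago
- ✨ Natural Language Database Query System (RAG) based on LLM ✨ (with README in English) 🚩 Ask questions in natural language, using a large language model to intelligently… ☆64 · May 27, 2025 · Updated 9 months ago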
- Codebase for Instruction Following without Instruction Tuning ☆36 · Sep 24, 2024 · Updated last year
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage) ☆10 · Feb 2, 2024 · Updated 2 years ago
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha… ☆11 · Nov 26, 2024 · Updated last year
- An implementation of MSSRM method ☆11 · Mar 23, 2023 · Updated 3 years ago
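- Data processing for code LLM pre-training, fine-tuning, and DPO; an industry SOTA processing pipeline ☆52 · Jul 25, 2024 · Updated last year
- Pytorch implementation of MoLA ☆21 · Jun 9, 2025 · Updated 9 months ago
- LoRA SFT of the deepseek and qwen3 LLMs on medical-industry data ☆15 · Dec 2, 2025 · Updated 3 months ago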
- ☆28 · Oct 14, 2024 · Updated last year
- Spark in Action, 2nd edition - chapter 4 ☆18 · Apr 21, 2023 · Updated 2 years ago
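- DETR tensor: removes the auxiliary heads that are unused during inference, deploys in fp16 for a further speedup, and gives a new method for fixing all-zero outputs after conversion to tensorrt. ☆10 · Jan 9, 2024 · Updated 2 years ago
- Performance testing for LLM inference services ☆44 · Dec 17, 2023 · Updated 2 years ago
- ☆26 · May 11, 2025 · Updated 10 months ago
- The OCR component of ragflow; an unofficial project ☆54 · Aug 26, 2024 · Updated last year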
- ☆25 · Sep 3, 2025 · Updated 6 months ago
- Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization ☆13 · Jan 12, 2026 · Updated 2 months ago
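- The 「城语」 app builds on base data such as A-level scenic areas, historic sites, and protected cultural heritage units, and uses advanced large-model capabilities for intelligent Citywalk route planning, with three core functions: designing a route, generating a route guide, and generating recommendation reasons for attractions. Using a large model reduces the manual editing and recommendation workload, allows personalization to visitors' needs, and improves visitors'… ☆19 · Feb 20, 2024 · Updated 2 years ago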
- ☆20 · Mar 12, 2025 · Updated last year
- ASR on WS, POST/GET FAST_API Can use many RU asr models. ☆19 · Jan 27, 2026 · Updated last month
- ☆13 · Jan 22, 2025 · Updated last year
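- A medical question-answering system based on Qwen2 + SFT + DPO. The project uses custom SFTTrainer/DPOTrainer/TRPOTrainer classes for training, and also calls various knowledge-base tools (neo4j, milvus, LDA, etc.) to generate training data automatically. In addition, it uses vllm for inference… ☆68 · Jan 4, 2026 · Updated 2 months ago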
- ☆16 · Jan 16, 2025 · Updated last year