jiangnanboy / llm_corpus_quality
Cleaning and quality evaluation of Chinese corpora for large-model pre-training
☆75 Jul 25, 2024 Updated last year
Alternatives and similar repositories for llm_corpus_quality
Users who are interested in llm_corpus_quality are comparing it to the libraries listed below
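- Converts PaddlePaddle models such as OCR to ONNX format and experiments with loading those ONNX models for inference through DJL, the Java deep learning framework. ☆13 Nov 15, 2022 Updated 3 years ago
- text security audit: security review, semantic-model filtering, and a sensitive-content detection system ☆37 Feb 14, 2025 Updated last year
- Enterprise LLM API gateway for in-house API management, with an aggregation and distribution system. Converts multiple large models to a unified OpenAI-compatible interface, with special support for the Chinese open-source models deepseek, qwen, kimi, and glm. Suitable for individuals or enterprises that need unified LLM API management and channel distribution (key management and redistribution); updated long-term, … ☆36 Sep 12, 2025 Updated 5 months ago
- A text deduplication algorithm based on simhash ☆20 Jun 18, 2021 Updated 4 years ago
- onnx-java: loads ONNX models in Java and runs inference. ☆22 May 19, 2022 Updated 3 years ago
- Spring Deep Java Library: integrates the DJL framework with other Spring frameworks for deep learning model training and inference. ☆24 Mar 16, 2022 Updated 3 years ago
- A travel-planning assistant built with LangGraph on the deepagents architecture, integrating Qwen3, DeepSeek, GLM4.5, and other leading large models ☆52 Oct 13, 2025 Updated 4 months ago
- Intelligent text automatic processing tool. AutoText's main functions include text error correction, image OCR, layout detection, and table structure recognition. ☆27 May 17, 2023 Updated 2 years ago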
- ☆363 Jun 13, 2024 Updated last year
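- This project performs fast encoding detection and conversion on large numbers of text files to support data cleaning for the mnbvc corpus project ☆70 Oct 17, 2025 Updated 3 months ago
- Study notes on the official llava code ☆29 Oct 11, 2024 Updated last year
- Y-Agent Studio is an agent development suite for enterprise applications, with Y-Agent as its core module. It includes features that vertical domains need: agent orchestration, RAG, workflow logs, unit testing, workflow testing, and corpus production. Agent orchestration supports both multi-agent collaboration and mixed workflow orchestration within the same workflow… ☆25 Oct 4, 2025 Updated 4 months ago
- This project uses JNI to load the C++-compiled DLL libraries of paddle-ocr and uses Spring Boot for web deployment and access. ☆37 Dec 30, 2024 Updated last year
- An open-source automatic NLP annotation tool for the Chinese-speaking world: leave the easy samples to LabelFast. ☆85 Dec 7, 2025 Updated 2 months ago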
- The code for our ACL2022 findings paper: CRACSpell: A Contextual Typo Robust Approach with Copy Mechanism to Improve Chinese Spelling Cor… ☆77 May 16, 2022 Updated 3 years ago
- [COLING 2024] CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News ☆44 Jan 26, 2026 Updated 2 weeks ago
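- An intelligent agent system based on FastAPI and React, supporting multi-agent management, MCP management, knowledge bases, chat conversations, and other features. ☆21 Jan 25, 2026 Updated 3 weeks ago
- The team has been renamed QAX A-TEAM ☆10 Apr 28, 2019 Updated 6 years ago
- Re-identification of campus bicycles at surveillance-camera image quality, including ReID model recognition, vector database retrieval, and a UI display ☆10 Feb 13, 2024 Updated 2 years ago
- A speech-to-text tool based on Alibaba's large models from modelscope (the ModelScope community) ☆10 Feb 2, 2024 Updated 2 years ago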
- TABLE DETECTION IN IMAGES AND OCR TO CSV WITH JAVA ☆10 Jul 18, 2023 Updated 2 years ago
- a tiny net framework ☆10 Dec 8, 2023 Updated 2 years ago
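- A tool for standardizing ICD-10 medical diagnosis content, based on retrieval-augmented generation (RAG), with intelligent matching and standardization of Chinese medical terms. ☆17 Aug 12, 2025 Updated 6 months ago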
- ☆13 Aug 28, 2024 Updated last year
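- The OpenHIS hospital system (Xinchuang edition, i.e. built for China's domestic IT stack) combines ten core modules: catalog management, basic data configuration, personalized settings, full outpatient/inpatient workflow management, intelligent pharmacy and drug-warehouse control, fine-grained consumables management, financial accounting, medical-insurance compliance integration, and multi-dimensional report analysis, for a total of 372 standardized functions. ☆13 Feb 5, 2026 Updated last week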
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 May 5, 2024 Updated last year
- Code & Data for our Paper "NaSGEC: Multi-Domain Chinese Grammatical Error Correction for Native Speaker Texts" (ACL 2023 Findings) ☆96 Feb 18, 2025 Updated 11 months ago
- This is some summary code and model ☆40 Dec 14, 2021 Updated 4 years ago
- ☆49 Mar 21, 2022 Updated 3 years ago
- Rust tool to get info from your lycamobile.es account ☆10 Apr 29, 2021 Updated 4 years ago
- ☆14 Sep 17, 2024 Updated last year
- Course asset for the VR Developer Nanodegree > VR Scenes & Objects > Game Objects lesson ☆11 Jun 28, 2022 Updated 3 years ago
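- Please give it a star, thanks! Multiple HTML5 video tags: captures a poster (cover image) from the video source; adds monitoring of video playback state. ☆10 Feb 23, 2021 Updated 4 years ago
- A powerful AI-driven tool for automatically generating presentations (PPT); a production-ready tool with a fully controllable workflow that helps users quickly build PPTs that meet their needs. ☆26 Sep 23, 2025 Updated 4 months ago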
- char <-> Unicode character name (maintained fork of huonw/unicode_names) ☆12 Sep 7, 2025 Updated 5 months ago
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023 ☆12 Aug 24, 2025 Updated 5 months ago
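- GTS Engine: A powerful NLU Training System. GTS-Engine is an out-of-the-box, high-performance natural language understanding engine focused on few-shot tasks, able to automatically produce NLP models from only a small number of samples. ☆93 Feb 28, 2023 Updated 2 years ago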
- 🕵️♂️🔊 Automatically update Audio Deepfake Detection (ADD) papers daily using GitHub Actions (updates every 12 hours) ☆17 Updated this week
- AI application service platform ☆28 Nov 12, 2025 Updated 3 months ago