Data processing for code LLM pre-training, fine-tuning, and DPO: state-of-the-art industry pipelines
☆51Jul 25, 2024Updated last year
Alternatives and similar repositories for codellm-data-preprocess-pipeline
Users that are interested in codellm-data-preprocess-pipeline are comparing it to the libraries listed below
- An introduction to using Docker and Docker Compose.☆21Sep 4, 2024Updated last year
- Dify Streamlit Chat App☆14Aug 31, 2024Updated last year
- First-place (top-1) solution for the Tianchi algorithm competition "BetterMixture - LLM Data Mixing Challenge"☆34Jul 7, 2024Updated last year
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆32May 29, 2024Updated last year
- ☆11Updated this week
- A minimal LLM sales agent framework for sales agent fast deployment and benchmark. Support OpenAI models, Claude, HuggingFace models, Gem…☆19Sep 6, 2024Updated last year
- Batch-process data with LLMs; currently supports OCR with various LLMs, including Tongyi Qianwen, Moonshot AI, Baidu PaddleOCR, OpenAI, and LLAVA. Use LLM to generate or clean data for academic use. Support OCR with qwen, m…☆16Sep 15, 2024Updated last year
- Speech recognition built on FunASR, with a standard version and an ONNX version (recommended).☆48Oct 12, 2024Updated last year
- Qwen-WisdomVast is a large model trained on 1 million high-quality Chinese multi-turn SFT data, 200,000 English multi-turn SFT data, and …☆18Apr 12, 2024Updated last year
- The official code repo and data hub of top_nsigma sampling strategy for LLMs.☆26Feb 11, 2025Updated last year
- Part-of-speech tagging with an averaged perceptron☆22Oct 10, 2018Updated 7 years ago
- A general-purpose project of simple utilities☆22Oct 6, 2024Updated last year
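- Uses the open-source LangFlow framework to build LLM applications with zero code, such as an intelligent customer-service bot for data-plan recommendations and RAG applications, and shows two ways to integrate the resulting workflows into your own project☆31Sep 9, 2024Updated last year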
- ☆31Oct 2, 2024Updated last year
- ☆27Jul 25, 2023Updated 2 years ago
- LLM-based Multi-Agent system architecture design and hands-on project code☆35Nov 30, 2024Updated last year
- Qwen1.5-SFT (Alibaba, Ali): fine-tuning (transformers), LORA (peft), and inference for Qwen_Qwen1.5-2B-Chat/Qwen_Qwen1.5-7B-Chat☆74May 17, 2024Updated last year
- A simple WeChat Official Account layout tool based on Dify☆17Jun 27, 2025Updated 8 months ago
- A Complete Introduction to Building Generative AI Apps with Dify☆17May 25, 2025Updated 9 months ago
- ☆26Feb 28, 2026Updated last week
- ☆42Mar 6, 2025Updated last year
- ☆29Aug 30, 2024Updated last year
- Chinese-Mistral: An Efficient and Effective Chinese Large Language Model☆32Jun 22, 2025Updated 8 months ago
- ☆22Feb 11, 2026Updated 3 weeks ago
- A concise guide for new students of the BLCU (Beijing Language and Culture University) 246 Lab☆10May 30, 2022Updated 3 years ago
- Workflow automation, but you just describe what you want and it happens.☆27Nov 22, 2025Updated 3 months ago
- Write the database metadata into the dify knowledge☆12Dec 30, 2025Updated 2 months ago
- ☆11Aug 29, 2025Updated 6 months ago
- ☆36Mar 18, 2025Updated 11 months ago
- ☆28Dec 4, 2025Updated 3 months ago
- A full-stack AI-powered business intelligence tool for non-experts, featuring serverless backend processing and a secure Streamlit fronte…☆28Feb 13, 2026Updated 3 weeks ago
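- A web UI for MCP in which the client can connect to multiple SSE servers; supports LLMs such as openai, deepseek, and qwen, and includes a complete, simple weather-query example of an agent built with both stdio and SSE☆41May 23, 2025Updated 9 months ago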
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.☆39Sep 12, 2024Updated last year
- ☆89Jan 27, 2026Updated last month
- A Multi-Format Transfer Learning Model for Event Argument Extraction via Variational Information Bottleneck☆10Sep 9, 2022Updated 3 years ago
- Java implementation for the Agent2Agent Protocol (A2A - https://github.com/google/A2A), enabling interaction between AI agents through a …☆11Apr 21, 2025Updated 10 months ago
- ☆28Jun 27, 2025Updated 8 months ago
- Zhiyu AI: From Learner to Researcher☆13Jan 20, 2025Updated last year
- A small framework to benchmark forecasting models via backtesting☆13Nov 25, 2023Updated 2 years ago