Awesome Chinese LLM: A curated list of Chinese large language models, compiling datasets and model resources for Chinese LLMs
☆165 Jun 10, 2024 Updated last year
Alternatives and similar repositories for awesome-chinese-llm
Users that are interested in awesome-chinese-llm are comparing it to the libraries listed below.
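- A collection of open-source Chinese large language models, focusing on smaller models that can be deployed privately and are cheaper to train, covering base models, vertical-domain fine-tuning and applications, datasets, and tutorials. ☆22,469 May 19, 2025 Updated 10 months ago
- Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models ☆5,543 Mar 22, 2026 Updated last week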
- Awesome Chinese Corpus Datasets and Models. ☆18 Oct 28, 2019 Updated 6 years ago
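- MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche cultures and even "Martian script" (火星文) data. It includes news, essays, novels, books, magazines… ☆4,151 Mar 22, 2026 Updated last week
- macrogpt: full-parameter pretraining of a large model (1b3, 32 layers), multi-GPU DeepSpeed / single-GPU Adafactor ☆15 Nov 30, 2023 Updated 2 years ago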
- Code for the KDD 2022 paper "Interpreting Trajectories from Multiple Views: A Hierarchical Self-Attention Network for Estimating the Time… ☆18 May 29, 2022 Updated 3 years ago
- This is the implementation code for the paper "Trainable Undersampling for Class-Imbalance Learning" published in AAAI 2019 ☆15 Mar 17, 2019 Updated 7 years ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) ☆30 Jan 22, 2026 Updated 2 months ago
- Berkeley Function Calling Leaderboard (BFCL) with Chinese-Language Evaluation ☆23 Apr 6, 2025 Updated 11 months ago
- This is the repository to reproduce the experiments of the IJCAI 2020 paper "Metric Learning in Optimal Transport for Domain Adaptation" ☆23 Jun 9, 2020 Updated 5 years ago
- Large-scale Pre-training Corpus for Chinese (100G Chinese pre-training corpus) ☆1,002 Feb 6, 2026 Updated last month
- 😄😐😠 Sentiment analysis (visualized with emoji) ☆10 Sep 5, 2021 Updated 4 years ago
- This is the code repo for the paper "UTC-IE: A Unified Token-pair Classification Architecture for Information Extraction" ☆15 Aug 10, 2023 Updated 2 years ago
- ☆18 Apr 28, 2022 Updated 3 years ago
- This is the official implementation of RL-Chord (TNNLS). ☆13 Jan 2, 2024 Updated 2 years ago
- BachDuet enables a human performer to improvise a duet counterpoint with a computer agent in real time. ☆14 Aug 8, 2022 Updated 3 years ago
- PrefixKV: Adaptive Prefix KV Cache is What Vision Instruction-Following Models Need for Efficient Generation [NeurIPS 2025] ☆18 Oct 11, 2025 Updated 5 months ago
- A compilation of AGI research materials ☆24 Sep 2, 2025 Updated 6 months ago
- Code repo for MathAgent ☆20 Dec 15, 2023 Updated 2 years ago
- Sensitive-word filter list for public security website registration (公安网备) ☆14 Oct 7, 2018 Updated 7 years ago
- MATLAB code for "Missing traffic data imputation and pattern discovery with a Bayesian augmented tensor factorization model". ☆14 Nov 23, 2020 Updated 5 years ago
- Anomaly Detection for time-series using Multilevel Wavelet Decomposition Networks. ☆10 Dec 11, 2019 Updated 6 years ago
- Chinese instruction-tuning datasets ☆143 Apr 10, 2024 Updated last year
- Tokyo Metropolitan University Paraphrase Corpus (TMUP) ☆11 Jun 12, 2017 Updated 8 years ago
- Open-source Human Feedback Library ☆11 Oct 25, 2023 Updated 2 years ago
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large model) ☆8,287 Oct 16, 2024 Updated last year
- 🎹 A sheet music PDF to MIDI conversion tool ☆10 Dec 29, 2016 Updated 9 years ago
- Relation extraction based on Baidu UIE ☆20 Sep 26, 2022 Updated 3 years ago
- Firefly: a large-model training tool that supports training Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, … ☆6,652 Oct 24, 2024 Updated last year
- BERT NER, PyTorch edition, including an ERNIE implementation. ☆11 Aug 28, 2019 Updated 6 years ago
- Solve text generation tasks with the GPT2 language model, including papers, code, demos, and hands-on tutorials… ☆26 Aug 27, 2019 Updated 6 years ago
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge ☆21 Jul 25, 2022 Updated 3 years ago
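- This repository mainly records search-engine study notes relevant to NLP algorithm engineers ☆13 Apr 9, 2022 Updated 3 years ago
- An open-domain, multi-turn evaluation benchmark for general-purpose Chinese foundation models ☆80 Aug 25, 2023 Updated 2 years ago
- Large Scale Chinese Corpus for NLP ☆9,877 Feb 6, 2026 Updated last month
- A compilation of Chinese and English information extraction datasets ☆20 May 15, 2022 Updated 3 years ago
- This project aims to give beginners in the large-model field a comprehensive body of knowledge, covering both basic and advanced content, so that developers can quickly master the large-model technology stack and gain a thorough understanding of the related material. ☆65 Jan 6, 2025 Updated last year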
- A quick guide (especially) for trending instruction finetuning datasets ☆3,377 Nov 28, 2023 Updated 2 years ago
- This is the code repo for our paper "Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts". ☆43 Sep 27, 2025 Updated 6 months ago