FudanNLPLAB / CBook-150K
Chinese book corpus MD5 links
☆217 Jan 31, 2024 Updated 2 years ago
Alternatives and similar repositories for CBook-150K
Users who are interested in CBook-150K are comparing it to the libraries listed below
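- MNBVC (Massive Never-ending BT Vast Chinese corpus): a very-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even Martian script (火星文). It includes news, essays, novels, books, magazines… ☆4,118 Jan 31, 2026 Updated 2 weeks ago
- pCLUE: 1,000,000+ multi-task prompt learning dataset ☆506 Oct 4, 2022 Updated 3 years ago
- Large-scale Pre-training Corpus for Chinese, 100G ☆997 Feb 6, 2026 Updated last week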
- A Multi-Turn Dialogue Corpus based on Alpaca Instructions ☆177 Jun 1, 2023 Updated 2 years ago
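- Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets ☆3,058 Apr 14, 2024 Updated last year
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model) ☆8,283 Oct 16, 2024 Updated last year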
- Multi-language Enhanced LLaMA ☆303 Apr 13, 2023 Updated 2 years ago
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆475 Mar 7, 2024 Updated last year
- ChatGLM-6B instruction learning | instruction data | Instruct ☆655 Apr 10, 2023 Updated 2 years ago
- Analysis of the Chinese cognitive abilities of language models ☆235 Sep 9, 2023 Updated 2 years ago
- ☆173 Apr 20, 2023 Updated 2 years ago
- ☆772 Jun 13, 2024 Updated last year
- ☆313 Apr 6, 2023 Updated 2 years ago
- Instruction Tuning with GPT-4 ☆4,340 Jun 11, 2023 Updated 2 years ago
- Open Academic Research on Improving LLaMA to SOTA LLM ☆1,611 Aug 30, 2023 Updated 2 years ago
- Alpaca Chinese instruction fine-tuning dataset ☆397 Mar 26, 2023 Updated 2 years ago
- This repository open-sources our GEC system submitted by THU KELab (sz) in the CCL2023-CLTC Track 1: Multidimensional Chinese Learner Tex… ☆15 Nov 25, 2023 Updated 2 years ago
- ⚡LLM Zoo is a project that provides data, models, and evaluation benchmark for large language models.⚡ ☆2,948 Nov 26, 2023 Updated 2 years ago
- [NIPS2023] RRHF & Wombat ☆808 Sep 22, 2023 Updated 2 years ago
- We unified the interfaces of instruction-tuning data (e.g., CoT data), multiple LLMs and parameter-efficient methods (e.g., lora, p-tunin… ☆2,799 Dec 12, 2023 Updated 2 years ago
- Datasets for Instruction Tuning of Large Language Models ☆261 Nov 30, 2023 Updated 2 years ago
- Consider is a parser for the ThinkGear protocol used by NeuroSky devices (MindSet, BrainBand and others). ☆16 Apr 3, 2012 Updated 13 years ago
- GAU-alpha-pytorch ☆20 May 11, 2022 Updated 3 years ago
- An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices. ☆10 Dec 3, 2024 Updated last year
- TigerBot: A multi-language multi-task LLM ☆2,262 Dec 28, 2024 Updated last year
- A manually refined Chinese dialogue dataset and a piece of ChatGLM fine-tuning code ☆1,195 May 3, 2025 Updated 9 months ago
- The RedPajama-Data repository contains code for preparing large datasets for training large language models. ☆4,924 Dec 7, 2024 Updated last year
- Instruction fine-tuning of the BLOOM model ☆24 Jun 15, 2023 Updated 2 years ago
- ☆54 Apr 15, 2022 Updated 3 years ago
- WanJuan (万卷) 1.0 multimodal corpus ☆570 Oct 20, 2023 Updated 2 years ago
- Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023] ☆1,812 Jul 27, 2025 Updated 6 months ago
- Android TextMate Bundle ☆17 Mar 20, 2009 Updated 16 years ago
- Citation Manager for OJS ☆13 Jun 4, 2024 Updated last year
- A simple node.js wrapper for Stanford CoreNLP. ☆10 Aug 7, 2014 Updated 11 years ago
- CROMER (CROss-document Main Events and entities Recognition), is a tool for cross-document coreference ☆12 Jan 14, 2015 Updated 11 years ago
- Useful collection of webrat Textmate snippets meant for use with the RSpec Story and/or Cucumber bundles ☆79 Aug 7, 2009 Updated 16 years ago
- ☆11 Dec 10, 2022 Updated 3 years ago
- Hello world demonstration for Weblate ☆14 Jan 20, 2026 Updated 3 weeks ago
- A proselint linter for use with Phabricator's arc command line tool. ☆17 Jun 17, 2016 Updated 9 years ago