secsilm / chinese-tokens-in-tiktoken
Chinese tokens in tiktoken tokenizers.
☆28 · Updated 4 months ago
Related projects:
- Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️ ☆30 · Updated 4 months ago
- ☆88 · Updated 2 months ago
- HF🤗 daily briefing bot ☆33 · Updated this week
- We are the first fully commercially usable role-play large language model. ☆31 · Updated last month
- ☆57 · Updated 3 weeks ago
- Token level visualization tools for large language models ☆46 · Updated last month
- Project that placed 2nd worldwide in the AI track of the 2024 Alibaba Global Mathematics Competition (特工宇宙) ☆45 · Updated 3 months ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆70 · Updated last year
- As the name says: a hand-built RAG ☆108 · Updated 6 months ago
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆92 · Updated 3 weeks ago
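- A power tool for prompt engineers: compare the results of multiple prompts across multiple LLMs at the same time ☆95 · Updated last year
- Train a small-parameter multimodal vision-language model (VLM) from scratch; expected to be open-sourced within 2024 ☆23 · Updated this week
- [In progress, entry by entry] A curated QA dataset of Ruozhiba (弱智吧) questions, with every entry manually reviewed and revised ☆80 · Updated 2 months ago
- 😜 A meme (sticker) image dataset, annotated using the image-understanding capabilities of glm-4v and step-1v. ☆93 · Updated 4 months ago
- MoonPalace (月宫) is an API debugging tool provided by Moonshot AI (月之暗面). ☆54 · Updated last week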
- ☆32 · Updated 6 months ago
- ☆25 · Updated 4 months ago
- A lightweight script for converting HTML pages to Markdown, with support for code blocks ☆70 · Updated 5 months ago
- Benchmark for LLM Reasoning & Understanding with Challenging Tasks from Real Users. ☆103 · Updated this week
- ☆89 · Updated 3 months ago
- Search, organize, discover anything! ☆44 · Updated 5 months ago
- ☆56 · Updated 8 months ago
- The official GitHub page for the survey paper "A Survey on Data Augmentation in Large Model Era" ☆101 · Updated 2 months ago
- ☆75 · Updated 9 months ago
- Baidu QA dataset with 1 million entries ☆48 · Updated 9 months ago
- A Toolkit for Running On-device Large Language Models (LLMs) in Apps ☆53 · Updated 2 months ago
- Evaluation for AI apps and agents ☆35 · Updated 8 months ago
- backend for fastnlp MOSS project ☆58 · Updated 2 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆121 · Updated 3 months ago
- rwkv finetuning ☆35 · Updated 4 months ago