A source-code walkthrough of the lightweight large language model MiniMind, covering the complete pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more
☆1,061 · Jun 16, 2025 · Updated 11 months ago
Alternatives and similar repositories for MiniMind-in-Depth
Users who are interested in MiniMind-in-Depth are comparing it to the libraries listed below.
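- 🎓 An in-depth walkthrough of the MiniMind project for training a large model from scratch, including but not limited to the architecture and algorithms it uses, plus LLM interview experience ☆906 · May 25, 2026 · Updated 2 weeks ago
- 🚀 [Building an LLM from Scratch] A minimalist guide to the principles and practice of LLM training. Includes core code and controlled experiments for Transformer, pretraining, and SFT. | A minimal, principle-first guide to understanding and building… ☆123 · Jun 4, 2026 · Updated last week
- 🧠 "Large Model": Train a 64M-parameter LLM from scratch in just 2h! ☆51,407 · Jun 1, 2026 · Updated last week
- An intelligent medical literature analysis tool built on the Model Context Protocol (MCP). It aims to help researchers, medical practitioners, and students quickly search the PubMed database and use large language models (LLMs) to analyze and summarize literature abstracts ☆10 · May 18, 2025 · Updated last year
- 👀 "Large Model": Train a 65M-parameter vision-language model (VLM) from scratch in just 2h! ☆8,126 · May 19, 2026 · Updated 3 weeks ago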
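- 🚀 Lightweight video 🎥 large model 🤖 ☆23 · Apr 27, 2025 · Updated last year
- The official implementation of the paper "MLP Memory: A Retriever-Pretrained Memory for Large Language Models". (ICLR 2026) ☆66 · Updated this week
- ☆11 · Sep 16, 2025 · Updated 8 months ago
- This project applies LoRA fine-tuning to Deepseek-R1-Distill-Qwen-7B on psychological counseling CoT data to further improve its slow-thinking ability in the counseling domain. ☆12 · Mar 11, 2025 · Updated last year
- Code from team O_o, ranked 14th in the TAAC2025 preliminary round ☆134 · Oct 27, 2025 · Updated 7 months ago
- I love reinforcement learning. ☆12 · Jan 15, 2025 · Updated last year
- Notes on knowledge and interview questions for large language model (LLM) algorithm (application) engineers ☆14,457 · Apr 30, 2025 · Updated last year
- This project aims to build a reusable decision-support agent system for multiple scenarios. By extracting key information from user input and matching it against historical data, the system can provide personalized decision recommendations in domains such as education pathways, legal consulting, financial investment, mental health, business operations, supply chain optimization, crisis response, and intelligent customer service. The system uses a unified decision workflow design and offers high… ☆27 · Mar 7, 2026 · Updated 3 months ago
- Collection of labs for Beihang University's Parallel Programming course (Rank 1 in the speed competition) ☆31 · Feb 23, 2023 · Updated 3 years ago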
- BUPT Joint Programme with QMUL ☆20 · Dec 21, 2023 · Updated 2 years ago
- "Open-Source LLM User Guide": tutorials tailored for Chinese beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large models (MLLMs) in a Linux environment ☆30,837 · Jun 3, 2026 · Updated last week
- SAR Backprojection Implementations for the GOTCHA data set ☆15 · Jan 28, 2024 · Updated 2 years ago
- ☆750 · May 27, 2026 · Updated 2 weeks ago
- The simplest Local Knowledge Base example based on Langchain and Chat-GLM ☆13 · Jun 9, 2023 · Updated 3 years ago
- Open-source license assistant ☆16 · Jun 20, 2025 · Updated 11 months ago
- This project shares the technical principles behind large models along with hands-on experience (LLM engineering and real-world LLM applications) ☆24,452 · May 25, 2026 · Updated 2 weeks ago
- LaTeXDataHub is an open-source platform dedicated to the sharing and contribution of real-world LaTeX image datasets and their annotation… ☆12 · Aug 13, 2024 · Updated last year
- MiniGPT-Pancreas: Multimodal Large language Model for Pancreas Cancer Classification and Detection ☆13 · Sep 19, 2025 · Updated 8 months ago
- ☆12 · Sep 29, 2019 · Updated 6 years ago
- [ACM MM 2025] Mobile U-ViT: Revisiting large kernel and U-shaped ViT for efficient medical image segmentation ☆60 · Oct 29, 2025 · Updated 7 months ago
- Edge-oriented Point cloud Transformer for 3D Intracranial Aneurysm Segmentation. MICCAI22 ☆13 · Aug 18, 2022 · Updated 3 years ago
- Reading and discussion group for "Pattern Recognition and Machine Learning" ☆35 · May 20, 2019 · Updated 7 years ago
- Official repo for "StreamingVLA: Streaming Vision-Language-Action Model with Action Flow Matching and Adaptive Early Observation" ☆26 · Apr 22, 2026 · Updated last month
- Highly interactive graph data visualization ☆15 · Oct 13, 2021 · Updated 4 years ago
- The supplementary material for the paper "Fine-tuning Large Language Models to Improve Accuracy and Comprehensibility of Automated Code R… ☆16 · Aug 12, 2024 · Updated last year
- "A White-Box Guide to Building Large Models": a fully hand-built Tiny-Universe ☆4,902 · Feb 12, 2026 · Updated 4 months ago
- RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency ☆19 · Oct 2, 2024 · Updated last year
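- The Zaychik Power Controller server ☆13 · Apr 13, 2024 · Updated 2 years ago
- A beginner-friendly generative recommender system based on Qwen2. It uses a large model to understand user preferences and generate candidate items, combines TF-IDF keyword recall with popular-item recall into a multi-source strategy, and balances relevance with diversity. A LightGBM ranking model handles scoring, with built-in metrics such as recall and NDCG to quantify results. Wraps the API with Flask… ☆54 · Dec 6, 2025 · Updated 6 months ago
- Code for the paper: Rehearsal-free Continual Language Learning via Efficient Parameter Isolation ☆13 · May 16, 2023 · Updated 3 years ago
- A 31-instruction multi-cycle MIPS CPU, made to bluff through a computer organization course project. ☆11 · Jul 2, 2021 · Updated 4 years ago
- Builds an agent on the InternLm chat 7B base model that can call MMYOLO tools to perform vision tasks on images ☆11 · Oct 30, 2024 · Updated last year
- ☆22 · Dec 12, 2025 · Updated 6 months ago
- [KDD 2026] Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe ☆33 · Aug 10, 2025 · Updated 10 months ago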