inFaaa/Multimodal-Roadmap-for-freshman

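This project is a learning roadmap for newcomers to the multimodal field, covering the field's classic papers, projects, and courses. It aims to help learners gain a reasonably deep understanding of the field within a limited time and become able to conduct independent research.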

☆51

Alternatives and similar repositories for Multimodal-Roadmap-for-freshman

Users that are interested in Multimodal-Roadmap-for-freshman are comparing it to the libraries listed below.

AI-in-Health / M3FM
[npj Digital Medicine] A multimodal multidomain multilingual medical foundation model for zero shot clinical diagnosis
☆19 · Feb 6, 2025 · Updated last year
PKU-YuanGroup / GPT-as-Language-Tree
GPT as a Monte Carlo Language Tree: A Probabilistic Perspective
☆46 · Jan 18, 2025 · Updated last year
IDEA-XL / RAPM
Code for paper "Rethinking Text-based Protein Understanding: Retrieval or LLM?"
☆20 · Oct 7, 2025 · Updated 9 months ago
AI-in-Health / BioMedArena
BioMedArena: a state-of-the-art biomedical harness for evaluating AI agents at scale - 100+ benchmarks, 70+ tools
☆22 · Jun 25, 2026 · Updated last month
AI-in-Health / Patient-Instructions
[NeurIPS 2022] Code for "Retrieve, Reason, and Refine: Generating Accurate and Faithful Discharge/Patient Instructions"
☆36 · Jul 28, 2024 · Updated last year
VIStA-H / GPT-4V_Social_Media
GPT-4V(ision) as A Social Media Analysis Engine
☆39 · Dec 20, 2024 · Updated last year
IDEA-XL / ChemCoTBench
LLM Reasoning Benchmark & Chain-of-Thoughts Dataset for Chemistry
☆55 · Oct 9, 2025 · Updated 9 months ago
KlingAIResearch / Uniaa
Unified Multi-modal IAA Baseline and Benchmark
☆94 · Sep 27, 2024 · Updated last year
Lyu6PosHao / HME
Official code for the Nature Communications paper "Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding".
☆23 · May 23, 2026 · Updated 2 months ago
JamesSand / UsefulCommands
Lifelong Learning Note
☆16 · Jun 2, 2026 · Updated last month
AI-in-Health / PromptLLM
Code for PromptNet
☆16 · Jan 29, 2025 · Updated last year
PKU-YuanGroup / ChronoMagic-Bench
[NeurIPS 2024 D&B Spotlight🔥] ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation
☆213 · Apr 14, 2026 · Updated 3 months ago
PKU-YuanGroup / AsFT
Code for the paper "AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin".
☆37 · Jul 10, 2025 · Updated last year
PKU-YuanGroup / N-LoRA
【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".
☆38 · Dec 5, 2024 · Updated last year
PKU-YuanGroup / PiCO
[ICLR'25] PiCO: Peer Review in LLMs based on the Consistency Optimization, https://arxiv.org/pdf/2402.01830
☆36 · Feb 16, 2025 · Updated last year
PKU-YuanGroup / TaxDiff
The official code for "TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation"
☆75 · Aug 23, 2024 · Updated last year
yangbang18 / ZeroNLG
(TPAMI'2024) ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
☆22 · Aug 8, 2024 · Updated last year
PKU-YuanGroup / EvaGaussians
☆60 · Mar 16, 2025 · Updated last year
PKU-YuanGroup / Video-Bench
A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models!
☆140 · Dec 31, 2023 · Updated 2 years ago
zhangquanchen / VisRL
[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
☆47 · Nov 8, 2025 · Updated 8 months ago
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆103 · Nov 9, 2024 · Updated last year
lvkd84 / GraphFP
Implementation of Fragment-based Pretraining and Finetuning on Molecular Graphs (NeurIPS 2023)
☆24 · Jun 10, 2024 · Updated 2 years ago
PKU-YuanGroup / TIDE
Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
☆69 · Apr 30, 2026 · Updated 2 months ago
julia-cherry / Teaser_official
☆21 · Mar 4, 2025 · Updated last year
PKU-YuanGroup / Envision3D
Envision3D: One Image to 3D with Anchor Views Interpolation
☆116 · May 16, 2024 · Updated 2 years ago
PKU-YuanGroup / UAE
Official repository for the UAE paper, unified-GRPO, and unified-Bench
☆165 · Sep 12, 2025 · Updated 10 months ago
PKU-YuanGroup / WISE
[ICML 2026🔥] WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
☆212 · Jun 26, 2026 · Updated last month
zjuruizhechen / Awesome-Video-Agent
A collection of awesome "thinking with videos" papers.
☆100 · Dec 1, 2025 · Updated 7 months ago
inFaaa / Awesome-Personalized-Video-Creation
📖 This is a repository for organizing papers, codes, and other resources related to personalized video generation and editing.
☆64 · Dec 9, 2025 · Updated 7 months ago
NEUIR / Uncode
[ACL '26] Source code for paper "Empirical Analysis of Decoding Biases in Masked Diffusion Models"
☆44 · Jun 26, 2026 · Updated last month
PKU-YuanGroup / WF-VAE
[CVPR 2025🔥] Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
☆205 · May 11, 2025 · Updated last year
CeeZh / SILVR
Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"
☆19 · Jan 18, 2026 · Updated 6 months ago
smallcjy / SCUT-Net-Auto-Login
Automatic login to the dormitory campus network for SCUT students
☆10 · Nov 4, 2024 · Updated last year
StopInvolution / ChineseCheckers
2022 Renmin University of China Programming II honors course final project: Chinese checkers
☆11 · Jun 30, 2022 · Updated 4 years ago
google-deepmind / geckonum_benchmark_t2i
GeckoNum Benchmark for T2I Model Eval.
☆15 · Dec 5, 2024 · Updated last year
google-research-datasets / DaTaSeg-Objects365-Instance-Segmentation
We release the DaTaSeg Objects365 Instance Segmentation Dataset introduced in the DaTaSeg paper, which can be used as an evaluation bench…
☆22 · Dec 9, 2023 · Updated 2 years ago
PKU-YuanGroup / Next-Patch-Prediction
[AAAI26] Next Patch Prediction
☆129 · Jan 2, 2025 · Updated last year
CASIA-IVA-Lab / VideoNIAH
VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs
☆57 · Mar 9, 2025 · Updated last year
rain305f / TIDA
[NeurIPS 2023] Discover and Align Taxonomic Context Priors for Open-world Semi-Supervised Learning
☆17 · Apr 15, 2024 · Updated 2 years ago