yiyangzhang-hz / PPT
☆25Updated 3 weeks ago
Alternatives and similar repositories for PPT
Users that are interested in PPT are comparing it to the libraries listed below
- [CVPR] MergeVQ: A Unified Framework for Visual Generation and Representation with Token Merging and Quantization☆42Updated 2 months ago
- Official implementation of the paper: RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction☆17Updated 3 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆288Updated last week
- ✈️ [ICCV 2025] Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints☆75Updated 2 months ago
- [NeurIPS 2024] Official Code for the Paper "Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning"☆24Updated 5 months ago
- A Collection of Papers on Diffusion Language Models☆126Updated last week
- Doodling our way to AGI ✏️ 🖼️ 🧠☆103Updated 3 months ago
- [CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow☆139Updated 5 months ago
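- This project provides a learning roadmap for newcomers to the multimodal field, covering classic papers, projects, and courses. It aims to help learners gain a reasonably deep understanding of the field within a set period of time and become able to conduct independent research.☆25Updated last year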
- Official implementation of UnifiedReward & [NeurIPS 2025] UnifiedReward-Think☆546Updated this week
- [ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding☆88Updated 5 months ago
- [NeurIPS 2025 DB] OneIG-Bench is a meticulously designed comprehensive benchmark framework for fine-grained evaluation of T2I models acro…☆69Updated this week
- Provides .bst files for the NeurIPS LaTeX template☆48Updated 5 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie…☆206Updated last week
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆215Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆148Updated last month
- [NeurIPS 2024] The official implementation of the research paper "FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Atten…☆56Updated 2 months ago
- This repository provides a comprehensive library for parallel training and LoRA algorithm implementations, supporting multiple parallel s…☆48Updated last week
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆36Updated 8 months ago
- [NAACL 2024] LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-text Generation?☆42Updated last year
- [ICCV 2025] FonTS: Text Rendering with Typography and Style Controls☆27Updated 3 weeks ago
- A unified framework for controllable caption generation across images, videos, and audio. Supports multi-modal inputs and customizable ca…☆50Updated 2 months ago
- ☆55Updated 4 months ago
- ☆44Updated 3 months ago
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)☆47Updated 5 months ago
- Official Implementation of LaViDa: A Large Diffusion Language Model for Multimodal Understanding☆147Updated 2 months ago
- Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference☆12Updated 3 months ago
- An in-context learning research testbed☆20Updated 6 months ago
- Official implementation of Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning☆163Updated this week
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning☆209Updated 5 months ago