ArtmeScienceLab / FonTS
[ICCV 2025] FonTS: Text Rendering with Typography and Style Controls
☆29Updated last month
Alternatives and similar repositories for FonTS
Users who are interested in FonTS are comparing it to the repositories listed below
- Official implementation of MC-LLaVA.☆140Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆57Updated 3 months ago
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating…☆121Updated last week
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆307Updated 2 weeks ago
- A Python script for downloading Hugging Face datasets and models.☆20Updated 6 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆168Updated 2 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI☆182Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation☆218Updated last month
- Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"☆46Updated 3 months ago
- An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"☆87Updated 2 weeks ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning".☆57Updated 3 months ago
- Official code for DeepSound-V1☆12Updated 4 months ago
- 🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training☆52Updated this week
- ☆22Updated last month
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 5 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆81Updated last month
- Survey: https://arxiv.org/pdf/2507.20198☆157Updated last month
- Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)☆51Updated 6 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆103Updated 4 months ago
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision☆25Updated 4 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List☆155Updated 7 months ago
- ☆29Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆71Updated 2 months ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding☆31Updated 6 months ago
- [CVPR 2025] 🔥 Official impl. of "TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation".☆386Updated 2 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆152Updated 2 weeks ago
- ☆52Updated last month
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning☆75Updated 3 weeks ago
- [NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning☆25Updated 4 months ago
- Doodling our way to AGI ✏️ 🖼️ 🧠☆105Updated 4 months ago