ArtmeScienceLab / FonTS
[ICCV 2025] FonTS: Text Rendering with Typography and Style Controls
☆33 Updated 2 weeks ago
Alternatives and similar repositories for FonTS
Users who are interested in FonTS are comparing it to the repositories listed below.
- Official implementation of MC-LLaVA. ☆139 Updated last week
- A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating… ☆123 Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆57 Updated 4 months ago
- We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that S… ☆182 Updated last week
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". ☆69 Updated 4 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆328 Updated last month
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 Updated 6 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆231 Updated 3 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆191 Updated 3 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆71 Updated this week
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆93 Updated 2 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆155 Updated 8 months ago
- Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potential in Unifie… ☆310 Updated last month
- A python script for downloading huggingface datasets and models. ☆20 Updated 7 months ago
- ☆30 Updated 11 months ago
- Interleaving Reasoning: Next-Generation Reasoning Systems for AGI ☆204 Updated last month
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation ☆161 Updated 2 weeks ago
- ☆116 Updated last week
- An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought" ☆100 Updated last month
- ☆56 Updated 3 months ago
- UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation ☆111 Updated this week
- The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning" ☆96 Updated this week
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆33 Updated 7 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆71 Updated 2 months ago
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision ☆25 Updated 5 months ago
- Official code for DeepSound-V1 ☆13 Updated 6 months ago
- Official codebase for the paper Latent Visual Reasoning ☆37 Updated last month
- ☆278 Updated last month
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give… ☆181 Updated last month
- Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆79 Updated 3 months ago