Official repository of "Zero-Shot Character Identification and Speaker Prediction in Comics via Iterative Multimodal Fusion" (ACMMM 2024)
☆16Oct 31, 2024Updated last year
Alternatives and similar repositories for zeroshot-speaker-prediction
Users that are interested in zeroshot-speaker-prediction are comparing it to the libraries listed below.
- Official repository of Manga109Dialog (ICME 2024)☆29Aug 3, 2024Updated last year
- The code will come soon.☆16Sep 12, 2025Updated 8 months ago
- Multilingual Entity Linking model by BELA model☆12Jul 20, 2023Updated 2 years ago
- LVAS-Agent Code Base☆21Apr 15, 2025Updated last year
- ☆18Apr 7, 2025Updated last year
- Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios☆11Mar 21, 2024Updated 2 years ago
- Accepted to ICLR 2025. MetaMetrics is a calibrated meta-metric designed to evaluate generation tasks across different modalities aligned …☆15Dec 30, 2024Updated last year
- MangaLMM – Try the official demo below☆43Nov 9, 2025Updated 7 months ago
- Code and data for the paper: DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator☆13Apr 28, 2024Updated 2 years ago
- This repository implements computer vision for real-time chessboard detection and piece recognition. Using OpenCV and Numpy, the system p…☆15Sep 24, 2024Updated last year
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing"☆20Apr 9, 2025Updated last year
- ☆15Sep 30, 2023Updated 2 years ago
- Collection of BUPT (Beijing University of Posts and Telecommunications) course projects and major assignments☆12Mar 25, 2024Updated 2 years ago
- [EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering☆18Oct 31, 2024Updated last year
- Hands-On Tutorial on Building Multimodal RAG Systems☆14Apr 10, 2025Updated last year
- chainer v2 implementation of instance normalization☆11Aug 8, 2018Updated 7 years ago
- ☆16Feb 5, 2024Updated 2 years ago
- ☆11Sep 18, 2017Updated 8 years ago
- mindmap: Spatial Memory in Deep Feature Maps for 3D Action Policies☆52Oct 16, 2025Updated 7 months ago
- ☆19Sep 11, 2024Updated last year
- [CVPR 2025] DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding☆30Dec 18, 2025Updated 5 months ago
- Official implementation of "Real-SRGD: Enhancing Real-World Image Super-Resolution with Classifier-Free Guided Diffusion" [ACCV2024]☆19Dec 9, 2024Updated last year
- [ACM MM 2024] Pytorch Code for the paper "Robust Variational Contrastive Learning for Partially View-unaligned Clustering"☆16Feb 7, 2026Updated 4 months ago
- Official implementation for the paper "Self-Play Reinforcement Learning for Fast Image Retargeting"☆10Oct 5, 2020Updated 5 years ago
- ☆12Feb 17, 2017Updated 9 years ago
- Support page for a technical book☆10Aug 21, 2020Updated 5 years ago
- My school projects.☆11Jan 10, 2024Updated 2 years ago
- Source code of the TextLap model, a LLM for text-2-layout generation.☆18Oct 21, 2024Updated last year
- ☆28Aug 22, 2025Updated 9 months ago
- [ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing☆26Apr 1, 2025Updated last year
- Segmentation of text in manga images☆140Feb 6, 2021Updated 5 years ago
- Dual Fusion-Propagation Graph Neural Network for Multi-View Clustering☆17Sep 15, 2021Updated 4 years ago
- Repository for the KVP10k dataset☆23Sep 18, 2025Updated 8 months ago
- Code for the ECCV 2020 paper: "Look here! A learning based approach to redirect visual attention"☆13Aug 19, 2020Updated 5 years ago
- thesis slides repository for cvpaper.challenge☆11Apr 25, 2019Updated 7 years ago
- Official implementation of URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding (AAAI 2026…☆43Feb 4, 2026Updated 4 months ago
- [Multimedia Systems] SiamHCC: a novel siamese network for quality evaluation of handwritten Chinese characters☆50Jul 8, 2025Updated 11 months ago
- ☆17Mar 24, 2025Updated last year
- ☆24Oct 13, 2024Updated last year