JJJYmmm / Multimodal-RoPEs
Official implementation of the paper "Revisiting Multimodal Positional Encoding in Vision–Language Models"
☆63Dec 9, 2025Updated 2 months ago
Alternatives and similar repositories for Multimodal-RoPEs
Users who are interested in Multimodal-RoPEs are comparing it to the repositories listed below
- Block-Recurrent Dynamics in ViTs 🦖☆30Dec 24, 2025Updated last month
- [CVPR 2022 Oral] Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations☆13Jul 14, 2022Updated 3 years ago
- ☆28Mar 4, 2025Updated 11 months ago
- Code of our paper "A Unified Agentic Framework for Evaluating Conditional Image Generation".☆30Jul 22, 2025Updated 6 months ago
- Official implementation of 'Out-of-domain GAN inversion via Invertibility Decomposition for Photo-Realistic Human Face Manipulation'☆23Feb 29, 2024Updated last year
- [ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation☆33Aug 18, 2025Updated 5 months ago
- Official Repo for Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation☆30Mar 29, 2024Updated last year
- MMHead: Towards Fine-grained Multi-modal 3D Facial Animation (ACM MM 2024)☆34Feb 1, 2026Updated 2 weeks ago
- [NIPS 25'] Evaluation code of paper "KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models"☆40Oct 19, 2025Updated 3 months ago
- ☆51Aug 22, 2025Updated 5 months ago
- ☆21Dec 14, 2025Updated 2 months ago
- A SapientML plugin of SapientMLGenerator☆11Dec 23, 2025Updated last month
- ☆34Jun 18, 2024Updated last year
- Workable training script for ControlNet tile☆35May 2, 2024Updated last year
- EdgeCortix maintained and extended fork of Apache TVM compiler stack utilized by MERA framework. TVM is an open deep learning compiler st…☆11Dec 22, 2023Updated 2 years ago
- Codebase for the paper-Elucidating the design space of language models for image generation☆46Nov 17, 2024Updated last year
- 📷 Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!☆116Feb 5, 2026Updated last week
- Group-Group Loss Based Global-Regional Feature Learning for Vehicle Re-Identification☆12May 10, 2022Updated 3 years ago
- SR-DiT Speedrunning ImageNet Diffusion☆123Dec 31, 2025Updated last month
- AI-ML-NLP Task Group☆13Aug 10, 2023Updated 2 years ago
- ☆10Sep 24, 2024Updated last year
- ☆43Dec 1, 2025Updated 2 months ago
- [ISBI 2024] Official PyTorch implementation of Towards Cross-Domain Single Blood Cell Image Classification via Large-Scale LoRA-based Seg…☆11Aug 12, 2024Updated last year
- Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO☆92Dec 1, 2025Updated 2 months ago
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV …☆23Dec 4, 2025Updated 2 months ago
- ☆41Nov 12, 2025Updated 3 months ago
- ☆11Aug 31, 2024Updated last year
- ☆14Dec 2, 2025Updated 2 months ago
- ☆39Jan 16, 2026Updated 3 weeks ago
- gradio bbox labeling tools☆11May 12, 2023Updated 2 years ago
- Mender over-the-air software updater client for microcontrollers (MCUs).☆17Updated this week
- Awesome latest models, datasets and benchmarks on streaming/online video understanding.☆24Oct 19, 2025Updated 3 months ago
- Support for zScale on Spartan6 FPGAs☆15Aug 3, 2015Updated 10 years ago
- [NeurIPS 2024] Data exporter for SS3DM: Benchmarking Street-View Surface Reconstruction with a Synthetic 3D Mesh Dataset☆16Nov 8, 2024Updated last year
- UFT: Unifying Supervised and Reinforcement Fine-Tuning☆24Jun 30, 2025Updated 7 months ago
- ☆34Oct 29, 2025Updated 3 months ago
- ☆11May 9, 2023Updated 2 years ago
- ☆13Sep 2, 2023Updated 2 years ago
- This repository provides the code for the methods and experiments presented in our paper 'Dual-stream Class-adaptive Network for Semi-sup…☆11Feb 29, 2024Updated last year