[ICLR 2026] Empowering Small VLMs to Think with Dynamic Memorization and Exploration
☆18 · Mar 18, 2026 · Updated 2 months ago
Alternatives and similar repositories for DyME
Users that are interested in DyME are comparing it to the libraries listed below.
- [CVPR 2026] STAMP: Better, Stronger, Faster: Tackling the Trilemma in MLLM-based Segmentation with Simultaneous Textual Mask Prediction ☆39 · Feb 21, 2026 · Updated 3 months ago
- [CVPR 2025] PyTorch implementation of Diff-II ☆27 · Feb 27, 2025 · Updated last year
- ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation ☆29 · May 27, 2025 · Updated last year
- [ICML 2026] Elastic Diffusion Transformer: Accelerating SOTA generation models (e.g., Qwen-Image, Hunyuan3d ) through adaptive computatio… ☆44 · May 1, 2026 · Updated last month
- Multi-modal categorization of Age-related Macular Degeneration (4 classes: normal, dry AMD, pcv, wet AMD) ☆31 · Apr 8, 2026 · Updated 2 months ago
- ☆56 · Mar 13, 2026 · Updated 3 months ago
- A block pruning framework for LLMs. ☆28 · May 17, 2025 · Updated last year
- Self-collected data for Masked Face recognition paper (300+ different participants) ☆12 · Jul 13, 2023 · Updated 2 years ago
- OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning ☆29 · May 24, 2025 · Updated last year
- Rui Qian, Xin Yin, Chuanhang Deng, et al.: UGround: Towards Unified Visual Grounding with Unrolled Transformers (ICML 2026) ☆26 · Jun 5, 2026 · Updated last week
- Standardized Multi-Channel Dataset for Glaucoma (SMDG-19) is a collection and standardization of 19 public full-fundus glaucoma images an… ☆21 · Apr 23, 2023 · Updated 3 years ago
- DQA: a comprehensive database Q&A benchmark ☆32 · Jan 2, 2025 · Updated last year
- My implementation of InstantBooth ☆13 · Sep 11, 2023 · Updated 2 years ago
- Streaming Video Diffusion: Online Video Editing with Diffusion Models ☆17 · Jun 3, 2024 · Updated 2 years ago
- LayoutDiT: Exploring Content-Graphic Balance in Layout Generation with Diffusion Transformer ☆49 · Jan 6, 2026 · Updated 5 months ago
- Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021 ☆30 · Mar 30, 2021 · Updated 5 years ago
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation ☆36 · Feb 28, 2026 · Updated 3 months ago
- [CVPR25 Highlight] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced eval… ☆32 · Apr 16, 2025 · Updated last year
- ☆35 · Feb 10, 2023 · Updated 3 years ago
- [ICCV 2023] Subclass-balancing contrastive learning for long-tailed recognition ☆18 · Oct 30, 2023 · Updated 2 years ago
- Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report] ☆192 · Mar 30, 2026 · Updated 2 months ago
- ACL24 ☆11 · Jun 7, 2024 · Updated 2 years ago
- [ICCV 2023] GeoFormer for Homography Estimation ☆35 · Dec 25, 2023 · Updated 2 years ago
- ☆31 · Jan 18, 2026 · Updated 4 months ago
- [IJCAI 2023] CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization. ☆10 · Nov 3, 2023 · Updated 2 years ago
- ☆16 · Apr 11, 2026 · Updated 2 months ago
- Deeplot: chat-to-plot ☆21 · Mar 30, 2025 · Updated last year
- Visual Instruction Tuning for Qwen2 Base Model ☆44 · Jun 29, 2024 · Updated last year
- Official PyTorch implementation for "Where You Edit is What You Get: Text-Guided Image Editing with Region-Based Attention" (Pattern Reco… ☆10 · Oct 1, 2024 · Updated last year
- ☆12 · Dec 6, 2024 · Updated last year
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆21 · Jul 10, 2025 · Updated 11 months ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding. ☆11 · Jul 16, 2024 · Updated last year
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr… ☆37 · Jul 3, 2025 · Updated 11 months ago
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision ☆12 · Sep 17, 2023 · Updated 2 years ago
- The official implementation for SETA (TIP 2024). ☆11 · Feb 17, 2025 · Updated last year
- Is the medical segmentation problem solved-Survey ☆25 · Aug 29, 2025 · Updated 9 months ago
- ☆15 · Mar 30, 2025 · Updated last year
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆22 · Jun 23, 2025 · Updated 11 months ago
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos" ☆13 · Aug 22, 2025 · Updated 9 months ago