ZwZ model family: SOTA fine-grained perception performance; ZoomBench: a new challenging perception benchmark
☆124 · Mar 9, 2026 · Updated last month
Alternatives and similar repositories for Zooming-without-Zooming
Users interested in Zooming-without-Zooming are comparing it to the repositories listed below.
- PyTorch implementation of POEM (Out-of-distribution detection with posterior sampling), ICML 2022 · ☆28 · May 6, 2023 · Updated 2 years ago
- Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl… · ☆12 · Apr 4, 2025 · Updated last year
- [ACL 2026 Findings] "Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning" · ☆62 · Jan 28, 2026 · Updated 3 months ago
- Official PyTorch implementation of NeuralWalker (ICLR 2025) · ☆39 · Jun 25, 2025 · Updated 10 months ago
- ☆26 · Feb 13, 2026 · Updated 2 months ago
- ☆23 · Aug 20, 2024 · Updated last year
- MLX implementation of Recursive Reasoning with Tiny Networks · ☆78 · Oct 11, 2025 · Updated 6 months ago
- [ACL'25 Oral] Code for the paper "UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban… · ☆28 · Jul 15, 2025 · Updated 9 months ago
- ☆21 · Nov 27, 2025 · Updated 5 months ago
- Image classification tutorial: ConvNext: 98.8% on CIFAR10 and 92.4% on CIFAR100; ResNet18: 95.6% on CIFAR10 and 79.1% on CIFAR100 · ☆15 · Jun 2, 2025 · Updated 10 months ago
- Simple and Ideal Circuit Simulation · ☆13 · Dec 4, 2017 · Updated 8 years ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement · ☆147 · Apr 15, 2026 · Updated 2 weeks ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983… · ☆88 · Mar 9, 2026 · Updated last month
- Visual Grounding with Multi-modal Conditional Adaptation (ACMMM 2024 Oral) · ☆26 · Jun 11, 2025 · Updated 10 months ago
- Scaling Long-Horizon LLM Agent via Context-Folding · ☆148 · Jan 26, 2026 · Updated 3 months ago
- Intent recognition, slot filling, and slot-value error correction model for music-domain corpora · ☆18 · Mar 24, 2023 · Updated 3 years ago
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling · ☆31 · Nov 13, 2025 · Updated 5 months ago
- Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report] · ☆180 · Mar 30, 2026 · Updated 3 weeks ago
- [NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models · ☆57 · May 28, 2025 · Updated 11 months ago
- Multimodal RAG with ColPali, Milvus, and VLM · ☆15 · Dec 22, 2024 · Updated last year
- Retrieval-Augmented Generation System for Cardiovascular Disease Consultation · ☆17 · Dec 31, 2024 · Updated last year
- LEMMA: Logical Engine for Multi-domain Mathematical Analysis · ☆28 · Feb 14, 2026 · Updated 2 months ago
- ☆18 · May 14, 2025 · Updated 11 months ago
- A PyTorch implementation of a conditional Denoising Diffusion Probabilistic Model (DDPM) for multi-modal trajectory prediction. This proj… · ☆38 · Feb 20, 2026 · Updated 2 months ago
- ☆23 · Jan 9, 2026 · Updated 3 months ago
- SpeedVision is an AI-powered tool that detects and calculates vehicle speed from video footage using YOLO-based object detection and fram… · ☆10 · Sep 22, 2024 · Updated last year
- Add absolute time awareness to Qwen2.5VL's MRoPE with two lines of code · ☆29 · Sep 20, 2025 · Updated 7 months ago
- A tool to explore ideas generated from artificial intelligence chats · ☆10 · Apr 3, 2023 · Updated 3 years ago
- Official code for the paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models · ☆100 · Jan 14, 2026 · Updated 3 months ago
- Simple intermediate representation language for learning and research · ☆20 · Mar 27, 2020 · Updated 6 years ago
- Image captioning and management tool for AI training · ☆11 · Jan 24, 2025 · Updated last year
- Official repository for the paper: Cross-modal Information Flow in Multimodal Large Language Models · ☆43 · May 21, 2025 · Updated 11 months ago
- ☆15 · Jul 13, 2023 · Updated 2 years ago
- A replication of Google's VideoPoet model · ☆12 · Feb 18, 2024 · Updated 2 years ago
- This project allows students to demonstrate their coding skills before entering CodeU. Students will complete this JSON-lite object and J… · ☆10 · Apr 27, 2017 · Updated 9 years ago
- Source code for the paper: Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning · ☆45 · Jun 24, 2025 · Updated 10 months ago
- ☆60 · Updated this week
- A deep-learning-based sentiment analysis system for drug reviews that automatically classifies the sentiment of each review (positive, neutral, or negative). The project uses a hybrid LSTM + BERT word-embedding architecture and provides a user-friendly web interface. · ☆14 · Dec 24, 2024 · Updated last year
- REAP expert pruning for MoE LLMs on Apple Silicon via MLX · ☆55 · Mar 16, 2026 · Updated last month