avaxiao/TextRegion

[TMLR 2025 J2C] TextRegion: Text-Aligned Region Tokens from Frozen Image-Text Models

☆54

Alternatives and similar repositories for TextRegion

Users that are interested in TextRegion are comparing it to the libraries listed below.

YuHengsss / Trident
[ICCV2025] Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation
☆126 • Nov 22, 2025 • Updated 8 months ago
yuqunw / scene_diff
☆22 • Jul 18, 2026 • Updated last week
bscho333 / ReVisiT
[ACL 2026 Main] Revisit What You See: Revealing Visual Semantics in Vision Tokens to Guide LVLM Decoding
☆26 • Nov 21, 2025 • Updated 8 months ago
jessemelpolio / LMM_CL
Codes for: How to Teach Large Multimodal Models New Skills?
☆30 • Oct 10, 2025 • Updated 9 months ago
vladan-stojnic / LPOSS
Code for LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation (CVPR2025)
☆24 • Nov 8, 2025 • Updated 8 months ago
sinahmr / NACLIP
PyTorch Implementation of NACLIP in "Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation"
☆80 • Sep 23, 2024 • Updated last year
savya08 / REN
Region Encoder Network
☆21 • Oct 2, 2025 • Updated 9 months ago
RADSeg-OVSS / RADSeg
[CVPR'26 Findings] Source code for "RADSeg Unleashing Parameter and Compute Efficient Zero-Shot Open-Vocabulary Segmentation Using Agglom…
☆60 • May 31, 2026 • Updated last month
kaist-cvml / part-catseg
[CVPR 2025] Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
☆30 • Nov 17, 2025 • Updated 8 months ago
AIGeeksGroup / PresentAgent-2
PresentAgent-2: Towards Generalist Multimodal Presentation Agents
☆17 • Jun 5, 2026 • Updated last month
yvhangyang / ResCLIP
Official implementation of ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
☆68 • Updated this week
nikosips / Universal-Image-Embeddings
A large-scale benchmark for the evaluation of embeddings across a number of fine-grained and instance-level visual domains.
☆17 • Jun 14, 2024 • Updated 2 years ago
NVlabs / fova-depth
☆18 • Dec 2, 2024 • Updated last year
LiYinqi / un2CLIP
[NeurIPS'25] A work to improve CLIP's visual detail capturing ability by inverting the unCLIP generative model.
☆26 • Mar 19, 2026 • Updated 4 months ago
ysj9909 / StAR
[ECCV 2026] StAR: Segment Anything Reasoner
☆25 • Apr 2, 2026 • Updated 3 months ago
agneet42 / revision
[ECCV 2024] "REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models"
☆14 • Aug 6, 2024 • Updated last year
aim-uofa / COSINE
[ICCV'25] Unified Open-World Segmentation with Multi-Modal Prompts
☆16 • Jun 16, 2026 • Updated last month
peterant330 / KUEA
[ICML'25] Kernel-based Unsupervised Embedding Alignment for Enhanced Visual Representation in Vision-language Models
☆23 • Sep 7, 2025 • Updated 10 months ago
idstcv / InMaP
PyTorch Implementation for InMaP
☆12 • Oct 28, 2023 • Updated 2 years ago
google-deepmind / tips
TIPSv2 (CVPR'26) and TIPS (ICLR'25)
☆576 • Jun 1, 2026 • Updated last month
MMMGBench / MMMG
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning [NeurIPS 2025 Poster]
☆24 • Dec 10, 2025 • Updated 7 months ago
au-revoir / model-editing-ft
☆13 • Sep 8, 2024 • Updated last year
Yibin-Lei / CSQE
Implementation for EACL 2024 paper "Corpus-Steered Query Expansion with Large Language Models"
☆13 • Mar 19, 2024 • Updated 2 years ago
nickjiang2378 / test-time-registers
[NeurIPS '25 Spotlight] Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers"
☆184 • Sep 19, 2025 • Updated 10 months ago
CreaLabs / Enhanced-BGE-M3-with-CLP-and-MoE
This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e…
☆11 • Dec 27, 2024 • Updated last year
sirkosophia / DIP
Official implementation of DIP: Unsupervised Dense In-Context Post-training of Visual Representations
☆46 • Sep 8, 2025 • Updated 10 months ago
TIGER-AI-Lab / Hierarchical-Reasoner
Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning [ICLR26]
☆64 • Apr 11, 2026 • Updated 3 months ago
chs20 / fuselip
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17 • Sep 8, 2025 • Updated 10 months ago
wrudman / NOTICE
☆14 • Apr 10, 2025 • Updated last year
songw-zju / PixelThink
The official implementation of "PixelThink: Towards Efficient Chain-of-Pixel Reasoning" (ICML 2026)
☆43 • Jul 4, 2026 • Updated 3 weeks ago
wangqinsi1 / Vision-Zero
[ICLR 2026] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play.
☆136 • Feb 6, 2026 • Updated 5 months ago
zilunzhang / StreetCLIP-Repoduce
☆13 • Jul 1, 2024 • Updated 2 years ago
xfactlab / I0T
[ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap
☆12 • Jun 18, 2025 • Updated last year
RayFronts / RayFronts
[IROS'25] Source code for "RayFronts: Open-Set Semantic Ray Frontiers for Online Scene Understanding and Exploration"
☆134 • Jun 19, 2026 • Updated last month
KaiyueSun98 / T2I-Personalization-with-AR
☆47 • Apr 20, 2025 • Updated last year
UCDvision / gen2seg
[ICLR 2026] Code for "gen2seg: Generative Models Enable Generalizable Instance Segmentation"
☆74 • Feb 9, 2026 • Updated 5 months ago
nnanhuang / Customize-it-3D
[ICRA 2025] Official implementation of Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Pri…
☆75 • Jan 31, 2025 • Updated last year
ziqipang / MR-Video
MR. Video: MapReduce is the Principle for Long Video Understanding
☆31 • Jun 18, 2026 • Updated last month
naver-ai / muco
Official Pytorch implementation of MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model (CVPR 2026)
☆15 • Apr 16, 2026 • Updated 3 months ago