MrZilinXiao/AutoVER

[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.

☆14

Alternatives and similar repositories for AutoVER

Users who are interested in AutoVER are comparing it to the libraries listed below.

open-vision-language / oven
☆47 · Aug 15, 2023 · Updated 2 years ago
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆44 · Jun 7, 2025 · Updated last year
aimagelab / DiCO
[BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
☆20 · Sep 11, 2024 · Updated last year
aimagelab / HySAC
Hyperbolic Safety-Aware Vision-Language Models. CVPR 2025
☆31 · Apr 8, 2025 · Updated last year
phuselab / tppgaze
☆17 · Feb 20, 2025 · Updated last year
VLR-CVC / vlm-training
large scale pre-training VLMs
☆25 · Jul 6, 2026 · Updated 2 weeks ago
aimagelab / ReflectiVA
[CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
☆56 · Jul 14, 2025 · Updated last year
edchengg / infoseek_eval
EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions
☆26 · May 30, 2024 · Updated 2 years ago
aimagelab / COGT
[ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding
☆10 · Apr 15, 2025 · Updated last year
aimagelab / ScanDiff
This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV …
☆27 · May 13, 2026 · Updated 2 months ago
aimagelab / CoDE
[ECCV'24] Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
☆52 · Jul 2, 2025 · Updated last year
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆183 · Oct 1, 2024 · Updated last year
luomancs / ReMuQ
a multimodal retrieval dataset
☆25 · Jul 8, 2023 · Updated 3 years ago
HITsz-TMG / GEMEL
Official implementation of our LREC-COLING 2024 paper "Generative Multimodal Entity Linking".
☆36 · Feb 27, 2025 · Updated last year
crux82 / msr-vtt-it
A large scale dataset for Video Captioning in Italian
☆13 · May 16, 2023 · Updated 3 years ago
open-vision-language / infoseek
☆78 · Oct 27, 2023 · Updated 2 years ago
aimagelab / awesome-human-visual-attention
This repository contains a curated list of research papers and resources focusing on saliency and scanpath prediction, human attention, h…
☆66 · May 9, 2025 · Updated last year
aimagelab / ReT
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
☆37 · Sep 12, 2025 · Updated 10 months ago
nikhilchandak / answer-matching
Code for 'Answer Matching Outperforms Multiple Choice for Language Model Evaluation' paper
☆18 · Jul 4, 2025 · Updated last year
OSU-NLP-Group / SeeActChromeExtension
☆18 · Jan 3, 2025 · Updated last year
ethanm88 / llm-access-control
Official Repository for Can Language Models be Instructed to Protect Personal Information?
☆14 · Oct 8, 2023 · Updated 2 years ago
aimagelab / LLaVA-MORE
[ICCVW 25] LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
☆160 · Aug 8, 2025 · Updated 11 months ago
aimagelab / safe-clip
Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models. ECCV 2024
☆67 · Aug 10, 2024 · Updated last year
NP-NET-research / Recursive-Semi-Markov-Model
Source code for "N-ary Constituent Tree Parsing with Recursive Semi-Markov Model" published at ACL 2021
☆10 · May 27, 2021 · Updated 5 years ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆54 · Oct 19, 2024 · Updated last year
Nithin-Holla / MetaWSD
Repository containing code for the paper "Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation", publi…
☆12 · Nov 12, 2020 · Updated 5 years ago
Manuel030 / alpaca-opt
Yet another LLM
☆10 · Apr 6, 2023 · Updated 3 years ago
NP-NET-research / wdel
WDEL is an entity linking system based on the Wikidata knowledge base.
☆11 · Feb 12, 2025 · Updated last year
DataArcTech / RagVL
Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced …
☆92 · Nov 15, 2024 · Updated last year
Qinyu-Allen-Zhao / LVLM-LP
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
☆43 · Nov 1, 2024 · Updated last year
Shimorina / relation-extraction-db-wikidata
☆11 · Jul 17, 2022 · Updated 4 years ago
tmlabonte / last-layer-retraining
Official codebase for the NeurIPS 2023 paper: Towards Last-layer Retraining for Group Robustness with Fewer Annotations. https://arxiv.or…
☆12 · May 15, 2024 · Updated 2 years ago
uakarsh / docformerv2
This repo consists of my implementation of DocFormerV2
☆12 · Mar 31, 2024 · Updated 2 years ago
LoieSun / Auto-ACD
code for A Large-scale Dataset for Audio-Language Representation Learning
☆14 · Sep 18, 2024 · Updated last year
mi92 / reverse-image-rag
☆15 · Jul 8, 2024 · Updated 2 years ago
zhang-yu-wei / InBedder
[ACL 2024] Source code for InBedder, an instruction-following text embedder
☆31 · Oct 11, 2024 · Updated last year
ai4eu / tutorials
Container Specification, Tutorials and Examples for the AI4EU Experiments docker/grpc format for models
☆13 · Jul 10, 2022 · Updated 4 years ago
EIT-NLP / Layer_Select_Fuse_for_MLLM
[CVPR2025] Official implementation of the paper "Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practi…
☆49 · Oct 29, 2025 · Updated 8 months ago
codezakh / SelTDA
[CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
☆17 · May 14, 2024 · Updated 2 years ago