ChengShiest/Vision-Function-Layer


[NeurIPS 2025] The official PyTorch implementation of the "Vision Function Layer in MLLM".

☆32

Alternatives and similar repositories for Vision-Function-Layer

Users who are interested in Vision-Function-Layer compare it to the repositories listed below.

SooLab / Part2Object
[ECCV 2024] The official PyTorch implementation of the "Part2Object: Hierarchical Unsupervised 3D Instance Segmentation".
☆26 · Sep 12, 2024 · Updated last year
SooLab / EyeWO
[NeurIPS2025] The official PyTorch implementation of the "Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video".
☆34 · Dec 25, 2025 · Updated 6 months ago
YuHengsss / SD-RPN
[ICLR2026] Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception
☆17 · Jan 26, 2026 · Updated 5 months ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆51 · Jul 7, 2025 · Updated last year
TungChintao / SkiLa
Official codes of "Sketch-in-Latents: Eliciting Unified Reasoning in MLLMs"
☆17 · Feb 15, 2026 · Updated 5 months ago
WPR001 / Ego-ST
☆16 · Sep 25, 2025 · Updated 9 months ago
SooLab / SimCIS
[CVPR2025] Rethinking Query-based Transformer for Continual Image Segmentation
☆50 · Jul 16, 2025 · Updated last year
PandragonXIII / CIDER
The official repository for "Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models".
☆15 · Jan 16, 2025 · Updated last year
ali-vilab / Unison
☆17 · Dec 11, 2025 · Updated 7 months ago
AfterJourney00 / mmd_to_smpl
An automated workflow for composing, rendering, and retargeting MMD assets.
☆16 · Feb 23, 2026 · Updated 4 months ago
anakin-skywalker-Joseph / Folder
Official implementation of the papers FOLDER (ICCV2025) and Turbo (ECCV2024)
☆15 · Jun 27, 2025 · Updated last year
ChengShiest / Zip-Your-CLIP
[ICLR 2024] The official implementation of Zip-Your-Clip
☆36 · Mar 14, 2024 · Updated 2 years ago
CR-Gjx / Img2Prompt
Evaluation codes of "From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models".
☆17 · May 15, 2023 · Updated 3 years ago
XiaoyuXU1 / Representational_Analysis_Tools
☆15 · May 23, 2025 · Updated last year
BorisYang326 / OrderGOL
[TOG 2025] Order Matters: Learning Element Ordering for Graphic Design Generation
☆24 · Aug 5, 2025 · Updated 11 months ago
archiki / RepARe
☆21 · Oct 10, 2023 · Updated 2 years ago
yeshaokai / Calibrator-Domain-Adaptation
Release code for the light-weight calibrator: a separable component for unsupervised domain adaptation
☆13 · Jul 17, 2021 · Updated 5 years ago
kevinhsieh / non_iid_dml
☆30 · Oct 22, 2020 · Updated 5 years ago
Social-AI-Studio / MATK
Official repository for ACM Multimedia'23 paper "MATK: The Meme Analytical Tool Kit"
☆14 · May 29, 2024 · Updated 2 years ago
XLearning-SCU / 2026-AAAI-SCAN
Official implementation of the paper "Endowing Vision-Language Models with System 2 Thinking for Fine-Grained Visual Recognition," AAAI 2…
☆44 · Jan 30, 2026 · Updated 5 months ago
wangyu-ovo / MML
Code for the paper "Jailbreak Large Vision-Language Models Through Multi-Modal Linkage"
☆35 · Dec 6, 2024 · Updated last year
luka-group / CoIN
☆14 · Jun 11, 2024 · Updated 2 years ago
GeWu-Lab / MokA
MokA: Multimodal Low-Rank Adaptation for MLLMs
☆91 · Dec 30, 2025 · Updated 6 months ago
cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆163 · Sep 21, 2025 · Updated 10 months ago
InternLM / OVO-S-Bench
An official implementation of "OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs"
☆47 · Jun 24, 2026 · Updated 3 weeks ago
yellow-binary-tree / MMDuet
Official implementation of the paper "VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…
☆44 · Feb 5, 2025 · Updated last year
Lingyun0419 / CVPT
Cross Visual Prompt Tuning [ICCV 2025]
☆13 · Aug 3, 2025 · Updated 11 months ago
ybb6 / laser
☆34 · Apr 22, 2026 · Updated 2 months ago
UCSB-AI / DMLR
[CVPR2026] Official codebase for the paper "Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space"
☆84 · May 12, 2026 · Updated 2 months ago
iLearn-Lab / ACL25-AdaReTaKe
Official implementation of the paper "AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding"
☆91 · Apr 21, 2026 · Updated 3 months ago
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆146 · Sep 11, 2025 · Updated 10 months ago
google-deepmind / wyd-benchmark
☆28 · Mar 3, 2025 · Updated last year
SooLab / CGFormer
The official PyTorch implementation of the CVPR 2023 paper "Contrastive Grouping with Transformer for Referring Image Segmentation".
☆52 · Apr 17, 2024 · Updated 2 years ago
seilk / LocalizationHeads
[CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
☆79 · Aug 31, 2025 · Updated 10 months ago
wysnzzzz / DIT
☆18 · Nov 15, 2024 · Updated last year
YuHengsss / Q-Zoom
☆15 · Apr 15, 2026 · Updated 3 months ago
TonyStark1997 / OpenCV-Raspberry_Pi
Learning notes for implementing OpenCV image processing in Python on the Raspberry Pi
☆11 · Apr 11, 2019 · Updated 7 years ago
forwchen / LLaVA-MoLE
☆10 · Mar 4, 2024 · Updated 2 years ago
PeterWang512 / AttributeByUnlearning
Code for the paper "Data Attribution for Text-to-Image Models by Unlearning Synthesized Images."
☆17 · May 23, 2025 · Updated last year