Xinyi-0724 / SmartHome-Bench-LLMLinks

SmartHome-Bench: A Comprehensive Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal Foundation Models

☆19

Alternatives and similar repositories for SmartHome-Bench-LLM

Users that are interested in SmartHome-Bench-LLM are comparing it to the libraries listed below

Sorting:

chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆137Updated last year
pkunlp-icler / MIC
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆50Updated 3 months ago
zjukg / Structure-CLIP
[Paper][AAAI2024]Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations
☆150Updated last year
deepcs233 / Visual-CoT
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆383Updated 9 months ago
linkangheng / PR1
Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning
☆262Updated 3 months ago
WillDreamer / Aurora
[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
☆88Updated last year
Chenyu-Wang567 / MLLM-Tool
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
☆134Updated last week
tanhuajie / Reason-RFT
[NeurIPS 2025]⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.
☆218Updated 2 weeks ago
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆167Updated 3 months ago
Ruiyang-061X / Awesome-MLLM-Uncertainty
✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).
☆53Updated 6 months ago
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆141Updated 3 months ago
OpenGVLab / LAMM
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆316Updated last year
Liuziyu77 / RAR
The official implementation of RAR
☆92Updated last year
zhyang2226 / OPA-DPO
[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
☆82Updated 2 weeks ago
BAAI-DCAI / DataOptim
A collection of visual instruction tuning datasets.
☆76Updated last year
yuyq96 / R1-Vision
R1-Vision: Let's first take a look at the image
☆48Updated 8 months ago
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆133Updated 7 months ago
OpenGVLab / MM-NIAH
[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…
☆115Updated 10 months ago
minglllli / CLS-RL
[NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning
☆68Updated last month
mrwu-mac / ControlMLLM
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆193Updated 3 months ago
PRIS-CV / GRPO-for-Llava
GRPO Algorithm for Llava Architecture (Based on Verl)
☆41Updated 5 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆84Updated last year
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆94Updated 9 months ago
FuxiaoLiu / LRV-Instruction
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆287Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆91Updated last year
zjysteven / VLM-Visualizer
Visualizing the attention of vision-language models
☆240Updated 7 months ago
JiuTian-VL / JiuTian-LION
[CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
☆153Updated last month
winycg / CLIP-KD
[CVPR-2024] Official implementations of CLIP-KD: An Empirical Study of CLIP Model Distillation
☆129Updated last month
LightChen233 / M3CoT
☆82Updated last year
TempleX98 / MoVA
[NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context
☆166Updated last year