Kartik-3004 / facexbench
FaceXBench: Evaluating Multimodal LLMs on Face Understanding
☆14 · Updated 3 months ago
Alternatives and similar repositories for facexbench:
Users who are interested in facexbench are comparing it to the libraries listed below.
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆23 · Updated last week
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated 9 months ago
- Official implementation for "Diffusion Instruction Tuning" ☆22 · Updated 2 months ago
- Official repository of Personalized Visual Instruct Tuning ☆28 · Updated 2 months ago
- Official PyTorch implementation of Self-emerging Token Labeling ☆33 · Updated last year
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆58 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of "Image Copy Detection for Diffusion Models" ☆16 · Updated 7 months ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models ☆47 · Updated 6 months ago
- 🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆37 · Updated 3 months ago
- This repository provides an improved LLamaGen Model, fine-tuned on 500,000 high-quality images, each accompanied by over 300 token prompt… ☆30 · Updated 6 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆79 · Updated last month
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆50 · Updated 4 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆59 · Updated 3 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆76 · Updated 7 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆35 · Updated 10 months ago
- [CVPR 2025 AI4CC Workshop] Official Implementation of HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editin… ☆28 · Updated 5 months ago
- [ICLR 2024] Contextualized Diffusion Models for Text-Guided Image and Video Generation ☆67 · Updated 11 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 · Updated 10 months ago
- The official repo of continuous speculative decoding ☆26 · Updated last month
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆124 · Updated 8 months ago
- Data-Efficient Multimodal Fusion on a Single GPU ☆59 · Updated last year
- We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editi… ☆31 · Updated 8 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆52 · Updated 6 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆20 · Updated 2 weeks ago