VIStA-H / GPT-4V_Social_Media
GPT-4V(ision) as A Social Media Analysis Engine
☆35Updated 4 months ago
Alternatives and similar repositories for GPT-4V_Social_Media
Users who are interested in GPT-4V_Social_Media are comparing it to the repositories listed below
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆86Updated last month
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆15Updated last month
- Official repo for StableLLAVA☆95Updated last year
- Official implementation of MIA-DPO☆57Updated 3 months ago
- VisRL: Intention-Driven Visual Perception via Reinforced Reasoning☆28Updated 2 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆84Updated 8 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆56Updated last year
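- A learning roadmap for newcomers to the multimodal field, covering the field's classic papers, projects, and courses. It aims to give learners a reasonably deep understanding of the field within a set period of time, so that they can carry out independent research.☆17Updated last year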
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆55Updated 8 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆71Updated 11 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆83Updated last month
- A collection of vision foundation models unifying understanding and generation.☆55Updated 4 months ago
- LLMBind: A Unified Modality-Task Integration Framework☆18Updated 11 months ago
- VisualGPTScore for visio-linguistic reasoning☆27Updated last year
- TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆34Updated last week
- 【NeurIPS 2024】Official code for the paper "Automated Multi-level Preference for MLLMs"☆19Updated 7 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆45Updated 10 months ago
- Unifying Visual Understanding and Generation with Dual Visual Vocabularies 🌈☆45Updated last month
- Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM☆58Updated 3 weeks ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆57Updated 10 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆83Updated last year
- 【COLING 2025🔥】Code for the paper "Is Parameter Collision Hindering Continual Learning in LLMs?".☆33Updated 5 months ago
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning☆32Updated last month
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆137Updated 6 months ago