taogoddd / GPT-4V-API
Self-hosted GPT-4V API
☆29Updated last year
Related projects
Alternatives and complementary repositories for GPT-4V-API
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI"☆85Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆46Updated 3 weeks ago
- Official Code of IdealGPT☆32Updated last year
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073☆26Updated 4 months ago
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆50Updated 6 months ago
- ☆14Updated 3 weeks ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆57Updated 5 months ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation"☆30Updated 3 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆67Updated 4 months ago
- Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers"☆40Updated last month
- DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models☆56Updated 2 weeks ago
- ☆45Updated last year
- ☆65Updated last year
- Paper collection of methods that use language to interact with environments, including interaction with the real world, simulated worlds, or the WWW…☆122Updated last year
- A curated list of papers, repositories, tutorials, and anything else related to large language models for tools☆64Updated last year
- ☆12Updated 6 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆26Updated 4 months ago
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…☆110Updated 3 weeks ago
- Official repository for paper "GTA: A Benchmark for General Tool Agents" (NeurIPS 2024 D&B Track)☆43Updated this week
- GUICourse: From General Vision Language Models to Versatile GUI Agents☆80Updated 3 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆48Updated last year
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models"☆43Updated last week
- EMNLP2023 - InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions☆16Updated 5 months ago
- ☆53Updated 2 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆100Updated 7 months ago
- Language Repository for Long Video Understanding☆28Updated 4 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs.☆48Updated 6 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- The Official Code Repository for GUI-World.☆37Updated 3 months ago
- official repo for the paper "Learning From Mistakes Makes LLM Better Reasoner"☆53Updated 10 months ago