An Arena-style Automated Evaluation Benchmark for Detailed Captioning
☆59Jun 1, 2025Updated last year
Alternatives and similar repositories for CapArena
Users that are interested in CapArena are comparing it to the libraries listed below.
- Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"☆27Jun 7, 2026Updated 3 weeks ago
- ☆13Dec 9, 2024Updated last year
- Official Code for "Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning" (ICLR 2025)☆14Mar 6, 2025Updated last year
- [ACL 2025] AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant☆46Dec 19, 2024Updated last year
- ☆40Jan 23, 2024Updated 2 years ago
- A Self-Training Framework for Vision-Language Reasoning☆89Jan 23, 2025Updated last year
- ☆22May 3, 2025Updated last year
- What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness☆28May 16, 2025Updated last year
- ☆49May 14, 2026Updated last month
- ☆19Nov 3, 2025Updated 8 months ago
- CLAIR: A (surprisingly) simple semantic text metric with large language models.☆22Jan 28, 2024Updated 2 years ago
- Unleashing Reasoning in Medical Large Language Models☆12Mar 19, 2025Updated last year
- ☆75Dec 6, 2024Updated last year
- [ACL 2024] The project of Symbol-LLM☆59Jul 10, 2024Updated last year
- [ICLR 2026] Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"☆129Feb 2, 2026Updated 5 months ago
- The model, data and code for the visual GUI Agent SeeClick☆486Jul 13, 2025Updated 11 months ago
- [IJCAI 2025] Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives☆35Nov 25, 2025Updated 7 months ago
- NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation☆13May 24, 2025Updated last year
- [ACL 2025] Code and data for OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis☆188Oct 8, 2025Updated 8 months ago
- Neural Code Intelligence Survey 2024-25; Reading lists and resources☆282Jul 24, 2025Updated 11 months ago
- Official repository of PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning☆58Oct 16, 2025Updated 8 months ago
- [ICLR 2026] JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence☆79May 9, 2026Updated last month
- [ACL 2025] An inference-time decoding strategy with adaptive foresight sampling☆107May 18, 2025Updated last year
- [ACL2023, Findings] Source codes for the paper "Werewolf Among Us: Multimodal Resources for Modeling Persuasion Behaviors in Social Deduc…☆16Feb 22, 2025Updated last year
- ☆12Aug 8, 2024Updated last year
- the official repo for EMNLP 2024 (main) paper "EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimo…☆21Apr 9, 2025Updated last year
- ☆115Jan 8, 2025Updated last year
- Easy control for Key-Value Constrained Generative LLM Inference (https://arxiv.org/abs/2402.06262)☆62Feb 13, 2024Updated 2 years ago
- Implementing DP/TP/PP with torch.distributed☆15Dec 28, 2023Updated 2 years ago
- [EMNLP 2024 Findings] SEA is an automated paper review framework capable of generating comprehensive and high-quality review feedback wit…☆90Jan 18, 2026Updated 5 months ago
- [NeurIPS'25] GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents☆409Apr 13, 2026Updated 2 months ago
- [ICASSP 2025 Oral] The official implementation of paper "TextureDiffusion: Target Prompt Disentangled Editing for Various Texture Transfe…☆16Mar 13, 2025Updated last year
- [MICCAI 2024] Embracing Massive Medical Data☆20Jul 5, 2024Updated last year
- [Pattern Recognition] The implementation of MoCA☆12Apr 1, 2023Updated 3 years ago
- Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals; ACL 2024☆13May 24, 2024Updated 2 years ago
- self-adaptive in-context learning☆45May 5, 2023Updated 3 years ago
- [NeurIPS 2025 Spotlight] FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities☆76Dec 21, 2025Updated 6 months ago
- [ICCV 2025] Prompt-A-Video☆24Feb 2, 2025Updated last year
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).☆160Sep 27, 2024Updated last year