(ACL 2025 Main) Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.01935
☆231 · Oct 27, 2025 · Updated 4 months ago
Alternatives and similar repositories for MARBLE
Users who are interested in MARBLE are comparing it to the libraries listed below.
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 8 months ago
- ☆16 · Jul 23, 2024 · Updated last year
- This repository contains code and datasets for our paper on the effects of document multiplicity while the context size is fixed in Retri… ☆18 · Mar 13, 2025 · Updated 11 months ago
- pix2pix and Cycle GAN architectures for image style transfer ☆13 · May 27, 2021 · Updated 4 years ago
- ☆47 · Sep 7, 2025 · Updated 6 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆25 · Sep 26, 2024 · Updated last year
- [NeurIPS'25] ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and R… ☆32 · Sep 27, 2025 · Updated 5 months ago
- ☆46 · Jun 24, 2025 · Updated 8 months ago
- Preview Code for Continuum Paper ☆43 · Jan 26, 2026 · Updated last month
- ☆68 · Jun 20, 2024 · Updated last year
- Generate videos using Temporal, Google Gemini, and Veo 2. ☆16 · Jul 11, 2025 · Updated 7 months ago
- For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L… ☆11 · May 28, 2025 · Updated 9 months ago
- ☆99 · Jun 12, 2024 · Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 · Jun 28, 2024 · Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Mar 22, 2024 · Updated last year
- ☆11 · Apr 21, 2025 · Updated 10 months ago
- LMM for VQA, tcsvt version ☆11 · Jul 19, 2024 · Updated last year
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c… ☆40 · Apr 15, 2025 · Updated 10 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆45 · Apr 3, 2025 · Updated 11 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"… ☆32 · Nov 25, 2025 · Updated 3 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Dec 19, 2024 · Updated last year
- Code for paper 'Accelerating Antimicrobial Peptide Discovery with Latent Sequence-Structure Model' ☆13 · Mar 21, 2024 · Updated last year
- This repo contains all the code for the SEScore implementation ☆15 · Mar 3, 2025 · Updated last year
- This project demonstrates deploying a secure, scalable Generative AI (GenAI) solution on Azure using a Retrieval-Augmented Generation (RA… ☆18 · Feb 27, 2025 · Updated last year
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆46 · Jul 24, 2025 · Updated 7 months ago
- The official repo of continuous speculative decoding ☆31 · Mar 28, 2025 · Updated 11 months ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆14 · Feb 25, 2025 · Updated last year
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆32 · May 20, 2024 · Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 · Nov 22, 2024 · Updated last year
- [ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents ☆26 · Feb 17, 2026 · Updated 3 weeks ago
- ☆27 · Jan 14, 2026 · Updated last month
- [ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models ☆17 · Sep 17, 2025 · Updated 5 months ago
- Benchmarking data and script used for LLM multi-agent collaboration systems from AWS Bedrock Agents Science team. ☆17 · Dec 10, 2024 · Updated last year
- This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Gener… ☆17 · Jul 1, 2024 · Updated last year
- [ACL 2019/AACL 2020] Second-Order Syntactic/Semantic Dependency Parsing With Mean Field Variational Inference (PyTorch) ☆14 · Oct 22, 2020 · Updated 5 years ago
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts ☆17 · Mar 11, 2025 · Updated 11 months ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models? ☆50 · Dec 25, 2025 · Updated 2 months ago
- Java/Python library and validator for the AIDA Interchange Format (AIF). Originally based on isi-vista/gaia-interchange. ☆21 · Jun 14, 2023 · Updated 2 years ago
- A Python library to find differences between audio and transcriptions ☆19 · Nov 14, 2023 · Updated 2 years ago