m&ms: A Benchmark to Evaluate Tool-Use for multi-step multi-modal tasks
☆46Sep 26, 2024Updated last year
Alternatives and similar repositories for mnms
Users that are interested in mnms are comparing it to the libraries listed below.
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models.☆73Nov 27, 2024Updated last year
- An instruction data generation system for multimodal language models.☆37Jan 31, 2025Updated last year
- ☆69Jun 2, 2026Updated last week
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality☆92Feb 13, 2024Updated 2 years ago
- EcoAssistant: using LLM assistant more affordably and accurately☆133Jun 30, 2024Updated last year
- Code for the paper "Taxonomy Completion via Triplet Matching Network" AAAI 2021☆29Apr 8, 2021Updated 5 years ago
- [ICLR'24] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use☆114Mar 21, 2024Updated 2 years ago
- ☆36Feb 5, 2024Updated 2 years ago
- [CVPR 2026] An accurate and dense-annotated synthetic dataset for training SOTA detectors / segmentors / Grounding-VLMs.☆47Feb 23, 2026Updated 3 months ago
- [ACL 2024] On the Multi-turn Instruction Following for Conversational Web Agents☆17Oct 12, 2024Updated last year
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆31Aug 7, 2025Updated 10 months ago
- Web-grounded natural language instructions☆18Nov 25, 2024Updated last year
- ☆29Apr 30, 2024Updated 2 years ago
- ☆16Apr 10, 2025Updated last year
- Demonstrating the BadAss issue.☆17May 19, 2025Updated last year
- A Python toolkit for the OmniLabel benchmark providing code for evaluation and visualization☆23Feb 1, 2025Updated last year
- [NeurIPS 2021] WRENCH: Weak supeRvision bENCHmark☆228Feb 13, 2024Updated 2 years ago
- ☆21Oct 10, 2023Updated 2 years ago
- Code for NeurIPS 2024 Paper - Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass☆21Aug 22, 2024Updated last year
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)☆35Oct 16, 2024Updated last year
- PyTorch code for Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles (DANCE)☆23Nov 29, 2022Updated 3 years ago
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning☆40Jul 29, 2023Updated 2 years ago
- Python codes for mathematical modeling.☆12Sep 5, 2021Updated 4 years ago
- Support finetuning GLM4v with zero2☆16Jun 29, 2024Updated last year
- Regularly Truncated M-estimators for Learning with Noisy Labels☆12Apr 24, 2024Updated 2 years ago
- A makeshift Python program that relies on nltk and Stanford Core NLP models to expand common contractions in the English language.☆10Nov 8, 2017Updated 8 years ago
- ☆15May 6, 2021Updated 5 years ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆68Apr 3, 2026Updated 2 months ago
- ☆54Sep 26, 2025Updated 8 months ago
- List of NLP Datasets☆10Mar 12, 2019Updated 7 years ago
- Investigating Cultural Alignment of Large Language Models☆13Aug 14, 2024Updated last year
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step☆306Apr 3, 2024Updated 2 years ago
- Supercharge huggingface transformers with model parallelism.☆78Jul 23, 2025Updated 10 months ago
- This is an example of creating an AI agent with flowchart☆12Jul 22, 2024Updated last year
- ☆319Mar 26, 2024Updated 2 years ago
- Community Regularization of Visually Grounded Dialog https://arxiv.org/abs/1808.04359☆15May 16, 2019Updated 7 years ago
- Benchmarking Mobile Device Control Agents across Diverse Configurations (ICLR 2024 workshop GenAI4DM spotlight presentation; CoLLAs 2025)☆35Jul 21, 2025Updated 10 months ago
- ☆32Feb 8, 2024Updated 2 years ago
- Weighted Training for Cross-Task Learning☆15Feb 12, 2023Updated 3 years ago