SALT-NLP / PopupAttack
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
☆24 · Updated 3 weeks ago
Alternatives and similar repositories for PopupAttack:
Users who are interested in PopupAttack are comparing it to the repositories listed below.
- Training and Benchmarking LLMs for Code Preference. ☆28 · Updated 2 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆67 · Updated 2 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners ☆66 · Updated 2 weeks ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data" ☆46 · Updated last year
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆22 · Updated 3 months ago
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆47 · Updated 3 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? ☆26 · Updated 7 months ago
- Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment" ☆64 · Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 3 months ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆16 · Updated last month
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆41 · Updated 5 months ago
- [EMNLP 2024] Multi-modal reasoning problems via code generation. ☆19 · Updated 3 months ago
- Code for "Universal Adversarial Triggers Are Not Universal." ☆16 · Updated 8 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆54 · Updated last month
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models https://arxiv.org/pdf/2411.02433 ☆18 · Updated last month
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆40 · Updated last month
- Web-grounded natural language instructions ☆14 · Updated last month
- Codebase for decoding compressed trust. ☆22 · Updated 8 months ago
- This is the repo for our paper "Mr-Ben: A Comprehensive Meta-Reasoning Benchmark for Large Language Models" ☆44 · Updated 2 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism ☆26 · Updated 6 months ago
- Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan… ☆36 · Updated 6 months ago
- Code for the paper "SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning" ☆48 · Updated last year