drivetosouth / SafeDialBench-Dataset
Official GitHub repo for SafeDialBench, a comprehensive multi-turn dialogue benchmark to evaluate LLMs' safety.
☆34Updated 2 months ago
Alternatives and similar repositories for SafeDialBench-Dataset
Users who are interested in SafeDialBench-Dataset are comparing it to the repositories listed below
- A Framework of Continual Learning☆117Updated last month
- ☆15Updated last week
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆69Updated 2 months ago
- A paper list on LLMs and multimodal LLMs☆42Updated last month
- ☆16Updated 2 months ago
- ☆134Updated 5 months ago
- ☆13Updated 2 years ago
- ☆49Updated 8 months ago
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset…☆59Updated 6 months ago
- The code repository for "OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions"☆14Updated 5 months ago
- The code of LLaVO☆20Updated last year
- [NeurIPS 2023] Generalized Logit Adjustment☆38Updated last year
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆360Updated 7 months ago
- Papers about Hallucination in Multi-Modal Large Language Models (MLLMs)☆94Updated 8 months ago
- Instruction Tuning in Continual Learning paradigm☆54Updated 6 months ago
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse☆50Updated last year
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆40Updated 8 months ago
- ☆118Updated 2 years ago
- ☆255Updated last month
- [ICLR 2025] "Noisy Test-Time Adaptation in Vision-Language Models"☆16Updated 5 months ago
- Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning☆18Updated last month
- ☆98Updated last year
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…☆68Updated 2 weeks ago
- Data and Code for Paper IS-Bench: Evaluating Interactive Safety of VLM-Driven Embodied Agents in Daily Household Tasks☆20Updated last week
- Official code for ICLR 2024 paper, "A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation"☆81Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(…☆295Updated 8 months ago
- An RLHF Infrastructure for Vision-Language Models☆180Updated 8 months ago
- [CVPR 2024] Tune-An-Ellipse: CLIP Has Potential to Find What You Want☆14Updated 7 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 3 months ago
- Awesome RL-based LLM Reasoning☆592Updated 3 weeks ago