UCSC-VLAA/STAR-1

[AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data

☆38

Alternatives and similar repositories for STAR-1

Users interested in STAR-1 are comparing it to the libraries listed below.

UCSC-VLAA / CIK-Bench
Official repository for Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw
☆69 · May 2, 2026 · Updated 2 months ago
fangjf1 / OpenSafeMLRM
The first toolkit for MLRM safety evaluation, providing a unified interface for mainstream models, datasets, and jailbreaking methods!
☆15 · Apr 8, 2025 · Updated last year
UCSC-VLAA / AttnGCG-attack
[TMLR 2025] Official implementation of AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
☆27 · Jun 17, 2025 · Updated last year
UCSC-VLAA / EarthWhere
☆16 · Nov 15, 2025 · Updated 8 months ago
UCSC-VLAA / m1
[ML4H'25] m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
☆51 · Dec 21, 2025 · Updated 7 months ago
haojinw0027 / MedFrameQA
MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning
☆18 · Jun 6, 2025 · Updated last year
WangCheng0116 / Awesome-LRMs-Safety
Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning …
☆90 · Aug 25, 2025 · Updated 10 months ago
hu-zijing / AsynDM
[ICLR 26] Asynchronous diffusion models allocate individual pixels with varying timestep schedules, yielding improved text-to-image align…
☆19 · Oct 7, 2025 · Updated 9 months ago
UCSC-VLAA / VLAA-Thinking
[TMLR 25] SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
☆148 · Oct 10, 2025 · Updated 9 months ago
UCSC-VLAA / MedVLThinker
[ML4H'25] MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
☆59 · Dec 21, 2025 · Updated 7 months ago
UCSC-VLAA / EpiFoundation
PyTorch implementation of EpiFoundation
☆26 · Feb 25, 2025 · Updated last year
CryptoAILab / misalignment
[NDSS'25] The official implementation of safety misalignment.
☆19 · Jan 8, 2025 · Updated last year
UCSC-VLAA / ClinSeekAgent
☆28 · Jun 1, 2026 · Updated last month
UCSC-VLAA / VLAA-GUI
Official implementation of VLAA-GUI series
☆34 · Jun 20, 2026 · Updated last month
hu-zijing / B2-DiffuRL
[CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.
☆57 · Mar 31, 2025 · Updated last year
UCSC-VLAA / MedVLSynther
[ICLR'26] MedVLSynther: Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
☆19 · Nov 1, 2025 · Updated 8 months ago
MurrayTom / SG-Bench
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
☆26 · Nov 29, 2024 · Updated last year
aijinrjinr / MLB-Seg
☆14 · Jul 2, 2024 · Updated 2 years ago
InvokerStark / OverKill
☆15 · Jun 13, 2024 · Updated 2 years ago
UCSB-AI / SafeKey
[EMNLP 2025] Official code for the paper "SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning"
☆16 · May 12, 2026 · Updated 2 months ago
ybwang119 / label_recovery
[ICLR 2024] Towards Eliminating Hard Label Constraints in Gradient Inversion Attacks
☆14 · Feb 6, 2024 · Updated 2 years ago
UCSC-VLAA / Image-Pretraining-for-Video
[ECCV 2022] This repository includes the official implementation of our paper "In Defense of Image Pre-Training for Spatiotemporal Recogniti…
☆19 · Dec 22, 2022 · Updated 3 years ago
SORRY-Bench / sorry-bench
Benchmark evaluation code for "SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal" (ICLR 2025)
☆83 · Mar 1, 2025 · Updated last year
DualityRL / multi-attempt
☆19 · Mar 10, 2025 · Updated last year
stanford-crfm / air-bench-2024
AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies
☆30 · Aug 14, 2024 · Updated last year
TomSheng21 / R-TPT
CVPR 2025 - R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
☆22 · Aug 28, 2025 · Updated 10 months ago
OliverRensu / SDMP
☆19 · Jan 2, 2023 · Updated 3 years ago
microsoft / x-reasoner
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
☆49 · Feb 4, 2026 · Updated 5 months ago
bairdzhang / des
☆19 · Mar 27, 2018 · Updated 8 years ago
ybwang119 / Awesome-reasoning-safety
This repo is for the safety topic, including attacks, defenses and studies related to reasoning and RL
☆66 · Sep 5, 2025 · Updated 10 months ago
NLie2 / what_features_jailbreak_LLMs
☆18 · Mar 30, 2025 · Updated last year
TrustedLLM / UnKE
☆24 · Feb 18, 2025 · Updated last year
boschresearch / meta-adversarial-training
TensorFlow implementation of Meta Adversarial Training for Adversarial Patch Attacks on Tiny ImageNet.
☆26 · Jan 28, 2021 · Updated 5 years ago
pasquini-dario / LLM_NeuralExec
Code to generate NeuralExecs (prompt injection for LLMs)
☆27 · Oct 5, 2025 · Updated 9 months ago
domenicrosati / representation-noising
Code to replicate the Representation Noising paper and tools for evaluating defences against harmful fine-tuning
☆24 · Dec 12, 2024 · Updated last year
UCSC-VLAA / o1_medical
☆48 · Feb 26, 2025 · Updated last year
scalable-model-editing / unified-model-editing
We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT.
☆29 · Dec 16, 2024 · Updated last year
reds-lab / Meta-Sift
The official implementation of USENIX Security'23 paper "Meta-Sift" -- Ten minutes or less to find a 1000-size or larger clean subset on …
☆20 · Apr 27, 2023 · Updated 3 years ago
poloclub / llm-landscape
NeurIPS'24 - LLM Safety Landscape
☆40 · Oct 21, 2025 · Updated 9 months ago