keven980716 / weak-to-strong-deceptionView external linksLinks
[ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"
☆14Jun 21, 2024Updated last year
Alternatives and similar repositories for weak-to-strong-deception
Users that are interested in weak-to-strong-deception are comparing it to the libraries listed below
Sorting:
- ☆16Mar 22, 2025Updated 10 months ago
- Applies ROME and MEMIT on Mamba-S4 models☆14Apr 5, 2024Updated last year
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts☆16Feb 26, 2024Updated last year
- [ICML‘2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen☆18Sep 7, 2024Updated last year
- Repository for the EMNLP 2023 Demo Paper "Reaction Miner: An Integrated System for Chemical Reaction Extraction from Textual Data"☆19Jan 27, 2025Updated last year
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆48Jan 17, 2024Updated 2 years ago
- Self-Supervised Alignment with Mutual Information☆20May 24, 2024Updated last year
- Code for the paper "Be Careful about Poisoned Word Embeddings: Exploring the Vulnerability of the Embedding Layers in NLP Models" (NAACL-…☆44Jul 26, 2021Updated 4 years ago
- ☆46Jun 24, 2025Updated 7 months ago
- ☆23Jul 5, 2024Updated last year
- The rule-based evaluation subset and code implementation of Omni-MATH☆26Dec 23, 2024Updated last year
- Code for the paper "RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models" (EMNLP 2021)☆25Oct 21, 2021Updated 4 years ago
- Code for the paper "Rethinking Stealthiness of Backdoor Attack against NLP Models" (ACL-IJCNLP 2021)☆24Dec 9, 2021Updated 4 years ago
- The official GitHub page for paper "NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional St…☆25May 10, 2024Updated last year
- Unofficial implementation of "Backdooring Instruction-Tuned Large Language Models with Virtual Prompt Injection"☆27Jul 6, 2024Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆184May 20, 2025Updated 8 months ago
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"☆41Jun 24, 2025Updated 7 months ago
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- ☆30Jun 20, 2024Updated last year
- Code for Paper (Policy Optimization in RLHF: The Impact of Out-of-preference Data)☆28Dec 19, 2023Updated 2 years ago
- Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)☆51May 12, 2025Updated 9 months ago
- ☆37Sep 26, 2024Updated last year
- Methods and evaluation for aligning language models temporally☆30Mar 2, 2024Updated last year
- [TPAMI Major Revision] Resource Summary for paper "Unveiling the Unseen: A Comprehensive Survey on Explainable Anomaly Detection in Image…☆35Apr 27, 2025Updated 9 months ago
- Measuring the situational awareness of language models☆40Feb 12, 2024Updated 2 years ago
- Official repository for ACL 2025 paper "Model Extrapolation Expedites Alignment"☆75May 20, 2025Updated 8 months ago
- 中国科学院大学太极实验室2025年度“大学生创新实践训练计划”☆13Apr 1, 2025Updated 10 months ago
- ☆11Dec 23, 2024Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Oct 14, 2024Updated last year
- Concurrency library☆16Oct 13, 2024Updated last year
- ☆51Oct 23, 2023Updated 2 years ago
- Teaching Models to Express Their Uncertainty in Words☆39May 26, 2022Updated 3 years ago
- [AAAI2024] An official pytorch implement of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst…☆13Dec 8, 2024Updated last year
- Repositorio de la unidad 2 del curso INFO274: Simulación, Instituto de Informática, UACh☆10Dec 18, 2022Updated 3 years ago
- Material parsers and other tools, scripts Initially developed for Grobid Superconductor☆13Feb 21, 2025Updated 11 months ago
- Python Inference Script(PyIS)☆19Aug 30, 2022Updated 3 years ago
- ☆10Apr 7, 2024Updated last year
- Single-Source Domain Generalization for Bearing Fault Diagnosis Using Feature-Augmented Adaptive Neuro-Fuzzy Inference System☆11Apr 13, 2024Updated last year
- An active inference model of Lacanian psychoanalysis☆15Jun 7, 2025Updated 8 months ago