Re-Align/AlignTDS

Analyzing LLM Alignment via Token distribution shift

☆17

Alternatives and similar repositories for AlignTDS

Users who are interested in AlignTDS are comparing it to the repositories listed below.

emorynlp / seq2seq-corenlp
☆13 · Feb 7, 2023 · Updated 3 years ago
Re-Align / URIAL
☆316 · Jun 9, 2024 · Updated 2 years ago
neale / avoiding-side-effects
Code for reproducing the results from the paper Avoiding Side Effects in Complex Environments
☆12 · Jun 3, 2021 · Updated 5 years ago
Haner-LiveInLove / cs285_homework_fall2023
My solution to assignments for Berkeley CS 285: Deep Reinforcement Learning, Decision Making, and Control.
☆16 · Mar 19, 2025 · Updated last year
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 · Jan 17, 2024 · Updated 2 years ago
INK-USC / Reflect
Data and Code for Paper "Reflect Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality" (EMNLP 2022)
☆11 · Nov 28, 2022 · Updated 3 years ago
INK-USC / XCSR
Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"
☆23 · Oct 26, 2021 · Updated 4 years ago
ssmisya / PolicyShiftGuard
PolicyShiftGuard: Benchmarking and Improving Policy-Adaptive Image Guardrails
☆21 · Jul 8, 2026 · Updated last week
HITsz-TMG / ICL-State-Vector
☆12 · Jul 4, 2024 · Updated 2 years ago
qizhangli / Gradient-based-Jailbreak-Attacks
Code for our NeurIPS 2024 paper Improved Generation of Adversarial Examples Against Safety-aligned LLMs
☆12 · Nov 7, 2024 · Updated last year
RUCBM / ICLEval
☆14 · Jun 24, 2024 · Updated 2 years ago
declare-lab / safety-arithmetic
☆13 · Jan 14, 2025 · Updated last year
vipulgupta1011 / CALM
☆11 · Oct 2, 2023 · Updated 2 years ago
DanielSc4 / Dynamic-Activation-Composition
Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition"
☆14 · Nov 22, 2024 · Updated last year
LINs-lab / ELICIT
[ICLR 2025] ELICIT: LLM Augmentation Via External In-context Capability
☆14 · Mar 11, 2025 · Updated last year
zyxnlp / ICL-Interpretation-Analysis-Resources
Links to publications that focus on the interpretation and analysis of in-context learning
☆14 · Oct 17, 2024 · Updated last year
open-compass / RePro
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15 · Dec 5, 2025 · Updated 7 months ago
zhaoyanpeng / vpcfg
Visually Grounded PCFG Induction
☆38 · May 18, 2022 · Updated 4 years ago
shadowkiller33 / Language_attack
A repo for LLM jailbreak
☆14 · Sep 5, 2023 · Updated 2 years ago
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆138 · Feb 24, 2025 · Updated last year
rain152 / PAT
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
☆11 · Oct 29, 2024 · Updated last year
txsun1997 / Metric-Fairness
EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation
☆41 · Oct 19, 2022 · Updated 3 years ago
jungmaier / dirichlet-smoothed-word-embeddings
Word embeddings from PPMI-weighted and dirichlet-smoothed co-occurrence matrices
☆10 · Aug 3, 2020 · Updated 5 years ago
msakarvadia / AttentionLens
Interpreting the latent space representations of attention head outputs for LLMs
☆39 · Aug 13, 2024 · Updated last year
M3-IT / YING-VLM
Vision Large Language Models trained on M3IT instruction tuning dataset
☆17 · Aug 16, 2023 · Updated 2 years ago
Arthurma71 / AdvDrop
☆11 · Mar 8, 2024 · Updated 2 years ago
Anikethh / ResearchGym
Benchmark and execution environment for evaluating LLM agents on end-to-end AI Research. [ICLR 2026]
☆35 · May 31, 2026 · Updated last month
neulab / ToM-Language-Acquisition
Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind".
☆15 · Apr 27, 2023 · Updated 3 years ago
wondergo2017 / sild
Implementation codes for NeurIPS23 paper "Spectral Invariant Learning for Dynamic Graphs under Distribution Shifts"
☆14 · Mar 19, 2024 · Updated 2 years ago
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆45 · Jan 15, 2025 · Updated last year
joonkeekim / Instructive-Decoding
Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…
☆21 · Mar 7, 2024 · Updated 2 years ago
BKHMSI / cultural-trends
Investigating Cultural Alignment of Large Language Models
☆13 · Aug 14, 2024 · Updated last year
Spico197 / MoE-SFT
🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
☆41 · Sep 29, 2024 · Updated last year
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆20 · Oct 9, 2022 · Updated 3 years ago
EricJin2002 / UCAS-NLP-2023
Major course project for the third-year Natural Language Processing course at UCAS
☆12 · Jun 25, 2023 · Updated 3 years ago
CPF-NLPR / ULGN4DocEFI
☆10 · Nov 14, 2021 · Updated 4 years ago
automl / is_mamba_capable_of_icl
☆18 · Apr 24, 2024 · Updated 2 years ago
LINs-lab / LIE
[preprint] Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
☆19 · Feb 18, 2026 · Updated 5 months ago
Helw150 / levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
☆16 · Jun 16, 2024 · Updated 2 years ago