ruiqi-zhong / nlparam

Augmenting Statistical Models with Natural Language Parameters

☆27

Alternatives and similar repositories for nlparam

Users who are interested in nlparam are comparing it to the libraries listed below.

dannyallover / overthinking_the_truth
☆29 Updated last year
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆62 Updated 8 months ago
roeehendel / icl_task_vectors
☆96 Updated last year
balevinstein / Probes
☆52 Updated 2 years ago
deeplearning-wisc / args
☆43 Updated last year
ajyl / dpo_toxic
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.
☆74 Updated 4 months ago
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆94 Updated 3 years ago
MaheepChaudhary / SAE-Ravel
Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p…
☆12 Updated 6 months ago
tatsu-lab / linguistic_calibration
Align your LM to express calibrated verbal statements of confidence in its long-form generations.
☆27 Updated last year
milesaturpin / cot-unfaithfulness
☆47 Updated last year
aviclu / ffn-values
☆62 Updated 2 years ago
epfl-dlab / llm-latent-language
Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers".
☆78 Updated last year
Re-Align / AlignTDS
Analyzing LLM Alignment via Token distribution shift
☆16 Updated last year
IBM / activation-steering
[ICLR 2025] General-purpose activation steering library
☆87 Updated last week
explanare / ravel
Evaluate interpretability methods on localizing and disentangling concepts in LLMs.
☆52 Updated 9 months ago
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆59 Updated last year
Nanami18 / Snowballed_Hallucination
☆45 Updated 11 months ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆114 Updated 10 months ago
fc2869 / lo-fit
LoFiT: Localized Fine-tuning on LLM Representations
☆39 Updated 6 months ago
y0mingzhang / diffuse-distributions
Forcing Diffuse Distributions out of Language Models
☆17 Updated 10 months ago
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆68 Updated 2 years ago
SumilerGAO / SunGen
☆27 Updated 2 years ago
shadowkiller33 / Contrast-Instruction
☆19 Updated last year
tml-epfl / long-is-more-for-alignment
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]
☆18 Updated last year
allenai / noncompliance
This repository contains data, code and models for contextual noncompliance.
☆23 Updated last year
GXimingLu / Quark
☆75 Updated last year
Dakingrai / neuron-analysis-cot-arithmetic-reasoning
☆11 Updated 5 months ago
google-research-datasets / GSM-IC
Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se…
☆60 Updated 2 years ago
lifan-yuan / OOD_NLP
[NeurIPS 2023 D&B Track] Code and data for paper "Revisiting Out-of-distribution Robustness in NLP: Benchmarks, Analysis, and LLMs Evalua…
☆34 Updated 2 years ago
zepingyu0512 / neuron-attribution
Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆38 Updated 8 months ago