tianyu139 / meaning-as-trajectories
Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024)
☆20 Updated 9 months ago
Alternatives and similar repositories for meaning-as-trajectories:
Users who are interested in meaning-as-trajectories are comparing it to the repositories listed below:
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆33 Updated 4 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 Updated 9 months ago
- The repository contains code for Adaptive Data Optimization ☆20 Updated 3 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆66 Updated 8 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆22 Updated last year
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆15 Updated 2 years ago
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT. ☆16 Updated 2 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆70 Updated 3 months ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆22 Updated 6 months ago
- Efficient Scaling laws and collaborative pretraining. ☆15 Updated last month
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆14 Updated 6 months ago
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆41 Updated 8 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025] ☆29 Updated last month
- Personal implementation of ASIF by Antonio Norelli ☆25 Updated 9 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 7 months ago