zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆138 · Updated last month
Alternatives and similar repositories for KnowledgeCircuits:
Users that are interested in KnowledgeCircuits are comparing it to the libraries listed below
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" · ☆104 · Updated 6 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) · ☆107 · Updated last year
- Function Vectors in Large Language Models (ICLR 2024) · ☆156 · Updated 3 weeks ago
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering · ☆169 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. · ☆112 · Updated 3 weeks ago
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering · ☆53 · Updated 4 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling · ☆101 · Updated 2 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,… · ☆48 · Updated 4 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award) · ☆95 · Updated 6 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments). · ☆56 · Updated last year
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. · ☆107 · Updated this week
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision · ☆120 · Updated 7 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples · ☆84 · Updated 3 weeks ago
- Benchmarking LLMs with Challenging Tasks from Real Users · ☆221 · Updated 5 months ago
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks" · ☆175 · Updated this week
- Repo of paper "Free Process Rewards without Process Labels" · ☆140 · Updated last month
- The Paper List on Data Contamination for Large Language Models Evaluation. · ☆91 · Updated 2 weeks ago
- LoFiT: Localized Fine-tuning on LLM Representations · ☆35 · Updated 2 months ago
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning · ☆66 · Updated last month
- Awesome LLM Self-Consistency: a curated list of self-consistency in Large Language Models · ☆96 · Updated 8 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" · ☆54 · Updated last year
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective · ☆63 · Updated last month
- PASTA: Post-hoc Attention Steering for LLMs · ☆113 · Updated 4 months ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024) · ☆57 · Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' · ☆186 · Updated 4 months ago