okhat / blog
☆229 Updated last month
Related projects
Alternatives and complementary repositories for blog
- A bibliography and survey of the papers surrounding o1 ☆577 Updated this week
- This repository collects all relevant resources about interpretability in LLMs ☆282 Updated last week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆160 Updated last month
- Representation Engineering: A Top-Down Approach to AI Transparency ☆719 Updated 2 months ago
- A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc. ☆166 Updated 3 weeks ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆233 Updated 6 months ago
- ☆320 Updated 3 months ago
- RewardBench: the first evaluation tool for reward models. ☆424 Updated 2 weeks ago
- RuLES: a benchmark for evaluating rule-following in language models ☆210 Updated last month
- System 2 Reasoning Link Collection ☆683 Updated last week
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆202 Updated this week
- A simple unified framework for evaluating LLMs ☆138 Updated this week
- Training Sparse Autoencoders on Language Models ☆449 Updated this week
- open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality ☆156 Updated 3 months ago
- GPT4 based personalized ArXiv paper assistant bot ☆486 Updated 7 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆350 Updated 3 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆222 Updated last week
- Sparse autoencoders ☆333 Updated 2 weeks ago
- A Survey on Data Selection for Language Models ☆178 Updated 3 weeks ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆157 Updated last month
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆428 Updated 6 months ago
- Evaluating LLMs with fewer examples ☆133 Updated 6 months ago
- List of papers on hallucination detection in LLMs. ☆669 Updated last week
- Extracting spatial and temporal world models from LLMs ☆243 Updated last year
- ☆149 Updated 6 months ago
- [ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning ☆332 Updated 2 months ago
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) ☆137 Updated last month
- Using sparse coding to find distributed representations used by neural networks. ☆181 Updated 11 months ago
- Function Vectors in Large Language Models (ICLR 2024) ☆116 Updated 3 weeks ago