mwatkins1970 / SAE_Feature_Interpretability_Tool
A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAEs trained by Joseph Bloom on Gemma-2B).
☆19Updated 8 months ago
Alternatives and similar repositories for SAE_Feature_Interpretability_Tool
Users who are interested in SAE_Feature_Interpretability_Tool are comparing it to the libraries listed below
- ☆10Updated 2 months ago
- ☆13Updated 6 months ago
- Using modal.com to process FineWeb-edu data☆20Updated 2 months ago
- The first dense retrieval model that can be prompted like an LM☆73Updated last month
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆22Updated 7 months ago
- ☆16Updated 3 months ago
- Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings.☆89Updated 11 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆57Updated 9 months ago
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family.☆29Updated 2 months ago
- ☆18Updated last month
- XmodelLM☆39Updated 7 months ago
- [NeurIPS XAIA & Springer] Code and notebooks to paper "A Fresh Look at Sanity Checks for Saliency Maps"☆25Updated 11 months ago
- ☆21Updated 6 months ago
- Fork of Flame repo for training of some new stuff in development☆14Updated last week
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆53Updated 4 months ago
- ☆51Updated 7 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min…☆26Updated 7 months ago
- Lego for GRPO☆28Updated 3 weeks ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…☆43Updated last year
- Code and data releases for the paper -- DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory☆44Updated 4 months ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories☆18Updated last month
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM☆59Updated last year
- Official implementation of ECCV24 paper: POA☆24Updated 10 months ago
- ☆20Updated 3 months ago
- Alternative way of calculating self-attention☆18Updated last year
- Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations…☆29Updated last year
- implementation of https://arxiv.org/pdf/2312.09299☆21Updated 11 months ago
- Problem-Oriented Segmentation and Retrieval EMNLP 2024 Findings☆33Updated 7 months ago
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training☆36Updated 2 months ago
- ☆20Updated last year