IINemo / lm-polygraph
★429 · Updated this week
Alternatives and similar repositories for lm-polygraph
Users who are interested in lm-polygraph are comparing it to the libraries listed below.
- Interpretability for sequence generation models ★454 · Updated 3 weeks ago
- This repository collects all relevant resources about interpretability in LLMs ★389 · Updated last year
- How do transformer LMs encode relations? ★55 · Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface ★138 · Updated 11 months ago
- Steering Llama 2 with Contrastive Activation Addition ★206 · Updated last year
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ★196 · Updated 11 months ago
- [ICLR 2025] General-purpose activation steering library ★137 · Updated 4 months ago
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction". ★333 · Updated 7 months ago
- ★244 · Updated last year
- ★58 · Updated 2 years ago
- Using sparse coding to find distributed representations used by neural networks. ★293 · Updated 2 years ago
- Performant framework for training, analyzing and visualizing Sparse Autoencoders (SAEs) and their frontier variants. ★175 · Updated this week
- List of papers on hallucination detection in LLMs. ★1,023 · Updated 2 weeks ago
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ★854 · Updated this week
- ★142 · Updated last month
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ★100 · Updated 2 years ago
- Sparse probing paper full code. ★66 · Updated 2 years ago
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ★128 · Updated last year
- Unified access to Large Language Model modules using NNsight ★81 · Updated 2 weeks ago
- ★195 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ★162 · Updated 7 months ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ★218 · Updated last year
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ★403 · Updated last year
- A resource repository for representation engineering in large language models ★148 · Updated last year
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ★597 · Updated last year
- ★116 · Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ★413 · Updated 9 months ago
- AI Logging for Interpretability and Explainability 🔬 ★138 · Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ★226 · Updated last year
- Repository for the Bias Benchmark for QA dataset. ★135 · Updated 2 years ago