locuslab / llm-idiosyncrasies
Code release for "Idiosyncrasies in Large Language Models"
☆23 Updated last month
Alternatives and similar repositories for llm-idiosyncrasies:
Users who are interested in llm-idiosyncrasies are comparing it to the repositories listed below:
- ☆35 Updated 2 years ago
- ☆12 Updated 2 years ago
- ☆34 Updated last year
- Chrome extension for renaming tabs showing paper PDFs from common providers ☆91 Updated 2 months ago
- 📰 Computing the information content of trained neural networks ☆21 Updated 3 years ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆50 Updated 3 weeks ago
- Fine-tuning Mistral-7b with PEFT (Parameter Efficient Fine-Tuning) and LoRA (Low-Rank Adaptation) on Puffin Dataset (multi-turn conversation… ☆13 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆62 Updated this week
- Implementation of https://arxiv.org/pdf/2312.09299 ☆20 Updated 8 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Updated 7 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆26 Updated last month
- ☆50 Updated 5 months ago
- LLMs as Collaboratively Edited Knowledge Bases ☆45 Updated last year
- Research work on multimodal cognitive AI ☆60 Updated last month
- ☆16 Updated 3 weeks ago
- Training hybrid models for dummies. ☆20 Updated 2 months ago
- Does Refusal Training in LLMs Generalize to the Past Tense? [ICLR 2025] ☆66 Updated 2 months ago
- https://footprints.baulab.info ☆17 Updated 5 months ago
- Official implementation of MAIA, A Multimodal Automated Interpretability Agent ☆76 Updated 3 weeks ago
- Erasing conceptual knowledge from language models through low-rank fine-tuning ☆12 Updated this week
- Functional Benchmarks and the Reasoning Gap ☆84 Updated 6 months ago
- A new way to generate large quantities of high quality synthetic data (on par with GPT-4), with better controllability, at a fraction of … ☆22 Updated 6 months ago
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging ☆19 Updated last month
- One Line To Build Zero-Data Classifiers in Minutes ☆36 Updated 6 months ago
- ☆15 Updated last year
- ☆117 Updated 7 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆75 Updated 5 months ago
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆43 Updated 7 months ago
- ☆59 Updated 2 weeks ago
- ☆17 Updated last week