zzwjames / FailureLLMUnlearning
☆22Updated 3 weeks ago
Related projects ⓘ
Alternatives and complementary repositories for FailureLLMUnlearning
- Codebase for Instruction Following without Instruction Tuning☆32Updated last month
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated 8 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates☆60Updated 3 weeks ago
- ☆15Updated 3 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces"☆29Updated last month
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View☆29Updated last month
- Is In-Context Learning Sufficient for Instruction Following in LLMs?☆25Updated 5 months ago
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆17Updated 2 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆39Updated last month
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆30Updated 6 months ago
- ☆26Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆30Updated 9 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆48Updated 7 months ago
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales☆54Updated last week
- ☆12Updated 2 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆33Updated last month
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024)☆26Updated 2 weeks ago
- Official Repository for Dataset Inference for LLMs☆23Updated 3 months ago
- Code for https://arxiv.org/abs/2401.17139 (NeurIPS 2024)☆25Updated this week
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆39Updated 3 months ago
- ☆38Updated last year
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models☆22Updated 5 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.…☆37Updated 4 months ago
- Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆22Updated last month
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆75Updated last month
- The repository contains code for Adaptive Data Optimization☆18Updated last month
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 10 months ago
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT.☆12Updated 2 months ago
- ☆15Updated 4 months ago
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,…☆39Updated 7 months ago