bowen-upenn / llm_token_bias
[EMNLP 2024] A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners
☆18 · Updated 3 months ago
Alternatives and similar repositories for llm_token_bias:
Users who are interested in llm_token_bias are comparing it to the repositories listed below:
- Evaluate the Quality of Critique ☆35 · Updated 9 months ago
- AbstainQA, ACL 2024 ☆25 · Updated 5 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 · Updated 10 months ago
- ☆12 · Updated last year
- ☆34 · Updated 11 months ago
- Tasks for describing differences between text distributions. ☆16 · Updated 7 months ago
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆26 · Updated 2 years ago
- ☆22 · Updated 2 months ago
- ☆20 · Updated 8 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆15 · Updated last week
- [ICLR 2024] Unveiling the Pitfalls of Knowledge Editing for Large Language Models ☆22 · Updated 9 months ago
- Restore safety in fine-tuned language models through task arithmetic ☆27 · Updated 11 months ago
- Models, data, and codes for the paper: MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models ☆18 · Updated 5 months ago
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆35 · Updated 3 months ago
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆66 · Updated 2 years ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ☆34 · Updated 3 months ago
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆54 · Updated 10 months ago
- ☆41 · Updated last year
- Personality Alignment of Language Models ☆25 · Updated this week
- Public code repo for paper "Aligning LLMs with Individual Preferences via Interaction" ☆23 · Updated 5 months ago
- This repository includes code for the paper "Does Localization Inform Editing? Surprising Differences in Where Knowledge Is Stored vs. Ca… ☆59 · Updated last year
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆23 · Updated 3 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆15 · Updated 2 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆54 · Updated 8 months ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆21 · Updated 2 years ago
- ☆24 · Updated 4 months ago
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆11 · Updated 4 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆32 · Updated last month
- Official repo for SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency ☆35 · Updated last month
- ✨ Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆15 · Updated 5 months ago