kimyuji / EvolvingQA_benchmark
Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024)
☆10Updated 3 months ago
Alternatives and similar repositories for EvolvingQA_benchmark:
Users interested in EvolvingQA_benchmark are comparing it to the repositories listed below
- Official repository for "Reweighting Strategy based on Synthetic Data Identification for Sentence Similarity (COLING2022)"☆18Updated 2 years ago
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models☆92Updated 2 years ago
- ☆23Updated last year
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models☆69Updated 8 months ago
- ☆10Updated 4 months ago
- AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain☆23Updated 2 years ago
- ☆20Updated last year
- ACL 2023 short: Balancing Lexical and Semantic Quality in Abstractive Summarization☆15Updated last year
- [ICLR 2025] ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains☆12Updated last week
- KAIST AI605 Deep Learning for NLP☆31Updated 2 years ago
- Official Code Repository for the paper "Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-intensive Tasks…☆37Updated 2 months ago
- ☆19Updated 2 years ago
- ☆27Updated last year
- ☆15Updated 2 years ago
- Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table☆22Updated last year
- [ACL 2024] FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model☆13Updated 5 months ago
- Findings of ACL'2023: Optimizing Test-Time Query Representations for Dense Retrieval☆29Updated last year
- This is the official repository for "Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts" (EMNLP 2022)☆100Updated 2 years ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)☆14Updated last year
- BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages☆23Updated last month
- CareCall for Seniors: Role Specified Open-Domain Dialogue dataset generated by leveraging LLMs (NAACL 2022).☆60Updated 2 years ago
- [NeurIPS 2022 Workshop] A Case Study with Negated Prompts using T0 (3B, 11B), InstructGPT (350M-175B), GPT-3 (350M - 175B) & OPT (125M - …☆24Updated 2 years ago
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models☆15Updated last year
- Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…☆11Updated 7 months ago
- Unofficial re-implementation of "Trusting Your Evidence: Hallucinate Less with Context-aware Decoding"☆28Updated 2 months ago
- KLUE Benchmark 1st place (2021.12) solutions. (RE, MRC, NLI, STS, TC)☆25Updated 2 years ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models☆79Updated 4 months ago
- [Findings of NAACL2022] A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation☆28Updated 2 years ago
- [TACL 2024] Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis☆10Updated 2 months ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut…☆21Updated last month