State of What Art? A Call for Multi-Prompt LLM Evaluation
☆15 · Jul 10, 2024 · Updated last year
Alternatives and similar repositories for Multi-Prompt-LLM-Evaluation
Users interested in Multi-Prompt-LLM-Evaluation are comparing it to the libraries listed below.
- Dataset and Evaluation Code for the K-QA Benchmark. ☆18 · May 26, 2024 · Updated last year
- ☆20 · Apr 23, 2024 · Updated last year
- ☆19 · Mar 12, 2025 · Updated 11 months ago
- Updating collection of summarization datasets in 100+ languages, based on our paper "The State and Fate of Summarization Datasets: A Surv… ☆30 · Apr 29, 2025 · Updated 10 months ago
- Momentum Decoding: Open-ended Text Generation as Graph Exploration ☆19 · Jan 27, 2023 · Updated 3 years ago
- Code for IterInpaint model, presented in Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation (CVPR 2024 work… ☆25 · Jul 21, 2024 · Updated last year
- VaLM: Visually-augmented Language Modeling. ICLR 2023. ☆56 · Mar 6, 2023 · Updated 2 years ago
- Repository collecting resources and best practices to improve experimental rigour in deep learning research. ☆27 · Mar 30, 2023 · Updated 2 years ago
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning (ACL 2023) ☆33 · Jul 18, 2023 · Updated 2 years ago
- Self-hosted GPT-4V API ☆27 · Nov 6, 2023 · Updated 2 years ago
- Code for "Natural Language to Code Translation with Execution" ☆41 · Nov 2, 2022 · Updated 3 years ago
- EMNLP 2021 - Frustratingly Simple Pretraining Alternatives to Masked Language Modeling ☆34 · Nov 21, 2021 · Updated 4 years ago
- [CVPR 2020] A generative model with latent factors that are independent and localized. ☆12 · Mar 27, 2025 · Updated 11 months ago
- A simple repository that showcases a few LLM evaluation strategies and leverages W&B Sweeps to optimize the LLM system. ☆12 · Jul 11, 2023 · Updated 2 years ago
- WSDM 2021 Tutorial on Advances in Bias-aware Recommendation on the Web ☆11 · Mar 8, 2021 · Updated 4 years ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆21 · Jan 6, 2026 · Updated last month
- ☆12 · Feb 22, 2021 · Updated 5 years ago
- ☆12 · Dec 14, 2022 · Updated 3 years ago
- PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions (NeurIPS 2025 D&B track, Spotlight) ☆23 · Feb 11, 2026 · Updated 2 weeks ago
- Detect-Then-Explain Framework for Text-to-SQL task ☆10 · Dec 6, 2023 · Updated 2 years ago
- [ACL 2023] Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generati… ☆10 · Sep 23, 2023 · Updated 2 years ago
- ☆11 · May 18, 2022 · Updated 3 years ago
- ☆11 · Jul 20, 2021 · Updated 4 years ago
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence" ☆19 · Jun 2, 2025 · Updated 9 months ago
- ☆11 · Jun 18, 2023 · Updated 2 years ago
- ☆10 · Nov 7, 2022 · Updated 3 years ago
- Bachelor's graduation work on code autocompletion with an RNN ☆10 · May 19, 2019 · Updated 6 years ago
- Literary Language Toolkit: code, models, corpora, and web tools ☆11 · Mar 28, 2024 · Updated last year
- Implementation of the spotlight: a method for discovering systematic errors in deep learning models ☆11 · Oct 5, 2021 · Updated 4 years ago
- Word embeddings trained on medical subreddits. ☆10 · Jan 4, 2021 · Updated 5 years ago
- Official repository for Fourier model that can generate periodic signals ☆10 · Mar 10, 2022 · Updated 3 years ago
- ☆13 · May 10, 2025 · Updated 9 months ago
- [ACL 2023] Code and data for our paper "Measuring Progress in Fine-grained Vision-and-Language Understanding" ☆13 · Jun 11, 2023 · Updated 2 years ago
- Implementation of "Face detection in untrained deep neural networks" (Baek et al., Nature Communications, 2021) ☆10 · Nov 2, 2021 · Updated 4 years ago
- blog ☆18 · Jul 18, 2023 · Updated 2 years ago
- PyTorch Lightning based framework to run experiments for self-supervised learning tasks. ☆10 · Feb 14, 2020 · Updated 6 years ago
- Open-Retrieval Conversational Machine Reading: A new setting & OR-ShARC dataset ☆13 · Nov 19, 2022 · Updated 3 years ago
- Repo for the BBCAVS10k distribution ☆10 · Nov 27, 2024 · Updated last year
- Code for the AACL 2022 Paper "This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Cli… ☆12 · Nov 18, 2022 · Updated 3 years ago