allenai / hci-alt-texts
Dataset and annotations for ASSETS 2022 publication
☆12 Updated 2 years ago
Alternatives and similar repositories for hci-alt-texts
Users who are interested in hci-alt-texts are comparing it to the repositories listed below.
- ☆11 Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆54 Updated 5 months ago
- ☆19 Updated last month
- ☆20 Updated 2 months ago
- 🌾 Universal, customizable and deployable fine-grained evaluation for text generation. ☆23 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆76 Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆78 Updated 7 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 Updated last month
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆84 Updated 5 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆40 Updated 2 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆128 Updated last year
- ☆45 Updated 9 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆44 Updated 10 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆51 Updated 2 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆40 Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 Updated 8 months ago
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models ☆21 Updated 5 months ago
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments ☆55 Updated 2 months ago
- Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024) ☆36 Updated 4 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆53 Updated 5 months ago
- Code/data for MARG (multi-agent review generation) ☆43 Updated 6 months ago
- [arXiv preprint] Official Repository for "Evaluating Language Models as Synthetic Data Generators" ☆33 Updated 5 months ago
- Retrieval-Augmented Generation battle! ☆50 Updated 5 months ago
- ☆57 Updated 7 months ago
- This repository contains ScholarQABench data and evaluation pipeline. ☆71 Updated last month
- A set of utilities for running few-shot prompting experiments on large-language models ☆120 Updated last year
- Code and Data for "Language Modeling with Editable External Knowledge" ☆32 Updated 10 months ago
- The corresponding code for our paper: "Exploring the Challenges of Open Domain Multi-Document Summarization". Do not hesitate to open an … ☆32 Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆105 Updated 7 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆100 Updated last year