target-benchmark / target
TARGET is a benchmark for evaluating Table Retrieval for Generative Tasks such as Fact Verification and Text-to-SQL
☆12Updated this week
Related projects
Alternatives and complementary repositories for target
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 4 months ago
- A RAG that can scale 🧑🏻‍💻☆11Updated 5 months ago
- Data preparation code for CrystalCoder 7B LLM☆42Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper.☆72Updated 2 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization☆40Updated 8 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).☆77Updated 8 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆68Updated last month
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref…☆23Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆75Updated last month
- Simple examples using Argilla tools to build AI☆42Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆46Updated 2 months ago
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments☆31Updated last week
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆61Updated 4 months ago
- Testing speed and accuracy of RAG with, and without Cross Encoder Reranker.☆47Updated 10 months ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval☆37Updated 5 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆21Updated last week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…☆53Updated 3 weeks ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆23Updated 8 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!☆47Updated this week
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆20Updated 9 months ago
- code for training & evaluating Contextual Document Embedding models☆119Updated this week
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆129Updated this week
- GPT-4 Level Conversational QA Trained In a Few Hours☆55Updated 3 months ago
- Measuring RAG solutions throughput and latency☆13Updated 3 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging.☆29Updated 6 months ago