google-deepmind / long-form-factuality
Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".
☆550 · Updated this week
Related projects
Alternatives and complementary repositories for long-form-factuality
- ☆454 · Updated this week
- Official repository for ORPO ☆421 · Updated 5 months ago
- Generative Representational Instruction Tuning ☆567 · Updated this week
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ☆495 · Updated 2 weeks ago
- Code for Quiet-STaR ☆654 · Updated 3 months ago
- A library for easily merging multiple LLM experts and efficiently training the merged LLM. ☆408 · Updated 2 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆799 · Updated 2 months ago
- Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" ☆893 · Updated 2 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆815 · Updated this week
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ☆329 · Updated 5 months ago
- A lightweight library for generating synthetic instruction tuning datasets for your data without GPT. ☆702 · Updated 2 months ago
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ☆449 · Updated 8 months ago
- Forward-Looking Active REtrieval-augmented generation (FLARE) ☆589 · Updated last year
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆627 · Updated last month
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆523 · Updated 3 weeks ago
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples" ☆293 · Updated 11 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆224 · Updated 2 weeks ago
- [NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models ☆535 · Updated 3 weeks ago
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆328 · Updated this week
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆472 · Updated 4 months ago
- This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks. ☆530 · Updated 8 months ago
- Generate textbook-quality synthetic LLM pretraining data ☆488 · Updated last year
- awesome synthetic (text) datasets ☆243 · Updated 3 weeks ago
- RewardBench: the first evaluation tool for reward models. ☆437 · Updated last month
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆761 · Updated last month
- ☆523 · Updated last week
- [ACL 2024] Progressive LLaMA with Block Expansion. ☆480 · Updated 6 months ago
- Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time. ☆323 · Updated 9 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆440 · Updated 8 months ago