Liyan06 / MiniCheck
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
☆55 Updated this week
Related projects:
- Official code for "MAmmoTH2: Scaling Instructions from the Web" ☆106 Updated this week
- Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering ☆130 Updated 2 months ago
- Official implementation for the paper "LongEmbed: Extending Embedding Models for Long Context Retrieval" ☆108 Updated 4 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆118 Updated 6 months ago
- The code for the paper: "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models" ☆45 Updated 2 months ago
- Code and Data for Tau-Bench ☆91 Updated this week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆107 Updated 2 weeks ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆127 Updated 2 weeks ago
- ☆52 Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆182 Updated last month
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆105 Updated last year
- Steering vectors for transformer language models in Pytorch / Huggingface ☆52 Updated last month
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆81 Updated 2 weeks ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆72 Updated 8 months ago
- ☆105 Updated this week
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 Updated 7 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 Updated this week
- Function Vectors in Large Language Models (ICLR 2024) ☆107 Updated last month
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆77 Updated last month
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆87 Updated 2 months ago
- [NAACL 2024 Outstanding Paper] Source code for the NAACL 2024 paper entitled "R-Tuning: Instructing Large Language Models to Say 'I Don't… ☆82 Updated 2 months ago
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆41 Updated last month
- ☆118 Updated 5 months ago
- PASTA: Post-hoc Attention Steering for LLMs ☆96 Updated last week
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆45 Updated 6 months ago
- Benchmarking library for RAG ☆87 Updated this week
- A simple unified framework for evaluating LLMs ☆121 Updated this week
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆109 Updated last week
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆134 Updated 6 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆69 Updated 7 months ago