kaistAI / FLASK
[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
☆214 · Updated last year
Alternatives and similar repositories for FLASK:
Users who are interested in FLASK are comparing it to the libraries listed below.
- [EMNLP 2023] The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning ☆228 · Updated last year
- [ICLR 2024 & NeurIPS 2023 WS] An Evaluator LM that is open-source, offers reproducible evaluation, and is inexpensive to use. Specifically d… ☆296 · Updated last year
- Code and model release for the paper "Task-aware Retrieval with Instructions" by Asai et al. ☆162 · Updated last year
- Benchmarking LLMs with Challenging Tasks from Real Users ☆215 · Updated 3 months ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆157 · Updated 9 months ago
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ☆332 · Updated last year
- Code for "In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering" ☆160 · Updated this week
- Simple next-token-prediction for RLHF ☆222 · Updated last year
- Learning to Compress Prompts with Gist Tokens - https://arxiv.org/abs/2304.08467 ☆274 · Updated this week
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆250 · Updated last year
- A set of utilities for running few-shot prompting experiments on large language models ☆117 · Updated last year
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆145 · Updated 2 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆71 · Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting". ☆102 · Updated 4 months ago
- The official repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Updated last year
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆113 · Updated 5 months ago
- Repository for the paper "Shepherd: A Critic for Language Model Generation" ☆218 · Updated last year
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆172 · Updated 3 months ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆462 · Updated 3 weeks ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆125 · Updated 11 months ago
- Evaluating LLMs with fewer examples ☆145 · Updated 10 months ago
- Reverse Instructions to generate instruction tuning data with corpus examples ☆208 · Updated 11 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆181 · Updated 4 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆112 · Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆150 · Updated 11 months ago
- [EMNLP 2023] Adapting Language Models to Compress Long Contexts ☆293 · Updated 5 months ago