jongjyh / TrFr
Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning
☆42Updated 11 months ago
Related projects
Alternatives and complementary repositories for TrFr
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 4 months ago
- Functional Benchmarks and the Reasoning Gap☆78Updated last month
- Codebase accompanying the Summary of a Haystack paper.☆72Updated 2 months ago
- ☆87Updated 9 months ago
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets.☆64Updated last month
- ☆112Updated last month
- Using open source LLMs to build synthetic datasets for direct preference optimization☆40Updated 8 months ago
- Mixing Language Models with Self-Verification and Meta-Verification☆97Updated last year
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"☆109Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆82Updated 2 months ago
- Small and Efficient Mathematical Reasoning LLMs☆71Updated 9 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆130Updated this week
- ☆44Updated 6 months ago
- Just a bunch of benchmark logs for different LLMs☆116Updated 3 months ago
- ☆126Updated 7 months ago
- ☆48Updated last year
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals".☆65Updated 4 months ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers☆122Updated 8 months ago
- ☆68Updated 3 months ago
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]☆104Updated last month
- A set of scripts and notebooks on LLM finetuning and dataset creation☆93Updated last month
- ☆133Updated 4 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆97Updated 7 months ago
- Comprehensive analysis of the differences in performance between QLoRA, LoRA, and full finetunes.☆81Updated last year
- ☆41Updated 2 weeks ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System☆93Updated 5 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆96Updated last month
- RAFT, or Retrieval-Augmented Fine-Tuning, is a method comprising a fine-tuning phase and a RAG-based retrieval phase. It is particularly sui…☆75Updated 2 months ago
- Set of scripts to finetune LLMs☆36Updated 7 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆115Updated 2 weeks ago