stunningpixels / lou-eval
Track the progress of LLM context utilisation
☆53Updated 6 months ago
Alternatives and similar repositories for lou-eval:
Users who are interested in lou-eval are comparing it to the libraries listed below
- ☆48Updated last year
- Just a bunch of benchmark logs for different LLMs☆116Updated 5 months ago
- Doing simple retrieval from LLMs at various context lengths to measure accuracy☆99Updated 9 months ago
- Mixing Language Models with Self-Verification and Meta-Verification☆100Updated last month
- Evaluating LLMs with CommonGen-Lite☆87Updated 9 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models☆70Updated last year
- ☆46Updated 2 months ago
- Using multiple LLMs for ensemble Forecasting☆16Updated last year
- Testing speed and accuracy of RAG with and without a Cross Encoder Reranker.☆48Updated last year
- Writing Blog Posts with Generative Feedback Loops!☆46Updated 9 months ago
- Score LLM pretraining data with classifiers☆55Updated last year
- Simple Graph Memory for AI applications☆81Updated 5 months ago
- ☆22Updated last year
- LLM reads a paper and produces a working prototype☆46Updated 2 weeks ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆68Updated 3 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you…☆62Updated last month
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…☆48Updated 6 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).☆79Updated 10 months ago
- Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings.☆88Updated 5 months ago
- RAFT, or Retrieval-Augmented Fine-Tuning, is a method comprising a fine-tuning and a RAG-based retrieval phase. It is particularly sui…☆83Updated 4 months ago
- Synthetic Data for LLM Fine-Tuning☆107Updated last year
- ☆24Updated last year
- ☆57Updated last year
- Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r…☆58Updated 6 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection☆99Updated last year
- Comprehensive analysis of the difference in performance of QLoRA, LoRA, and full fine-tunes.☆82Updated last year
- ☆38Updated last year
- SCREWS: A Modular Framework for Reasoning with Revisions☆27Updated last year
- ☆20Updated last year
- GPT-4 Level Conversational QA Trained In a Few Hours☆58Updated 4 months ago