stunningpixels / lou-eval
Track the progress of LLM context utilisation
☆53 · Updated 7 months ago
Alternatives and similar repositories for lou-eval:
Users interested in lou-eval are comparing it to the repositories listed below.
- ☆48 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 6 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆100 · Updated 2 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆100 · Updated 10 months ago
- A framework for evaluating function calls made by LLMs ☆36 · Updated 6 months ago
- ☆24 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated 11 months ago
- Using multiple LLMs for ensemble forecasting ☆16 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆88 · Updated 11 months ago
- ☆48 · Updated 3 months ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths ☆35 · Updated last year
- ☆57 · Updated last year
- Camel-Coder: collaborative task completion with multiple agents, using role-based prompts, an intervention mechanism, and thoughtful suggestions ☆33 · Updated last year
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆89 · Updated 3 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 · Updated 7 months ago
- An LLM reads a paper and produces a working prototype ☆48 · Updated 2 weeks ago
- ☆22 · Updated last year
- 🔓 The open-source autonomous agent LLM initiative 🔓 ☆91 · Updated last year
- 📝 Reference-free automatic summarization evaluation with potential hallucination detection ☆101 · Updated last year
- Chat Markup Language conversation library ☆55 · Updated last year
- ☆87 · Updated last year
- Score LLM pretraining data with classifiers ☆54 · Updated last year
- EMNLP 2024: "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆24 · Updated 2 months ago
- ☆75 · Updated last year
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- Automated testing and benchmarking for code generation agents ☆18 · Updated last year
- entropix-style sampling + GUI ☆25 · Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆82 · Updated 4 months ago
- Testing speed and accuracy of RAG with and without a Cross Encoder Reranker ☆48 · Updated last year