stunningpixels / lou-eval
Track the progress of LLM context utilisation
☆53 · Updated 8 months ago
Alternatives and similar repositories for lou-eval:
Users interested in lou-eval are comparing it to the repositories listed below:
- ☆48 · Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 · Updated 7 months ago
- Evaluating LLMs with CommonGen-Lite ☆89 · Updated last year
- ☆48 · Updated 4 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆102 · Updated 3 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 · Updated 11 months ago
- Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models ☆69 · Updated last year
- ☆22 · Updated last year
- Camel-Coder: Collaborative task completion with multiple agents. Role-based prompts, intervention mechanism, and thoughtful suggestions ☆33 · Updated last year
- LLM reads a paper and produces a working prototype ☆51 · Updated last week
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths ☆35 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated last year
- ☆51 · Updated 8 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 · Updated last year
- ☆20 · Updated last year
- ☆84 · Updated last year
- ☆37 · Updated last year
- ☆73 · Updated last year
- Score LLM pretraining data with classifiers ☆54 · Updated last year
- Automated testing and benchmarking for code generation agents. ☆18 · Updated last year
- Using multiple LLMs for ensemble Forecasting ☆16 · Updated last year
- Advanced Reasoning Benchmark Dataset for LLMs ☆45 · Updated last year
- Public Inflection Benchmarks ☆68 · Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆25 · Updated 3 months ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆75 · Updated 5 months ago
- A framework for evaluating function calls made by LLMs ☆37 · Updated 8 months ago
- ☆60 · Updated last year