logikon-ai / cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
☆16 · Updated 3 months ago
Alternatives and similar repositories for cot-eval
Users who are interested in cot-eval compare it to the libraries listed below.
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- ☆15 · Updated last month
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆18 · Updated last month
- ☆25 · Updated 7 months ago
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆50 · Updated last month
- Minimum Description Length probing for neural network representations ☆19 · Updated 3 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 · Updated 3 weeks ago
- Nexusflow function call, tool use, and agent benchmarks. ☆19 · Updated 5 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆18 · Updated this week
- Reasoning by Communicating with Agents ☆28 · Updated 2 weeks ago
- ☆27 · Updated 2 weeks ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆11 · Updated last month
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆34 · Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆25 · Updated 3 months ago
- Efficient and Scalable Estimation of Tool Representations in Vector Space ☆23 · Updated 8 months ago
- ☆19 · Updated last month
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 · Updated 2 years ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆25 · Updated 5 months ago
- ☆20 · Updated 2 months ago
- ☆25 · Updated last year
- Code repo for MathAgent ☆16 · Updated last year
- ☆15 · Updated 4 months ago
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 · Updated 3 weeks ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 · Updated last year
- ☆30 · Updated 9 months ago
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 · Updated last year
- ☆27 · Updated 10 months ago
- Simple repository for training small reasoning models ☆27 · Updated 3 months ago