logikon-ai / cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
☆16 · Updated last month
Alternatives and similar repositories for cot-eval:
Users who are interested in cot-eval are comparing it to the libraries listed below.
- Minimum Description Length probing for neural network representations ☆19 · Updated 2 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Updated last year
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆16 · Updated this week
- ☆19 · Updated 4 months ago
- ☆25 · Updated last year
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 · Updated 2 years ago
- ☆13 · Updated last month
- Code, results and other artifacts from the paper introducing the WildChat-50m dataset and the Re-Wild model family. ☆29 · Updated 2 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆10 · Updated last month
- Aioli: A unified optimization framework for language model data mixing ☆22 · Updated 2 months ago
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆15 · Updated last year
- Official repo of the AAAI 2024 paper "Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization" ☆13 · Updated last year
- Code for "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆16 · Updated this week
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 · Updated last year
- A library for simplifying fine-tuning with multi-GPU setups in the Hugging Face ecosystem. ☆16 · Updated 5 months ago
- This repository contains the ToolSelect dataset, which was used to fine-tune Llama-2 70B for tool selection. ☆20 · Updated last year
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Updated last year
- ☆32 · Updated 9 months ago
- Efficient query encoding for dense retrieval ☆11 · Updated 7 months ago
- Training hybrid models for dummies. ☆20 · Updated 2 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated this week
- Reasoning by Communicating with Agents ☆25 · Updated 5 months ago
- ☆45 · Updated 6 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 · Updated 2 weeks ago
- ☆48 · Updated 4 months ago
- Understanding the correlation between different LLM benchmarks ☆29 · Updated last year
- Code repo for MathAgent ☆15 · Updated last year
- ☆21 · Updated 2 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 10 months ago
- This repository implements DSPy programs for tasks in Indian languages ☆13 · Updated last year