logikon-ai / cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
☆18Updated 5 months ago
Alternatives and similar repositories for cot-eval
Users who are interested in cot-eval are comparing it to the libraries listed below.
- Minimum Description Length probing for neural network representations☆18Updated 5 months ago
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers"☆25Updated 3 months ago
- efficient query encoding for dense retrieval☆11Updated 11 months ago
- This repository contains code and datasets for our paper on the effects of document multiplicity while the context size is fixed in Retri…☆15Updated 4 months ago
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models☆27Updated last year
- ☆20Updated 3 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…☆31Updated last year
- Residual Quantization Autoencoder, used for interpreting LLMs☆12Updated 6 months ago
- ☆15Updated 3 months ago
- ☆25Updated last year
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection.☆20Updated last year
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Updated last year
- ☆28Updated last week
- Finding semantically meaningful and accurate prompts.☆47Updated last year
- ☆26Updated last year
- a pipeline for using api calls to agnostically convert unstructured data into structured training data☆30Updated 9 months ago
- PyTorch implementation for MRL☆19Updated last year
- Code for Benchmarking Language Model Agents for Data-Driven Science☆28Updated 8 months ago
- ☆22Updated 3 weeks ago
- Code of fine-tuning neural sparse models and training from scratch. #SIGIR2025☆12Updated this week
- This repository implements DSPy programs for tasks in Indian languages☆13Updated last year
- Verifiers for LLM Reinforcement Learning☆65Updated 3 months ago
- ☆47Updated 11 months ago
- Python package for generating datasets to evaluate reasoning and retrieval of large language models☆18Updated 2 weeks ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆41Updated 5 months ago
- Advanced Reasoning Benchmark Dataset for LLMs☆47Updated last year
- ☆44Updated last year
- Aioli: A unified optimization framework for language model data mixing☆27Updated 6 months ago
- ☆22Updated 5 months ago
- Entailment self-training☆25Updated 2 years ago