logikon-ai / cot-eval
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
☆16 Updated 2 months ago
Alternatives and similar repositories for cot-eval:
Users who are interested in cot-eval are comparing it to the libraries listed below:
- ☆25 Updated last year
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 3 months ago
- Minimum Description Length probing for neural network representations ☆19 Updated 2 months ago
- [ICLR'25] "Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers" ☆17 Updated 3 weeks ago
- ☆15 Updated 2 weeks ago
- official repo of AAAI2024 paper Mitigating the Impact of False Negatives in Dense Retrieval with Contrastive Confidence Regularization ☆13 Updated last year
- ☆19 Updated 2 weeks ago
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆17 Updated last year
- Python library to use Pleias-RAG models ☆27 Updated this week
- Code for our paper Resources and Evaluations for Multi-Distribution Dense Information Retrieval ☆14 Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training" ☆23 Updated this week
- Python package for generating datasets to evaluate reasoning and retrieval of large language models ☆17 Updated this week
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 Updated last year
- ☆24 Updated 7 months ago
- ☆49 Updated last year
- Code for paper "W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering" ☆12 Updated this week
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents ☆24 Updated 2 years ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback. ☆22 Updated 3 weeks ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆45 Updated 2 weeks ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆16 Updated last year
- efficient query encoding for dense retrieval ☆11 Updated 8 months ago
- Training hybrid models for dummies. ☆20 Updated 3 months ago
- ☆14 Updated 9 months ago
- Efficient and Scalable Estimation of Tool Representations in Vector Space ☆23 Updated 7 months ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 Updated last year
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" from Pytorch and Zeta ☆13 Updated 5 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆33 Updated last week
- Nexusflow function call, tool use, and agent benchmarks. ☆19 Updated 4 months ago
- ☆48 Updated 5 months ago