sam-paech / diplobench
Benchmark for LLMs playing full-press Diplomacy
☆57 Updated 10 months ago
Alternatives and similar repositories for diplobench
Users who are interested in diplobench compare it to the libraries listed below.
- ☆67 Updated 6 months ago
- An introduction to LLM Sampling ☆79 Updated last year
- ☆125 Updated last year
- A framework for orchestrating AI agents using a mermaid graph ☆77 Updated last year
- Train your own SOTA deductive reasoning model ☆107 Updated 10 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 Updated this week
- ☆131 Updated last year
- Inference-time scaling for LLMs-as-a-judge. ☆320 Updated 2 months ago
- ☆189 Updated last year
- look how they massacred my boy ☆63 Updated last year
- Plotting (entropy, varentropy) for small LMs ☆99 Updated 7 months ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆454 Updated last year
- smol models are fun too ☆93 Updated last year
- SoTA Approach for ARC-AGI 2 ☆156 Updated 3 months ago
- ☆160 Updated last year
- PageRank for LLMs ☆51 Updated 4 months ago
- An agent orchestration framework for economic agents ☆110 Updated 4 months ago
- Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their… ☆159 Updated 2 months ago
- Simple UI for debugging correlations of text embeddings ☆306 Updated 7 months ago
- ☆104 Updated 5 months ago
- A strongly typed Python DSL for developing message passing multi agent systems ☆53 Updated last year
- Train an adapter for any embedding model in under a minute ☆130 Updated 9 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆107 Updated 3 months ago
- smolLM with Entropix sampler on pytorch ☆149 Updated last year
- ☆68 Updated 7 months ago
- 📝 Reference-Free automatic summarization evaluation with potential hallucination detection ☆103 Updated last year
- Synthetic data derived by templating, few shot prompting, transformations on public domain corpora, and monte carlo tree search. ☆32 Updated 3 months ago
- A library for benchmarking the Long Term Memory and Continual learning capabilities of LLM based agents. With all the tests and code you… ☆82 Updated last year
- Draw more samples ☆198 Updated last year
- explore token trajectory trees on instruct and base models ☆150 Updated 7 months ago