☆25Nov 19, 2025Updated 4 months ago
Alternatives and similar repositories for WildVisualizer
Users who are interested in WildVisualizer are comparing it to the repositories listed below.
- ☆49Apr 4, 2025Updated 11 months ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 3 years ago
- ☆16Sep 4, 2025Updated 6 months ago
- ☆18Mar 5, 2017Updated 9 years ago
- From Word to World: Can Large Language Models be Implicit Text-based World Models?☆50Dec 25, 2025Updated 2 months ago
- https://interactivetraining.ai/☆17Oct 2, 2025Updated 5 months ago
- A Python wrapper for the ROUGE summarization evaluation package☆14Aug 9, 2017Updated 8 years ago
- Auditing agents for fine-tuning safety☆20Oct 21, 2025Updated 4 months ago
- Llemma formal2formal (tactic prediction) theorem proving experiments☆20Oct 17, 2023Updated 2 years ago
- Example formalization of Game Theoretic concepts in Lean☆27Feb 14, 2025Updated last year
- ☆26Sep 3, 2025Updated 6 months ago
- AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents☆66Feb 27, 2026Updated 2 weeks ago
- Distributed LDA, takes raw text as input and outputs topic word table.☆16Apr 16, 2016Updated 9 years ago
- Synthetic data generation for evaluating LLM symbolic and logic reasoning☆22Mar 6, 2026Updated 2 weeks ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset☆27Mar 6, 2024Updated 2 years ago
- ☆21Mar 18, 2025Updated last year
- ☆19Mar 25, 2025Updated 11 months ago
- Minimal coding, computer-use and deep research agents using the OpenAI Agents SDK☆33Mar 9, 2026Updated last week
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 8 months ago
- A long-horizon, sparse-reward math environment for reinforcement learning. Official code repo for "What makes Math problems hard for rein…☆32Aug 11, 2025Updated 7 months ago
- Code for "[COLM'25] RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆23Mar 18, 2025Updated last year
- Official Repository for Paper "BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Compet…☆18Aug 28, 2024Updated last year
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆64Oct 9, 2024Updated last year
- ☆15Jun 7, 2024Updated last year
- ☆30Feb 11, 2022Updated 4 years ago
- Example agents for the Dreadnode platform☆24Dec 19, 2025Updated 3 months ago
- LLM-in-Sandbox: From Coding Agent to General Agent☆205Feb 28, 2026Updated 2 weeks ago
- Official code and data repository of MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte…☆22Jun 3, 2024Updated last year
- Official implementation of "Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving"☆29May 8, 2025Updated 10 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme…☆19Nov 12, 2024Updated last year
- ☆47Aug 5, 2025Updated 7 months ago
- AgenTracer: A Lightweight Failure Attributor for Agentic Systems☆81Nov 12, 2025Updated 4 months ago
- [AAAI'26 Oral] Official Implementation of STAR-1: Safer Alignment of Reasoning LLMs with 1K Data☆33Apr 7, 2025Updated 11 months ago
- ☆18Mar 30, 2025Updated 11 months ago
- [COLING 2025] Official repo of paper: "Not Aligned" is Not "Malicious": Being Careful about Hallucinations of Large Language Models' Jail…☆12Jul 26, 2024Updated last year
- Collections of Actions for Custom GPTs (some created by Captain Action)☆11Jan 7, 2024Updated 2 years ago
- ☆32Sep 11, 2025Updated 6 months ago
- Scripts for Medium posts☆26Oct 28, 2019Updated 6 years ago
- Independent robustness evaluation of Improving Alignment and Robustness with Short Circuiting☆18Apr 15, 2025Updated 11 months ago