Analysis code for the NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"
☆56 · Aug 6, 2025 · Updated 7 months ago
Alternatives and similar repositories for SciArena
Users who are interested in SciArena also compare it to the repositories listed below.
- A framework bridging cognitive science and LLM reasoning research to diagnose and improve how large language models reason, based on anal… ☆36 · Nov 26, 2025 · Updated 3 months ago
- Quantum Fast Approximate Synthesis Tool ☆19 · Jan 23, 2023 · Updated 3 years ago
- A quantum circuit optimizer based on sum-over-paths representations ☆26 · Nov 8, 2019 · Updated 6 years ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆26 · Mar 2, 2026 · Updated last week
- Foam-Agent: An end-to-end, composable multi-agent framework for automating CFD simulations in OpenFOAM. NeurIPS 2025 Machine Learning and… ☆118 · Updated this week
- We study toy models of skill learning. ☆32 · Feb 3, 2026 · Updated last month
- we have ai at home ☆75 · Feb 18, 2026 · Updated 2 weeks ago
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆25 · Feb 16, 2026 · Updated 3 weeks ago
- AgentSynth: Scalable Task Generation for Generalist Computer-Use Agents ☆37 · Oct 7, 2025 · Updated 5 months ago
- Run Chroma embedded in Swift ☆62 · Feb 18, 2026 · Updated 2 weeks ago
- Badger code samples ☆28 · May 25, 2020 · Updated 5 years ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆40 · Aug 7, 2025 · Updated 7 months ago
- A comprehensive paper list of Reasoning over Tables. ☆30 · Nov 6, 2022 · Updated 3 years ago
- Supervised instruction finetuning for LLMs with HF Trainer and DeepSpeed ☆36 · Jul 6, 2023 · Updated 2 years ago
- Python library & CLI to create, view and edit PFB files ☆12 · Feb 19, 2026 · Updated 2 weeks ago
- EncryCore node reference implementation ☆15 · Apr 2, 2020 · Updated 5 years ago
- Material associated with Physics Report "Data science applications to string theory" ☆11 · Jun 20, 2023 · Updated 2 years ago
- A Python toolkit for quantum neural networks. ☆44 · Mar 14, 2019 · Updated 6 years ago
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models" ☆30 · Updated this week
- Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning) ☆10 · Sep 7, 2020 · Updated 5 years ago
- Evaluation Pipeline for medical tasks. ☆12 · Feb 13, 2026 · Updated 3 weeks ago
- DreamSmooth: Improving Model-Based RL with Reward Smoothing (ICLR 2024) ☆12 · May 6, 2024 · Updated last year
- Simple tutorial to get familiar with how to program quantum computers using Qiskit ☆11 · Sep 9, 2019 · Updated 6 years ago
- The best library in the world to generate PDF from HTML ☆13 · Feb 24, 2026 · Updated last week
- A collection of heat engines, based on the OpenAI Gym environment framework for use with reinforcement learning applications. ☆15 · Dec 20, 2021 · Updated 4 years ago
- An Advanced Basic Math Reasoning and Overthinking Evaluation Framework for LLMs ☆12 · Jul 8, 2025 · Updated 8 months ago
- The main controller for services in the cs-insights project through docker-compose. ☆13 · Aug 25, 2023 · Updated 2 years ago
- The code for the paper "A Bayesian Approach to Online Planning" published in ICML 2024. ☆13 · Jun 17, 2024 · Updated last year
- Sangria akka-streams integration ☆11 · Feb 8, 2026 · Updated last month
- I saw this [Blog Post](https://www.morling.dev/blog/one-billion-row-challenge/) on a Billion Row challenge for Java so naturally I tried … ☆14 · Jan 10, 2024 · Updated 2 years ago