vals-ai / finance-agent
☆14 · Updated 2 months ago
Alternatives and similar repositories for finance-agent
Users that are interested in finance-agent are comparing it to the libraries listed below
- Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025) ☆133 · Updated 2 months ago
- ☆97 · Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆88 · Updated 9 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆102 · Updated 3 weeks ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆32 · Updated 3 months ago
- Governance of the Commons Simulation (GovSim) ☆55 · Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆173 · Updated 4 months ago
- Code and Data for "MIRAI: Evaluating LLM Agents for Event Forecasting" ☆66 · Updated last year
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆107 · Updated 9 months ago
- Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… ☆61 · Updated 4 months ago
- ☆134 · Updated 3 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper. ☆48 · Updated 2 weeks ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆71 · Updated last year
- "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents" ☆78 · Updated 3 months ago
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆101 · Updated last month
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆75 · Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆210 · Updated this week
- ☆83 · Updated last year
- Open source interpretability artefacts for R1. ☆154 · Updated 2 months ago
- Official repo for Learning to Reason for Long-Form Story Generation ☆65 · Updated 2 months ago
- ☆92 · Updated 2 months ago
- Inference-time scaling for LLMs-as-a-judge. ☆251 · Updated this week
- ☆72 · Updated last year
- We develop benchmarks and analysis tools to evaluate the causal reasoning abilities of LLMs. ☆118 · Updated last year
- ☆126 · Updated 2 months ago
- Data and code for the Corr2Cause paper (ICLR 2024) ☆106 · Updated last year
- [COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees ☆22 · Updated this week
- ⚖️ Awesome LLM Judges ⚖️ ☆107 · Updated 2 months ago
- A framework for optimizing DSPy programs with RL ☆89 · Updated this week
- Attribute (or cite) statements generated by LLMs back to in-context information. ☆245 · Updated 9 months ago