SciMT / SciMT-benchmark
☆11 · Updated last year
Alternatives and similar repositories for SciMT-benchmark:
Users who are interested in SciMT-benchmark are comparing it to the libraries listed below.
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models ☆17 · Updated 5 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆38 · Updated last year
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [EMNLP 2024] ☆25 · Updated 5 months ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆16 · Updated 3 weeks ago
- implementation of dualformer ☆15 · Updated last month
- Applies ROME and MEMIT on Mamba-S4 models ☆14 · Updated last year
- Structured Chemistry Reasoning with Large Language Models ☆37 · Updated 11 months ago
- A trainable user simulator ☆34 · Updated 7 months ago
- ☆27 · Updated last year
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆49 · Updated 5 months ago
- Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach" ☆14 · Updated 8 months ago
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆43 · Updated 4 months ago
- Pre-trained Language Model for Scientific Text ☆45 · Updated last year
- Official implementation of Our NeurIPS 2024 Paper "Boundary Matters: A Bi-Level Active Finetuning Method" ☆11 · Updated 2 months ago
- Official implementation of paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https://arxiv.or… ☆23 · Updated 2 months ago
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆25 · Updated 10 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆17 · Updated 5 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 · Updated last year
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆38 · Updated 3 months ago
- ☆28 · Updated 2 months ago
- ☆25 · Updated 11 months ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆42 · Updated last month
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 · Updated 7 months ago
- Preparing for ML Interviews. ☆11 · Updated last week
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆10 · Updated 6 months ago
- exploring whether LLMs perform case-based or rule-based reasoning ☆28 · Updated last year
- ☆20 · Updated 2 months ago
- ☆20 · Updated 4 years ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024 ☆22 · Updated 10 months ago
- InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery (COLING 2025) ☆47 · Updated 4 months ago