SciMT / SciMT-benchmark
☆11 Updated last year
Alternatives and similar repositories for SciMT-benchmark:
Users who are interested in SciMT-benchmark are comparing it to the repositories listed below:
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models ☆15 Updated 3 months ago
- Structured Chemistry Reasoning with Large Language Models ☆32 Updated 9 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024 ☆21 Updated 7 months ago
- PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes [EMNLP 2024] ☆23 Updated 3 months ago
- Official implementation of paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https://arxiv.or… ☆21 Updated this week
- exploring whether LLMs perform case-based or rule-based reasoning ☆28 Updated 11 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large … ☆67 Updated last month
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆38 Updated last year
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension ☆39 Updated 2 months ago
- A trainable user simulator ☆34 Updated 5 months ago
- Pre-trained Language Model for Scientific Text ☆44 Updated last year
- ☆25 Updated 9 months ago
- ☆13 Updated last year
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆44 Updated 3 months ago
- ☆20 Updated 7 months ago
- Official Implementation of UA^{2}-Agent and other baseline algorithms of "Towards Unified Alignment Between Agents, Humans, and Environme… ☆16 Updated 3 months ago
- ☆15 Updated 6 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025) ☆25 Updated last week
- ☆16 Updated 4 months ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆23 Updated 4 months ago
- Official repository for Decentralized Arena via Collective LLM Intelligence ☆8 Updated 4 months ago
- ☆27 Updated 3 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆19 Updated last year
- ICLR2024 statistics ☆47 Updated last year
- [ICLR 24 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style ☆17 Updated last week
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- Applies ROME and MEMIT on Mamba-S4 models ☆14 Updated 10 months ago
- [AAAI 2024] MELO: Enhancing Model Editing with Neuron-indexed Dynamic LoRA