☆11Jan 3, 2024Updated 2 years ago
Alternatives and similar repositories for SciMT-benchmark
Users who are interested in SciMT-benchmark are comparing it to the repositories listed below
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models☆26Jul 13, 2025Updated 7 months ago
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges☆14May 30, 2024Updated last year
- SciAssess is a comprehensive benchmark for evaluating Large Language Models' proficiency in scientific literature analysis across various…☆82May 21, 2025Updated 9 months ago
- ☆19Sep 16, 2025Updated 5 months ago
- An evaluation suite for Retrieval-Augmented Generation (RAG).☆23Apr 26, 2025Updated 10 months ago
- ☆28Nov 10, 2025Updated 3 months ago
- Neuron Activation☆26Nov 21, 2024Updated last year
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- EMNLP 2023 - InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions☆25May 30, 2024Updated last year
- Data from BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology paper☆27Jul 5, 2024Updated last year
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings]☆24Nov 18, 2023Updated 2 years ago
- ☆37Oct 15, 2024Updated last year
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science☆34Oct 25, 2024Updated last year
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings)☆34Sep 20, 2024Updated last year
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆32Feb 26, 2025Updated last year
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- Exploring whether LLMs perform case-based or rule-based reasoning☆30Mar 2, 2024Updated 2 years ago
- [ICLR 2024 Oral] Improving Convergence and Generalization Using Parameter Symmetries☆31May 29, 2024Updated last year
- ☆37Dec 6, 2024Updated last year
- Concurrency library☆17Oct 13, 2024Updated last year
- ☆11Dec 23, 2024Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".☆12Oct 14, 2024Updated last year
- [NeurIPS 2025] StegoZip: Enhancing Linguistic Steganography Payload in Practice with Large Language Models☆26Dec 4, 2025Updated 3 months ago
- ☆44Jun 21, 2024Updated last year
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- Gender prediction of Chinese names based on LSTM☆14Mar 16, 2023Updated 2 years ago
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated last year
- An active inference model of Lacanian psychoanalysis☆15Jun 7, 2025Updated 9 months ago
- [AAAI 2024] An official PyTorch implementation of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Underst…☆13Dec 8, 2024Updated last year
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆51Dec 23, 2024Updated last year
- Develop C++/CUDA extensions with PyTorch like Python scripts☆10Jan 7, 2026Updated 2 months ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- CANdle - a library for using USB-FDCAN dongle and communicating with md80 drives☆15Sep 15, 2025Updated 5 months ago
- ☆10Apr 7, 2024Updated last year
- Models for packages and the resources they contain.☆14Mar 10, 2024Updated last year
- Python Inference Script(PyIS)☆19Aug 30, 2022Updated 3 years ago
- An open-source non-official community implementation of the model from the paper: Surgical Robot Transformer (SRT): Imitation Learning fo…☆11Feb 9, 2026Updated 3 weeks ago
- ☆10Aug 15, 2022Updated 3 years ago
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago