SciMT / SciMT-benchmark
☆11 Jan 3, 2024 Updated 2 years ago
Alternatives and similar repositories for SciMT-benchmark
Users who are interested in SciMT-benchmark are comparing it to the repositories listed below.
- SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models ☆26 Jul 13, 2025 Updated 7 months ago
- [LREC-COLING 2024] PECC: Problem Extraction and Coding Challenges ☆14 May 30, 2024 Updated last year
- SciAssess is a comprehensive benchmark for evaluating Large Language Models' proficiency in scientific literature analysis across various… ☆83 May 21, 2025 Updated 8 months ago
- ☆19 Sep 16, 2025 Updated 4 months ago
- An evaluation suite for Retrieval-Augmented Generation (RAG). ☆23 Apr 26, 2025 Updated 9 months ago
- ☆28 Nov 10, 2025 Updated 3 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 Dec 23, 2024 Updated last year
- Neuron Activation ☆26 Nov 21, 2024 Updated last year
- EMNLP 2023 - InfoSeek: A New VQA Benchmark Focused on Visual Info-Seeking Questions ☆25 May 30, 2024 Updated last year
- Data from BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology paper ☆27 Jul 5, 2024 Updated last year
- Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs [EMNLP 2023 Findings] ☆24 Nov 18, 2023 Updated 2 years ago
- ☆37 Oct 15, 2024 Updated last year
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings) ☆34 Sep 20, 2024 Updated last year
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆34 Oct 25, 2024 Updated last year
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details. ☆32 Feb 26, 2025 Updated 11 months ago
- [NeurIPS 2025] StegoZip: Enhancing Linguistic Steganography Payload in Practice with Large Language Models ☆24 Dec 4, 2025 Updated 2 months ago
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers. ☆11 Aug 30, 2024 Updated last year
- Exploring whether LLMs perform case-based or rule-based reasoning ☆30 Mar 2, 2024 Updated last year
- [ICLR 2024 Oral] Improving Convergence and Generalization Using Parameter Symmetries ☆31 May 29, 2024 Updated last year
- ☆37 Dec 6, 2024 Updated last year
- ☆11 Dec 23, 2024 Updated last year
- Concurrency library ☆16 Oct 13, 2024 Updated last year
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 Oct 14, 2024 Updated last year
- ☆44 Jun 21, 2024 Updated last year
- CANdle - a library for using USB-FDCAN dongle and communicating with md80 drives ☆13 Sep 15, 2025 Updated 5 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆50 Dec 23, 2024 Updated last year
- Material parsers and other tools and scripts, initially developed for Grobid Superconductor ☆13 Feb 21, 2025 Updated 11 months ago
- NeRF implementation with minimal code and maximal readability using PyTorch ☆11 Aug 27, 2022 Updated 3 years ago
- An active inference model of Lacanian psychoanalysis ☆15 Jun 7, 2025 Updated 8 months ago
- ☆10 Aug 15, 2022 Updated 3 years ago
- Models for packages and the resources they contain. ☆14 Mar 10, 2024 Updated last year
- Develop C++/CUDA extensions with PyTorch like Python scripts ☆10 Jan 7, 2026 Updated last month
- A Swedish Natural Language Understanding Benchmark ☆11 Dec 12, 2025 Updated 2 months ago
- Original VinVL visual backbone with simplified APIs to easily extract features, boxes, and object detections in a few lines of Python code. ☆11 Nov 27, 2022 Updated 3 years ago
- A framework for few-shot evaluation of autoregressive language models. ☆12 Jul 14, 2025 Updated 7 months ago
- Gender prediction of Chinese names based on LSTM ☆14 Mar 16, 2023 Updated 2 years ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference … ☆14 Dec 12, 2024 Updated last year
- [CVPR 2024] Learning from Synthetic Human Group Activities ☆14 Feb 24, 2025 Updated 11 months ago
- ☆12 Jan 11, 2026 Updated last month