ayulockin / llm-eval-sweep
A simple repository that showcases a few LLM evaluation strategies and leverages W&B Sweeps to optimize the LLM system.
☆12 · Jul 11, 2023 · Updated 2 years ago
Alternatives and similar repositories for llm-eval-sweep
Users interested in llm-eval-sweep are comparing it to the repositories listed below.
- ☆10 · Nov 7, 2022 · Updated 3 years ago
- Scraping Douban books with Scrapy ☆10 · Aug 19, 2016 · Updated 9 years ago
- Detect-Then-Explain Framework for the Text-to-SQL task ☆10 · Dec 6, 2023 · Updated 2 years ago
- Repo for our work "Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence" ☆19 · Jun 2, 2025 · Updated 8 months ago
- LaTeX template for graduate course papers at Huazhong University of Science and Technology ☆11 · Aug 5, 2022 · Updated 3 years ago
- Demonstrates using MCP with the Pydantic AI framework ☆14 · Mar 14, 2025 · Updated 11 months ago
- User Management Application built with Spring Boot, Thymeleaf & MySQL Database ☆12 · Dec 20, 2024 · Updated last year
- ☆11 · May 18, 2022 · Updated 3 years ago
- Repository for KDA (Knowledge-dependent Answerability), EMNLP 2022 work ☆13 · Feb 27, 2023 · Updated 2 years ago
- [Findings of ACL-2023] This is the official implementation of "On the Difference of BERT-style and CLIP-style Text Encoders". ☆14 · Jun 7, 2023 · Updated 2 years ago
- Official repository for "DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation (ACL2023 Findings)" ☆11 · May 23, 2023 · Updated 2 years ago
- Estimates fatigue loads in wind turbines from SCADA data based on supervised learning. ☆10 · Sep 11, 2018 · Updated 7 years ago
- Reinforcement learning robot avoiding obstacles (Python + V-REP) ☆12 · Oct 29, 2019 · Updated 6 years ago
- [NeurIPS 2025] This is the official repository for "RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis" ☆26 · Nov 21, 2025 · Updated 2 months ago
- Official source code for "Time is Not Enough: Time-Frequency based Explanation for Time-Series Black-Box Models" ☆12 · Dec 5, 2024 · Updated last year
- ☆14 · Jan 6, 2025 · Updated last year
- Incremental symbol learning for natural language understanding ☆10 · Jun 12, 2023 · Updated 2 years ago
- ☆10 · May 14, 2020 · Updated 5 years ago
- Code for the AACL 2022 Paper "This Patient Looks Like That Patient: Prototypical Networks for Interpretable Diagnosis Prediction from Cli… ☆12 · Nov 18, 2022 · Updated 3 years ago
- Multi-hop Evidence Retrieval for Cross-document Relation Extraction ☆11 · Sep 1, 2023 · Updated 2 years ago
- ☆12 · Oct 3, 2023 · Updated 2 years ago
- Hierarchical reinforcement learning framework that uses a directed graph to define the hierarchy. ☆14 · Aug 5, 2022 · Updated 3 years ago
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind". ☆15 · Apr 27, 2023 · Updated 2 years ago
- A Python Natural Language Processing Toolkit for Electronic Health Record Texts ☆13 · May 24, 2023 · Updated 2 years ago
- ☆11 · Jun 21, 2025 · Updated 7 months ago
- Implementation of "Face detection in untrained deep neural networks" (Baek et al., Nature Communications, 2021) ☆10 · Nov 2, 2021 · Updated 4 years ago
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery ☆20 · Sep 24, 2025 · Updated 4 months ago
- Pipeline for employing lightweight deep learning models on LOW-power systems ☆11 · Jan 9, 2023 · Updated 3 years ago
- JavaScript-based component for highlighting text-mined annotations of different semantic types in a full-text article identified by a PMC… ☆11 · Nov 29, 2016 · Updated 9 years ago
- Official repository for a Fourier model that can generate periodic signals ☆10 · Mar 10, 2022 · Updated 3 years ago
- Large language model evaluation framework with Elo leaderboard and A/B testing ☆52 · Oct 24, 2024 · Updated last year
- Explore and Control with Adversarial Surprise ☆10 · Jul 20, 2021 · Updated 4 years ago
- Basic OpenAI chatbot on a Neo4j knowledge graph ☆12 · Oct 4, 2023 · Updated 2 years ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators" ☆12 · Mar 25, 2025 · Updated 10 months ago
- Evaluating and improving the faithfulness of the interpretations offered by Neural Module Networks ☆13 · Jun 12, 2023 · Updated 2 years ago
- Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs ☆14 · Updated this week
- 🩻 NV-Reason-CXR-3B is a specialized vision-language model designed for medical reasoning and interpretation of chest X-ray images. ☆39 · Oct 29, 2025 · Updated 3 months ago
- Official code for "Federated learning for heterogeneous electronic health record systems with cost effective participant selection" ☆12 · Updated this week
- Minimal implementation of multiple PEFT methods for LLaMA fine-tuning ☆13 · May 7, 2023 · Updated 2 years ago