snap-stanford / stark
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (https://stark.stanford.edu/)
☆282 · Updated last month
Related projects:
- AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval (https://arxiv.org/abs/2406.11200) ☆140 · Updated last month
- Benchmarking LLMs via Uncertainty Quantification ☆206 · Updated 7 months ago
- Code and Checkpoints for "Generate rather than Retrieve: Large Language Models are Strong Context Generators" in ICLR 2023. ☆276 · Updated last year
- A recipe for online RLHF. ☆376 · Updated 3 weeks ago
- LLM-based text extraction from unstructured data such as PDFs, Word documents and HTML files. Transform and cluster the text into your desired format. Les… ☆168 · Updated 3 months ago
- The official implementation of Self-Play Preference Optimization (SPPO) ☆461 · Updated last month
- Grimoire is All You Need for Enhancing Large Language Models ☆115 · Updated 6 months ago
- PyTorch Library for Relational Table Learning with LLMs. ☆270 · Updated last week
- An interpretable large language model (LLM) for medical diagnosis. ☆68 · Updated last week
- (ACL 24 main) Large Language Models Can Learn Temporal Reasoning ☆31 · Updated 3 weeks ago
- TxBKG - Knowledge Graph Generation for Any PDFs ☆224 · Updated 9 months ago
- We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that … ☆115 · Updated last year
- A curated list of awesome leaderboard-oriented resources for foundation models ☆183 · Updated this week
- The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://a… ☆350 · Updated last week
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models ☆372 · Updated 7 months ago
- Recipes to train a reward model for RLHF. ☆634 · Updated last week
- Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning ☆208 · Updated last week
- [PGAI@CIKM 2023] PyTorch Implementation of LlamaRec: Two-Stage Recommendation using Large Language Models for Ranking ☆130 · Updated 4 months ago
- Repository for G-Retriever ☆275 · Updated 2 months ago
- The Effect of Metadata on Scientific Literature Tagging: A Cross-Field Cross-Model Study (WWW'23) ☆76 · Updated last year
- WorldGPT: Empowering LLM as Multimodal World Model ☆116 · Updated last month
- [ICLR'24] Enhancing Healthcare Predictions with Personalized Knowledge Graphs ☆154 · Updated 5 months ago
- ☆168 · Updated 2 months ago
- This is the official code repository of MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tas… ☆60 · Updated 3 weeks ago
- AAGPT is another experimental open-source application showcasing the capabilities of large language models, such as GPT-3.5 and GPT-4. ☆154 · Updated last year
- [ACL 2024] CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and … ☆107 · Updated last month
- [ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA ☆176 · Updated 3 weeks ago
- A Comprehensive Benchmark for Code Information Retrieval. ☆61 · Updated last week
- A deployment, monitoring and autoscaling service towards serverless LLM serving. ☆152 · Updated last week
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆203 · Updated 2 weeks ago