davendw49 / sciparser
PDF parsing toolkit for preparing academic text corpus
☆55Updated 8 months ago
Alternatives and similar repositories for sciparser:
Users who are interested in sciparser compare it to the libraries listed below.
- Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024☆186Updated 9 months ago
- All in one PDF Parser Toolkit☆16Updated last year
- GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data.☆50Updated 8 months ago
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark.☆50Updated last year
- A large-scale language model for scientific domain, trained on redpajama arXiv split☆131Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆40Updated last year
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data"☆104Updated last year
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering"☆41Updated 5 months ago
- [EMNLP2024] Aligning Large Language Models on Information Extraction☆45Updated 4 months ago
- LLM for Scientific Research Survey☆74Updated 2 months ago
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024)☆78Updated last year
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion☆53Updated last year
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆46Updated 9 months ago
- Dataset and scripts for HRDoc☆35Updated last year
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023)☆25Updated last year
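- A very simple reproduction of Deepseek-R1-Zero and Deepseek-R1, using the "24 game" as an example. It applies zero-RL, SFT, and SFT+RL to elicit the LLM's capacity for self-verification and reflection. Clean, minimal, accessible reproduction of Dee…☆14Updated 3 weeks ago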
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆45Updated 3 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?"(AAAI2024)☆48Updated 11 months ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey"☆59Updated last year
- Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"☆52Updated 9 months ago
- [ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Gener…☆60Updated 8 months ago
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context"☆65Updated 7 months ago
- This repository contains ScholarQABench data and evaluation pipeline.☆69Updated last month
- Synthetic data generation pipelines for text-rich images.☆50Updated 3 weeks ago
- [Neurips2023] Source code for Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory☆59Updated last year
- Unofficial implementation of AlpaGasus☆90Updated last year
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback☆41Updated last year