davendw49 / sciparser
PDF parsing toolkit for preparing academic text corpora
☆49 Updated 4 months ago
Related projects
Alternatives and complementary repositories for sciparser
- Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024 ☆170 Updated 5 months ago
- All in one PDF Parser Toolkit ☆14 Updated last year
- A large-scale language model for the scientific domain, trained on the RedPajama arXiv split ☆122 Updated 8 months ago
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data" ☆97 Updated 7 months ago
- ☆129 Updated 4 months ago
- TianGong-AI-Unstructure ☆51 Updated this week
- ☆37 Updated 3 weeks ago
- Code and datasets for paper "GeoGalactica: A Scientific Large Language Model in Geoscience" ☆19 Updated 4 months ago
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ☆45 Updated 8 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆131 Updated 4 months ago
- Repo for ACL2023 paper "Plug-and-Play Knowledge Injection for Pre-trained Language Models" ☆57 Updated 7 months ago
- ☆34 Updated 2 months ago
- GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data. ☆45 Updated 4 months ago
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering" ☆30 Updated 3 weeks ago
- Code for paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆29 Updated 10 months ago
- [NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token ☆90 Updated 4 months ago
- Code implementation of synthetic continued pretraining ☆54 Updated last month
- ☆69 Updated this week
- Code for our paper "RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation" ☆124 Updated 2 months ago
- ☆132 Updated last year
- Two approaches for robust TableQA: 1) ITR is a general-purpose retrieval-based approach for handling long tables in TableQA transformer m… ☆33 Updated last year
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus ☆168 Updated this week
- Dataset and scripts for HRDoc ☆31 Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 Updated last year
- ☆22 Updated 9 months ago
- A Toolkit for Table-based Question Answering ☆105 Updated last year
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆46 Updated last year
- ☆36 Updated 3 weeks ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey" ☆61 Updated last year
- [ACL'23 Findings] "Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors" ☆36 Updated 10 months ago