davendw49 / sciparser
PDF parsing toolkit for preparing academic text corpus
☆54 Updated 6 months ago
Alternatives and similar repositories for sciparser:
Users who are interested in sciparser are comparing it to the libraries listed below.
- All in one PDF Parser Toolkit ☆16 Updated last year
- Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024 ☆180 Updated 7 months ago
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) ☆77 Updated 10 months ago
- GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data. ☆48 Updated 6 months ago
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 Updated last year
- [NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token ☆110 Updated 6 months ago
- A large-scale language model for scientific domain, trained on redpajama arXiv split ☆128 Updated 10 months ago
- [EMNLP2024] Aligning Large Language Models on Information Extraction ☆37 Updated 2 months ago
- Repo for ACL2023 paper "Plug-and-Play Knowledge Injection for Pre-trained Language Models" ☆59 Updated 9 months ago
- Dataset and scripts for HRDoc ☆34 Updated last year
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering" ☆35 Updated 3 months ago
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data" ☆100 Updated 10 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆210 Updated 3 months ago
- ☆137 Updated 6 months ago
- TianGong-AI-Unstructure ☆56 Updated 2 weeks ago
- ☆59 Updated last week
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47 Updated last year
- Official Repo of paper "KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction". In the paper, we propose … ☆65 Updated 5 months ago
- ☆78 Updated last year
- ☆36 Updated 4 months ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus ☆182 Updated last week
- ☆52 Updated 3 months ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆39 Updated 10 months ago
- A curated list of recent and past chart understanding work based on our survey paper: From Pixels to Insights: A Survey on Automatic Char… ☆177 Updated 5 months ago
- Code implementation of synthetic continued pretraining ☆79 Updated 2 weeks ago
- ☆136 Updated last year
- Codes for our paper "RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation" ☆153 Updated 5 months ago
- Unofficial implementation of AlpaGasus ☆90 Updated last year
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey" ☆60 Updated last year
- [NeurIPS 2023] Source code for Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory ☆58 Updated last year