davendw49 / sciparser
PDF parsing toolkit for preparing academic text corpus
☆56 · Updated 10 months ago
Alternatives and similar repositories for sciparser
Users who are interested in sciparser are comparing it to the libraries listed below.
- Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024 ☆192 · Updated 11 months ago
- All in one PDF Parser Toolkit ☆16 · Updated last year
- GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data. ☆53 · Updated 10 months ago
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data" ☆105 · Updated last year
- A large-scale language model for scientific domain, trained on redpajama arXiv split ☆133 · Updated last year
- EMNLP'23 survey: a curation of awesome papers and resources on refreshing large language models (LLMs) without expensive retraining. ☆134 · Updated last year
- [ACL 2023] This is the code repo for our ACL'23 paper "Augmentation-Adapted Retriever Improves Generalization of Language Models as Gener… ☆60 · Updated 10 months ago
- ☆56 · Updated 6 months ago
- Code and datasets for paper "GeoGalactica: A Scientific Large Language Model in Geoscience" ☆32 · Updated 10 months ago
- ☆97 · Updated last year
- Two approaches for robust TableQA: 1) ITR is a general-purpose retrieval-based approach for handling long tables in TableQA transformer m… ☆38 · Updated last year
- ☆39 · Updated 8 months ago
- [NAACL'24] Dataset, code and models for "TableLlama: Towards Open Large Generalist Models for Tables". ☆128 · Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 · Updated last year
- A Toolkit for Table-based Question Answering ☆112 · Updated last year
- ☆36 · Updated 8 months ago
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering" ☆54 · Updated last week
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) ☆80 · Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation ☆28 · Updated last year
- Dataset and scripts for HRDoc ☆37 · Updated last year
- This is the code repo for our paper "Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents". ☆106 · Updated 6 months ago
- ☆81 · Updated last year
- [EMNLP2024] Aligning Large Language Models on Information Extraction ☆46 · Updated 6 months ago
- Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023) ☆63 · Updated last year
- ☆120 · Updated 10 months ago
- LLM for Scientific Research Survey ☆85 · Updated 3 months ago
- [ACL 2024] IEPile: A Large-Scale Information Extraction Corpus ☆193 · Updated 4 months ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey" ☆59 · Updated last year
- ☆143 · Updated 10 months ago