davendw49 / sciparser
PDF parsing toolkit for preparing academic text corpus
☆49 · Updated 4 months ago
Related projects
Alternatives and complementary repositories for sciparser
- Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024 ☆171 · Updated 5 months ago
- All in one PDF Parser Toolkit ☆14 · Updated last year
- GAKG is a multimodal Geoscience Academic Knowledge Graph (GAKG) framework by fusing papers' illustrations, text, and bibliometric data. ☆46 · Updated 4 months ago
- A large-scale language model for the scientific domain, trained on the RedPajama arXiv split ☆123 · Updated 8 months ago
- Code and datasets for paper "GeoGalactica: A Scientific Large Language Model in Geoscience" ☆19 · Updated 4 months ago
- The code and data for "StructGPT: A general framework for Large Language Model to Reason on Structured Data" ☆97 · Updated 8 months ago
- [ACL 2024] OceanGPT: A Large Language Model for Ocean Science Tasks ☆33 · Updated 3 months ago
- Official repository for paper "TableBench: A Comprehensive and Complex Benchmark for Table Question Answering" ☆31 · Updated last month
- Two approaches for robust TableQA: 1) ITR is a general-purpose retrieval-based approach for handling long tables in TableQA transformer m… ☆33 · Updated last year
- Dataset and scripts for HRDoc ☆34 · Updated last year
- [NeurIPS 2024] Source code for xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token ☆92 · Updated 4 months ago
- Datasets and Evaluation Scripts for CompHRDoc ☆27 · Updated 7 months ago
- TianGong-AI-Unstructure ☆51 · Updated this week
- Leveraging passage embeddings for efficient listwise reranking with large language models. ☆33 · Updated last month
- PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion ☆46 · Updated 8 months ago
- [ICLR 2024] The open-source repo of THU-KEG's KoLA benchmark. ☆50 · Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆132 · Updated 5 months ago
- Code for our paper "RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation" ☆133 · Updated 3 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆57 · Updated 4 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆31 · Updated last year
- [ACL 2024 Findings] Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering ☆188 · Updated 5 months ago
- Code for "A Simple but Effective Approach to Improve Structured Language Model Output for Information Extraction" ☆11 · Updated 8 months ago
- SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024) ☆67 · Updated 8 months ago
- [EMNLP 2024] Aligning Large Language Models on Information Extraction ☆34 · Updated 2 weeks ago
- A Toolkit for Table-based Question Answering ☆105 · Updated last year