hitachi-nlp / appjsonify
A handy PDF-to-JSON conversion tool for academic papers implemented in Python.
☆71 Updated 2 years ago
Alternatives and similar repositories for appjsonify
Users who are interested in appjsonify are comparing it to the libraries listed below.
- [TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Thr… ☆32 Updated last week
- Evaluation framework for document processing models and services. ☆58 Updated this week
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Aligned, Review-Informed Edits of Scientific Papers ☆54 Updated 2 years ago
- Scientific Document Insight Q/A ☆32 Updated 3 months ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆102 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- Download, parse, and filter data from Phil Papers. Data-ready for The-Pile. ☆18 Updated 2 years ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆131 Updated last year
- ☆110 Updated last month
- Efficient few-shot learning with cross-encoders. ☆60 Updated last year
- ☆17 Updated last year
- TF-ID: Table/Figure IDentifier for academic papers ☆241 Updated last year
- Multimodal document analysis ☆166 Updated last month
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆137 Updated 2 years ago
- Interact with the Deep Search platform for new knowledge explorations and discoveries ☆220 Updated 10 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆46 Updated 8 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 Updated last year
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆88 Updated 10 months ago
- Repository for deepdoctection tutorial notebooks ☆48 Updated 5 months ago
- Official repo of Respond-and-Respond: data, code, and evaluation ☆104 Updated last year
- Create fast graph language models from converted PDF documents for knowledge extraction and Q&A. ☆57 Updated 10 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆45 Updated last year
- A Python library to chunk/group your texts based on semantic similarity. ☆101 Updated last year
- Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions. ☆250 Updated 10 months ago
- Evaluation of bm42 sparse indexing algorithm ☆72 Updated last year
- Resources related to EACL 2023 paper "SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domain… ☆52 Updated 2 years ago
- ☆71 Updated last month
- Logical structure analysis for visually structured documents ☆94 Updated 3 years ago
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆49 Updated last year