hitachi-nlp / appjsonify
A handy PDF-to-JSON conversion tool for academic papers implemented in Python.
☆71 Updated 2 years ago
Alternatives and similar repositories for appjsonify
Users who are interested in appjsonify are comparing it to the libraries listed below.
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆101 Updated 11 months ago
- [TACL, EMNLP 2025 Oral] Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Thr… ☆32 Updated last year
- Aligned, Review-Informed Edits of Scientific Papers ☆54 Updated 2 years ago
- experiments with inference on llama ☆103 Updated last year
- Evaluation framework for document processing models and services. ☆55 Updated last week
- Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their… ☆156 Updated last month
- Scientific Document Insight Q/A ☆31 Updated 2 months ago
- DocLLM: A layout-aware generative language model for multimodal document understanding ☆129 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- PyLate efficient inference engine ☆67 Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Resources related to EACL 2023 paper "SwitchPrompt: Learning Domain-Specific Gated Soft Prompts for Classification in Low-Resource Domain… ☆52 Updated 2 years ago
- ☆46 Updated last month
- ☆68 Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆51 Updated last year
- Mixtral finetuning ☆19 Updated last year
- Official repository for RAGViz: Diagnose and Visualize Retrieval-Augmented Generation [EMNLP 2024] ☆88 Updated 10 months ago
- Efficient few-shot learning with cross-encoders. ☆59 Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 9 months ago
- Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te… ☆44 Updated last year
- Model, Code & Data for the EMNLP'23 paper "Making Large Language Models Better Data Creators" ☆135 Updated 2 years ago
- Code for "Training-free Graph Neural Networks and the Power of Labels as Features" (TMLR 2024) ☆57 Updated last year
- ☆80 Updated last year
- ☆16 Updated 2 years ago
- Advanced Reasoning Benchmark Dataset for LLMs ☆47 Updated 2 years ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆44 Updated last year
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆44 Updated 8 months ago
- ☆106 Updated 3 weeks ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆48 Updated last year
- minimal pytorch implementation of bm25 (with sparse tensors) ☆104 Updated 3 weeks ago