dell-research-harvard / HJDataset
A Large Dataset of Historical Japanese Documents with Complex Layouts
☆35Jul 22, 2022Updated 3 years ago
Alternatives and similar repositories for HJDataset
Users that are interested in HJDataset are comparing it to the libraries listed below
- Convert Transkribus PAGE-XML to standard PAGE-XML☆12Dec 10, 2025Updated 2 months ago
- Small collection of PAGE XML related scripts used at the ZPD Würzburg☆12Aug 2, 2024Updated last year
- ☆31Dec 18, 2025Updated last month
- Handwritten Text Recognition in Few-shot Scenario☆22Mar 25, 2023Updated 2 years ago
- Miqra According to the Masorah in two JSON formats☆12Jan 16, 2026Updated 3 weeks ago
- Data Programming for Text Detection in Documents using SPEAR☆12Mar 26, 2025Updated 10 months ago
- ☆10Nov 19, 2020Updated 5 years ago
- ☆11Jun 24, 2022Updated 3 years ago
- ☆16Jun 3, 2025Updated 8 months ago
- ☆92Dec 8, 2022Updated 3 years ago
- [ACM MM 2020] Exploring Font-independent Features for Scene Text Recognition☆44Nov 30, 2020Updated 5 years ago
- code for "Combating Noise: Semi-supervised Learning by Region Uncertainty Quantification"☆10Mar 19, 2022Updated 3 years ago
- Tools for normalizing the use of some characters and checking file consistencies☆11Jan 12, 2026Updated last month
- "All my notes are belong to you" 🤖☆13Jan 5, 2023Updated 3 years ago
- MathNet: A Data-Centric Approach, Dataset and Benchmark Model to Advance Mathematical Expression Recognition☆10Mar 19, 2025Updated 10 months ago
- ☆10Nov 21, 2023Updated 2 years ago
- NextJS application to upload an image, extract the open pose and edit the keypoints.☆11Oct 4, 2023Updated 2 years ago
- T22_034_han_shi_hao_CRDDC_2022_SourceCode☆11Dec 29, 2023Updated 2 years ago
- ☆11Dec 9, 2020Updated 5 years ago
- version 4.x of the Princeton Geniza Project☆12Updated this week
- Interesting Public Datasets☆12Apr 28, 2023Updated 2 years ago
- convert PubLayNet data into METS/PAGE-XML☆10Mar 17, 2020Updated 5 years ago
- ☆12Sep 6, 2023Updated 2 years ago
- This repo contains a demo of adversarial strings poisoning a vector database and forcing specific hallucinations on a RAG chatbot.☆10May 2, 2024Updated last year
- Built a Data Pipeline for a Retail store using AWS services that collects data from its transactional database (OLTP) in Snowflake and tr…☆11May 25, 2023Updated 2 years ago
- The implementations of some works from Davar-Lab. Currently we have the code of Text Perceptron (AAAI 2020). Some works' code will be pub…☆11Mar 26, 2021Updated 4 years ago
- ☆10Aug 20, 2025Updated 5 months ago
- A corpus of diacritized Hebrew texts (טקסט מנוקד)☆11May 4, 2022Updated 3 years ago
- Repository for "CoMix: Comprehensive Benchmark for Multi-Task Comic Understanding"☆16Nov 20, 2024Updated last year
- DeepNC: Deep Generative Network Completion☆10Dec 1, 2020Updated 5 years ago
- Code for An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality (ICLR 2020)☆11Mar 24, 2023Updated 2 years ago
- Digital texts in Prakrit☆10Sep 14, 2025Updated 5 months ago
- Better dependency caching than Github's own cache action☆13Feb 25, 2025Updated 11 months ago
- Mirror of https://gerrit.wikimedia.org/g/mediawiki/services/jobrunner/☆11Jan 21, 2026Updated 3 weeks ago
- This repository contains the implementation of DANIEL. (A fast Document Attention Network for Information Extraction and Labeling of handw…☆20Jan 12, 2026Updated last month
- Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif…☆12Oct 21, 2022Updated 3 years ago
- Classification, Object Detection, Adversarial Attack of Chinese Traffic Signs // 中式交通标志图片的分类、目标检测、对抗性攻击☆10Aug 12, 2020Updated 5 years ago
- A complete framework for training Large Language Models from scratch☆19Jan 8, 2026Updated last month
- My Solution to Assignments of CS234(Stanford / Fall 2019)☆15Sep 3, 2020Updated 5 years ago