uchidalab / book-dataset
This dataset contains 207,572 books from the Amazon.com, Inc. marketplace.
☆248 · Updated 4 years ago
Alternatives and similar repositories for book-dataset:
Users who are interested in book-dataset are comparing it to the repositories listed below:
- Classification of books based on titles without prior knowledge of context or author ☆59 · Updated 2 years ago
- ☆138 · Updated last year
- This repo contains code to convert Structured Documents to Graphs and implement a Graph Convolution Neural Network for node classificatio… ☆144 · Updated 2 years ago
- shabeelkandi / Handling-Out-of-Vocabulary-Words-in-Natural-Language-Processing-using-Language-Modelling ☆69 · Updated 5 years ago
- Code for "How to Make Word Vectors from Game of Thrones (LIVE)" by Siraj Raval on YouTube ☆171 · Updated 5 years ago
- State-of-the-Art Language Modeling and Text Classification in Hindi Language ☆220 · Updated 6 years ago
- Toolbox for OCR post-correction ☆121 · Updated 5 years ago
- 💙 Emoji handling and meta data for spaCy with custom extension attributes ☆181 · Updated last year
- Key information extraction from text and graph visualization ☆91 · Updated 4 years ago
- A repository with anonymized invoices ☆12 · Updated 6 years ago
- ☆91 · Updated 8 years ago
- ☆130 · Updated 3 years ago
- GloVe word vector embedding experiments (similar to Word2Vec) ☆66 · Updated last year
- Implementation of DocFormer: End-to-End Transformer for Document Understanding, a multi-modal transformer based architecture for the task… ☆269 · Updated 2 years ago
- A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved. ☆440 · Updated last year
- Generate realistic Instagram captions using transformers 🤗 ☆103 · Updated last year
- Storage and retrieval of Word Embeddings in various databases ☆51 · Updated 6 years ago
- Companion code to the paper "Extracting Scientific Figures with Distantly Supervised Neural Networks" 🤖 ☆138 · Updated 2 years ago
- AI poetic imagery ☆40 · Updated last year
- The Dakshina dataset is a collection of text in both Latin and native scripts for 12 South Asian languages. For each language, the datase… ☆193 · Updated 4 years ago
- Indic-BERT-v1: BERT-based Multilingual Model for 11 Indic Languages and Indian-English. For latest Indic-BERT v2, check: https://github.c… ☆281 · Updated last year
- Word Embeddings for Information Retrieval ☆225 · Updated last year
- Character-based word embeddings model based on RNN for handling real world texts ☆173 · Updated last year
- NLP in Python with Deep Learning ☆577 · Updated last year
- ☆159 · Updated 2 years ago
- A PyTorch Deep Dream Implementation ☆88 · Updated 4 years ago
- The project aims to add a state-of-the-art transliteration module for cross transliterations among all Indian languages including Engl… ☆265 · Updated 2 years ago
- ✔️ Contextual word checker for better suggestions (not actively maintained) ☆413 · Updated last month
- Language model fine-tuning on NER with an easy interface and cross-domain evaluation. "T-NER: An All-Round Python Library for Transformer… ☆386 · Updated last year
- Exploring word2vec embeddings as a graph of nearest neighbors ☆709 · Updated 4 years ago