OlehOnyshchak / pyWikiMM
Collects a multimodal dataset of Wikipedia articles and their images
☆16 · Updated 2 years ago
Alternatives and similar repositories for pyWikiMM
Users who are interested in pyWikiMM are comparing it to the libraries listed below
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 · Updated 5 years ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆76 · Updated last week
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated 2 years ago
- CLICK ON WIKI LINK BELOW OR ON Wiki TAB AT TOP BANNER FOR DOWNLOAD INSTRUCTIONS AND INFORMATION ON THE NLP SUITE. ☆51 · Updated 2 months ago
- DocAI helps developers quickly build document, image and text processing pipelines using open source and cloud-based machine learning mod… ☆20 · Updated 2 years ago
- Domain-Specific Text Generation for Machine Translation (with LLMs) - scripts and config files for the paper ☆17 · Updated 2 years ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper. ☆12 · Updated 2 years ago
- Boolean text search in Python ☆46 · Updated 4 months ago
- A dataset of multinational first names and last names ☆27 · Updated 2 years ago
- Using Machine Learning to Create Funny Memes ☆25 · Updated 2 years ago
- Extracts I-frames or keyframes from a video file, through the command line or from inside Python. ☆18 · Updated 3 years ago
- A database of movie scripts from several sources ☆181 · Updated last year
- ☆44 · Updated 2 years ago
- A tool to easily scrape YouTube data using the Google API ☆12 · Updated 7 months ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 7 years ago
- Daily TV News Summary using GPT ☆23 · Updated 6 months ago
- 🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0. ☆51 · Updated 2 weeks ago
- Reproducing "Writing with Transformer" demo, using aitextgen/FastAPI in the backend, Quill/React in the frontend ☆27 · Updated 4 years ago
- A Python library for extracting text from PDFs without losing the formatting of the PDF content. ☆79 · Updated 3 years ago
- Deep learning based Reverse Image Search using Annoy library ☆15 · Updated 6 years ago
- A News Article Collection Library ☆22 · Updated 2 years ago
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆38 · Updated last year
- A ChatGPT-based tool to decode Jupyter notebooks and similar Python notebook environments such as Google Colab. ☆63 · Updated last year
- LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development ☆20 · Updated 2 years ago
- ☆11 · Updated 2 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 · Updated 4 years ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆65 · Updated 10 months ago
- Modelling Big Five Personality Inventory using Machine Learning algorithms ☆22 · Updated last year
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆76 · Updated 3 years ago
- A guide to structured generation using constrained decoding ☆12 · Updated last year