OlehOnyshchak / pyWikiMM
Collects a multimodal dataset of Wikipedia articles and their images
☆16 Updated 2 years ago
Alternatives and similar repositories for pyWikiMM
Users that are interested in pyWikiMM are comparing it to the libraries listed below
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated 2 years ago
- A tool to easily scrape YouTube data using the Google API ☆12 Updated 9 months ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 7 years ago
- Influencer dataset collected from Instagram ☆125 Updated 3 years ago
- A database of movie scripts from several sources ☆182 Updated last year
- Chat with an AI simulation of anyone as easily as copy-pasting text into a folder! ☆19 Updated 2 years ago
- CLICK ON WIKI LINK BELOW OR ON Wiki TAB AT TOP BANNER FOR DOWNLOAD INSTRUCTIONS AND INFORMATION ON THE NLP SUITE. ☆51 Updated 4 months ago
- This repository contains code used for our Multi Sentence Inference NAACL'22 paper. ☆12 Updated 2 years ago
- Legal document similarity - Code, data, and models for the ICAIL 2021 paper "Evaluating Document Representations for Content-based Legal … ☆32 Updated 4 years ago
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆65 Updated 11 months ago
- Experiments for automated personality detection using Language Models and psycholinguistic features on various famous personality dataset… ☆204 Updated 10 months ago
- Adversarial Training on Transformer Networks to discover check-worthy factual claims ☆83 Updated 2 years ago
- Python API & command-line tool to easily transcribe speech-based video files into clean text ☆218 Updated last year
- Domain-Specific Text Generation for Machine Translation (with LLMs) - scripts and config files for the paper ☆18 Updated 2 years ago
- Using Machine Learning to Create Funny Memes ☆25 Updated 2 years ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆69 Updated 5 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆53 Updated 9 months ago
- GitHub action that'll sync files from a GitHub Repo with the Hugging Face Hub 🤗 ☆78 Updated last year
- Modelling Big Five Personality Inventory using Machine Learning algorithms ☆22 Updated last year
- ☆56 Updated 2 years ago
- Social Media Mining Toolkit (SMMT) main repository ☆136 Updated 3 years ago
- Tool to create image datasets for machine learning problems by scraping search engines like Google, Bing and Baidu. ☆17 Updated 6 years ago
- Concept Modeling: Topic Modeling on Images and Text ☆217 Updated last year
- an experimental implementation of Burrows' Delta in Python 3 ☆21 Updated 4 years ago
- A collection of datasets and other resources for legal text processing. ☆160 Updated 2 months ago
- Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k se… ☆156 Updated 5 months ago
- Code for constructing TLDR corpus from Reddit dataset ☆27 Updated 4 years ago
- Data and code related to the report "Truth, Lies, and Automation: How Language Models Could Change Disinformation" ☆28 Updated 4 years ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆80 Updated last week
- A dataset for pretraining language models targeted for legal tasks. ☆140 Updated 3 years ago