The-Gupta / TED-Scraper
Complete Web Scraping of TED.com for Metadata, Transcript, Audio, Video, Images using Parallel Programming
☆11 Updated 4 years ago
Related projects
Alternatives and complementary repositories for TED-Scraper
- An ongoing series of notebooks aimed at helping fellow NLP enthusiasts think about applying new tools and techniques to practical tasks. ☆18 Updated 3 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 Updated 3 years ago
- Tool for sentiment analysis annotation ☆11 Updated last month
- NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. ☆18 Updated 3 years ago
- NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data ☆11 Updated 3 years ago
- Text preprocessing tools in Python. ☆26 Updated 6 years ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 Updated last year
- Code and data for Teddy https://arxiv.org/abs/2001.05171. ☆15 Updated 2 years ago
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b… ☆9 Updated 4 years ago
- ☆11 Updated 2 years ago
- A financial disclosure data extraction tool. ☆13 Updated last year
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆32 Updated last year
- A parallel evaluation data set of SAP software documentation with document structure annotation ☆10 Updated 2 months ago
- Code for "CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection" (V. Blasch… ☆9 Updated 4 years ago
- Code for the paper "Code-Mixing on Sesame Street: Dawn of the Adversarial Polyglots" (NAACL-HLT 2021) ☆10 Updated 2 years ago
- Tool for the Automatic Assessment of Lexical Diversity ☆11 Updated 3 years ago
- GisPy: A Tool for Measuring Gist Inference Score in Text https://aclanthology.org/2022.wnu-1.5/ ☆11 Updated 4 months ago
- List of corpora annotated for coreference for different languages ☆17 Updated 3 months ago
- CorrectLy - Open Source Spelling & Grammar correction ☆38 Updated last year
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages ☆9 Updated last year
- Discussion Summarization is the process of condensing a text document which is a collection of discussion threads, using CBS (Cluster Bas… ☆12 Updated 10 years ago
- Many Natural Language Processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spacy provide state of … ☆61 Updated 4 years ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- ☆24 Updated 3 years ago
- ☆15 Updated 3 years ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages ☆11 Updated 9 months ago
- Documentation effort for the BookCorpus dataset ☆31 Updated 3 years ago
- Post-processing OCR errors with seq2seq models ☆28 Updated 4 years ago
- A summary of must-read papers for Neural Question Generation (NQG) ☆14 Updated 3 years ago
- Arabic News Stance Corpus ☆10 Updated 3 years ago