The-Gupta / TED-Scraper
Complete Web Scraping of TED.com for Metadata, Transcript, Audio, Video, Images using Parallel Programming
☆11 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for TED-Scraper
- A financial disclosure data extraction tool. ☆13 · Updated last year
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆32 · Updated last year
- Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b… ☆9 · Updated 4 years ago
- An ongoing series of notebooks aimed at helping fellow NLP enthusiasts think about applying new tools and techniques to practical tasks. ☆18 · Updated 3 years ago
- Tool for the Automatic Assessment of Lexical Diversity ☆11 · Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 · Updated last year
- NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data ☆11 · Updated 3 years ago
- Example of building a working Spanish-to-English translation model with Marian NMT ☆20 · Updated 4 years ago
- Code examples for Google Natural Language API. ☆13 · Updated 5 years ago
- Tools for scraping YouTube video metadata (mostly for training AI on video titles) ☆38 · Updated 3 years ago
- The stop words list for all languages around the world made by the contributors around the world! Start your contributions now! ☆12 · Updated 2 years ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages ☆9 · Updated last year
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 · Updated last year
- Gentle and praatio scripts for easy forced alignment ☆18 · Updated 2 years ago
- Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents ☆12 · Updated 2 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 · Updated 6 years ago
- Simple and clean Python implementation of TextRank as per the seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 · Updated 3 years ago
- Text preprocessing tools in Python. ☆26 · Updated 6 years ago
- Labeled segmentation for the document structure of printed books ☆13 · Updated 7 years ago
- App store search example, using Jina as backend and Streamlit as frontend ☆21 · Updated 2 years ago
- This repository contains papers and resources pertaining to hate speech research. ☆43 · Updated 3 years ago
- The code processes URLs in an attempt to consolidate different web addresses that point to the same URL and to remove potentially private… ☆23 · Updated 3 years ago
- Deep Neural Networks for audio classification ☆11 · Updated 7 months ago
- ☆15 · Updated 3 years ago
- Code and data for Teddy https://arxiv.org/abs/2001.05171. ☆15 · Updated 2 years ago
- My personal data science blog ☆26 · Updated 3 weeks ago
- Dataset: BuzzFeed News “Trending” Strip, 2018–2023 ☆19 · Updated last year
- Codebase for Indic-Transliteration using Seq2Seq RNN. For latest repo with Transformer-based models, check: https://github.com/AI4Bharat/… ☆58 · Updated 3 years ago
- TensorFlow materials ☆13 · Updated 3 years ago