The-Gupta / TED-Scraper
Complete Web Scraping of TED.com for Metadata, Transcript, Audio, Video, Images using Parallel Programming
☆11 Updated 4 years ago
Alternatives and similar repositories for TED-Scraper:
Users who are interested in TED-Scraper are comparing it to the repositories listed below.
- Code and data for Teddy https://arxiv.org/abs/2001.05171. ☆15 Updated 2 years ago
- Example of building a working Spanish-to-English translation model with Marian NMT ☆20 Updated 4 years ago
- ☆13 Updated last year
- This is a document concerning Data Readiness in the context of machine learning and Natural Language Processing. ☆11 Updated 3 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot… ☆11 Updated 3 years ago
- Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents ☆12 Updated 2 years ago
- A financial disclosure data extraction tool. ☆13 Updated last year
- Tools for scraping YouTube video metadata (mostly for training AI on video titles) ☆39 Updated 3 years ago
- An ongoing series of notebooks aimed at helping fellow NLP enthusiasts think about applying new tools and techniques to practical tasks. ☆18 Updated 4 years ago
- Text classification AutoML ☆21 Updated 3 years ago
- Text preprocessing tools in Python. ☆26 Updated 6 years ago
- The Seshat audio annotation management platform ☆13 Updated 4 years ago
- Dataiku DSS plugin to detect languages, correct misspellings, and clean text data 🧼 ☆23 Updated this week
- GisPy: A Tool for Measuring Gist Inference Score in Text https://aclanthology.org/2022.wnu-1.5/ ☆11 Updated 6 months ago
- A Python toolkit to generate a tokenized dump of Wikipedia for NLP ☆11 Updated 8 months ago
- Find duplicate text files. ☆12 Updated this week
- Code for publications related to longitudinal dialog research. ☆10 Updated 5 years ago
- PromptCraft is a prompt perturbation toolkit from the character, word, and sentence levels for prompt robustness analysis. PyPI Package: … ☆14 Updated last year
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. ☆18 Updated 3 years ago
- Training a model without a dataset for natural language inference (NLI) ☆25 Updated 4 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆32 Updated last year
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- This repository implements the interaction with DBLP, information extraction and pre-processing of papers, and a client to store data to … ☆10 Updated 2 years ago
- NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data ☆10 Updated 3 years ago
- Post-processing OCR errors with seq2seq models ☆28 Updated 4 years ago
- Scripts to take hand washing related text in (almost) any language and float it into a hand washing poster. ☆9 Updated 3 years ago
- [WIP] Behold, semantic-search, built over sentence-transformers to make it easy for search engineers to evaluate, optimise and deploy mod… ☆15 Updated last year
- TTS Client for Coqui TTS server ☆13 Updated 2 years ago
- Code for "CyberWallE at SemEval-2020 Task 11: An Analysis of Feature Engineering for Ensemble Models for Propaganda Detection" (V. Blasch… ☆9 Updated 4 years ago