shunk031 / TedScraper
Scraper for TED Talks in Python. Get talk title, transcript, talk topics and so on.
☆15 Updated 7 years ago
Alternatives and similar repositories for TedScraper
Users who are interested in TedScraper are comparing it to the repositories listed below.
- extract text from ALTO file ☆9 Updated last year
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆63 Updated last year
- Convert a corpus of PDF to clean text files on a distributed architecture ☆38 Updated last year
- Convert ALTO XML to plain text + minimal metadata ☆16 Updated 7 months ago
- Code for recon16 hack day ☆16 Updated 7 years ago
- A selection of test lines of several early printed books as well as the corresponding individual OCRopus models and mixed models. ☆10 Updated 7 years ago
- CLI tool for importing entities from Wikidata / Wikibase ☆23 Updated 2 years ago
- A web application for exploring documents topically. ☆26 Updated 8 years ago
- Wikidata authority file mapping tool ☆11 Updated 6 years ago
- A PDF collection reader with built-in full-text search engine ☆19 Updated 7 years ago
- Multi-Entity Extraction Framework for Academic Documents (with default extraction tools) ☆31 Updated last year
- ☆14 Updated 3 years ago
- Python API for KB data-services ☆19 Updated 5 years ago
- ☆40 Updated 7 years ago
- Finds linguistic patterns effortlessly ☆36 Updated last year
- Multilingual Language Modeling Toolkit ☆11 Updated 7 years ago
- Recipes for training OpenNMT systems ☆14 Updated 7 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- Stylometric framework in Python ☆17 Updated 10 years ago
- Source for lemon-model.net ☆11 Updated 3 years ago
- A platform for collecting, analyzing, and visualizing social media data. ☆12 Updated 4 years ago
- Specification of NAF, the NLP annotation format ☆21 Updated 4 years ago
- clone of https://code.google.com/p/splitta/ so it can be a git submodule ☆34 Updated 11 years ago
- A search engine built on the Unpaywall database ☆20 Updated last year
- WordNet behind a REST interface ☆13 Updated last month
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Command-line corpus tools ☆9 Updated 8 years ago
- Simple spaCy-based concept extraction API, involving a dictionary of relevant concepts. ☆10 Updated 6 years ago
- Entity linker for the newspaper collection of the National Library of the Netherlands. Links named entity mentions to DBpedia description… ☆11 Updated 2 years ago
- Easily identify and label sentence intervals using various taggers. ☆16 Updated 8 years ago