shunk031 / TedScraper
Scraper for TED Talks in Python. Get talk title, transcript, talk topics and so on.
☆15 Updated 7 years ago
Alternatives and similar repositories for TedScraper
Users that are interested in TedScraper are comparing it to the libraries listed below
- Automatically exported from code.google.com/p/guess-language ☆53 Updated last year
- clone of https://code.google.com/p/splitta/ so it can be a git submodule ☆34 Updated 12 years ago
- FoLiA: Format for Linguistic Annotation - FoLiA is a rich XML-based annotation format for the representation of language resources (inclu… ☆64 Updated last year
- Scrapes some Finnish word definitions from English Wiktionary. ☆8 Updated last year
- Convert a corpus of PDF to clean text files on a distributed architecture ☆39 Updated last year
- Crawling and analyzing data on Wikipedia ☆17 Updated last year
- Simple natural language parsing and semantic grounding ☆10 Updated 4 years ago
- Maps clauses from a text corpus onto the metrical structure of a poem ☆17 Updated 9 years ago
- Multilingual Language Modeling Toolkit ☆11 Updated 8 years ago
- Tree-adjoining grammar based statistical dependency parser using a general linear model (glm). ☆28 Updated 8 years ago
- Convert ALTO XML to plain text + minimal metadata ☆16 Updated 8 months ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- QA-tool for scans with corresponding ALTO-files ☆24 Updated 2 years ago
- PhiloLogic4 ☆39 Updated 6 months ago
- PDF Extraction Toolkit ☆41 Updated 4 years ago
- Process Caltech Archives' digital documents and photos, and annotate each page or image with information about its contents ☆12 Updated 3 years ago
- A comprehensive graph of mathematical domains and topics ☆22 Updated 3 years ago
- Download, convert and organize Gutenberg books for eBook Readers ☆46 Updated 5 years ago
- A PDF classifier ensemble with REST API service ☆23 Updated 4 years ago
- Command-line tool to extract a ranked list of relevant keywords from a corpus with the option of using either topic modeling or tf-idf sc… ☆40 Updated 8 years ago
- A browser extension providing Open Access bibliographical services ☆17 Updated 2 years ago
- Analyze and extract Wikipedia article text and attributes and store them into an ElasticSearch index or to json files (multilingual suppo… ☆47 Updated last year
- an approximate string matching or fuzzy-matching system for spelling correction, normalisation or post-OCR correction ☆37 Updated 3 months ago
- Specification of NAF, the NLP annotation format ☆21 Updated 4 years ago
- CLI tool for importing entities from Wikidata / Wikibase ☆23 Updated 2 years ago
- A simple interface to the Project Gutenberg corpus. ☆17 Updated 9 years ago
- A set of workflows for corpus building through OCR, post-correction and normalisation ☆49 Updated 2 years ago
- extract text from ALTO file ☆9 Updated last year
- A natural language date parser. (Python version of chrono.js) ☆25 Updated 3 weeks ago
- Use spaCy for NLP and output to the FoLiA XML format. ☆12 Updated last year