MaLeLabTs / RegexGenerator
This project contains the source code of a tool for generating regular expressions for text extraction: 1. automatically, 2. based only on examples of the desired behavior, 3. without any external hint about what the target regex should look like.
☆952 Updated 5 years ago
Alternatives and similar repositories for RegexGenerator
Users who are interested in RegexGenerator are comparing it to the libraries listed below.
- Code for the paper Neural Generation of Regular Expressions from Natural Language with Minimal Domain Knowledge (EMNLP 2016). http://arxi… ☆431 Updated 8 years ago
- Compact Language Detector 2 ☆890 Updated 4 years ago
- Keshif - Data Made Explorable (Prototype) ☆455 Updated 8 years ago
- Fact Extraction from Wikipedia Text ☆537 Updated 9 years ago
- Natural Language Engine on WikiData ☆436 Updated 9 years ago
- Index URLs in Common Crawl ☆198 Updated 8 years ago
- ☆185 Updated 7 years ago
- Autocomplete - light-weight, next-word prediction Python utility ☆451 Updated 3 weeks ago
- Just the facts -- web page content extraction ☆1,280 Updated 6 months ago
- Creates github index for similar repositories discovery ☆192 Updated 9 years ago
- Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip. ☆1,279 Updated 5 years ago
- English word segmentation, written in pure-Python, and based on a trillion-word corpus. ☆378 Updated 3 years ago
- Extract data from websites using basic statistical magic ☆505 Updated 5 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆73 Updated 3 weeks ago
- Chrome extension: Gives Ctrl+F like find results which include non-exact (fuzzy) matches using string edit-distance and GloVe/Word2Vec. A… ☆137 Updated 5 years ago
- Handwritten math expression parser ☆693 Updated 5 years ago
- The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploit… ☆745 Updated 6 years ago
- Cross-platform mouse/keyboard record/replay and automation hotkeys/macros creation, and more advanced automation features. ☆1,099 Updated 2 years ago
- All-pair set similarity search on millions of sets in Python and on a laptop ☆604 Updated 3 years ago
- DBpedia Spotlight is a tool for automatically annotating mentions of DBpedia resources in text. ☆760 Updated 7 years ago
- A lightning fast Finite State machine and REgular expression manipulation library. ☆1,882 Updated last year
- DeepDive ☆1,973 Updated 3 years ago
- A bunch of fancy soft string matching routines, with some accompanying datasets ☆56 Updated 8 years ago
- Document processing for investigations ☆250 Updated 9 years ago
- Official version of TextTeaser. ☆629 Updated 7 years ago
- Multilingual word vectors in 78 languages ☆1,200 Updated 2 years ago
- Adds text to PDF files using the cuneiform OCR software ☆328 Updated 4 years ago
- Heuristic based boilerplate removal tool ☆811 Updated 11 months ago
- Python implementation of TextRank algorithm for automatic keyword extraction and summarization using Levenshtein distance as relation bet… ☆793 Updated 3 years ago
- A python implementation of the Rapid Automatic Keyword Extraction ☆983 Updated 5 years ago