DistrictDataLabs / baleen
An automated ingestion service for blogs to construct a corpus for NLP research.
☆86 Updated 6 years ago
Alternatives and similar repositories for baleen:
Users who are interested in baleen are comparing it to the libraries listed below.
- A Topic Modeling toolbox ☆92 Updated 8 years ago
- 🔥 Browser-based slides or PDFs of our talks and presentations ☆94 Updated 6 years ago
- 💫 Scripts, tools and resources for developing spaCy ☆125 Updated 5 years ago
- Materials for the workshop Advanced Text Analysis with SpaCy and Scikit-Learn, given at NYU during NYCDH Week 2017, at PyData NYC in Nov.… ☆82 Updated 2 years ago
- A visualisation tool for Spacy using Hierplane. ☆65 Updated 2 years ago
- Language detection extension for spaCy 2.0+ ☆112 Updated 6 years ago
- For extracting measurements and related entities from text ☆57 Updated 4 years ago
- Automatic News Corpus Builder ☆40 Updated 7 years ago
- Multidimensional data explorer and visualization tool. ☆55 Updated 7 years ago
- 🤹‍♀️ Query spaCy's linguistic annotations using GraphQL ☆86 Updated 6 years ago
- 💫 Jupyter notebooks for spaCy examples and tutorials ☆287 Updated 6 years ago
- ☆54 Updated 6 years ago
- Code and Notebooks for the Natural Language Processing with Python course. ☆66 Updated 7 years ago
- Excel Integration with spaCy. Training NER using Excel/XLSX from PDF, DOCX, PPT, PNG or JPG. ☆105 Updated 2 years ago
- Tools, wrappers, etc... for data science with a concentration on text processing ☆206 Updated 2 years ago
- Similarity search on Wikipedia using gensim in Python. ☆60 Updated 6 years ago
- Natural Language Processing with Spark's MLlib ☆62 Updated 7 years ago
- Code & Data for Introduction to Machine Learning with Scikit-Learn ☆81 Updated 6 years ago
- Material for some talks I have given ☆62 Updated 4 months ago
- Data Server for Topic Models ☆121 Updated last year
- Tutorial code and data for the entity resolution workshops. ☆43 Updated 9 years ago
- An introduction to using spaCy for NLP and machine learning ☆191 Updated 2 years ago
- A simple command line interface to the datamade/dedupe library. ☆42 Updated 2 years ago
- Source code for the "Practical Data Science in Python" tutorial ☆58 Updated 9 years ago
- Memory-based shallow parser for Python ☆73 Updated 5 years ago
- Relatively simple text classification powered by spaCy ☆41 Updated 9 years ago
- Refinery - A locally deployable open-source web platform for analysis of large document collections ☆101 Updated 8 years ago
- Graph extraction and NLP analysis for Baleen Corpora ☆18 Updated 8 years ago
- Server/Client around Spacy to load spacy only once ☆46 Updated 7 years ago
- Library for Geo-Inferencing in Twitter Data ☆28 Updated 8 years ago