citiususc / pyplexity
Cleaning tool for web scraped text
☆39 Updated last year
Alternatives and similar repositories for pyplexity:
Users interested in pyplexity are comparing it to the libraries listed below.
- Granular Viewer of Sentiments Between Entities in Massively Large Documents and Collections of Texts, powered by AREkit ☆38 Updated 2 months ago
- Generate a SQLite database from Wikipedia & Wikidata dumps. ☆33 Updated last year
- Finds linguistic patterns effortlessly ☆35 Updated last year
- LLM plugin for clustering embeddings ☆72 Updated last year
- An Infr app that automates data collection from your PC, macOS or Linux client. ☆11 Updated last year
- Factored Cognition Primer: How to write compositional language model programs ☆48 Updated 2 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 3 years ago
- Local emulator for Hugging Face Inference Endpoints customer handlers ☆25 Updated last year
- spaCy entry points for Curated Transformers ☆27 Updated 6 months ago
- Embedding models from Jina AI ☆58 Updated last year
- Python package that offers text scrubbing functionality, providing building blocks for string cleaning as well as normalizing geographica… ☆22 Updated 7 months ago
- Simple, Fast, Parallel Huggingface GGML model downloader written in Python ☆24 Updated last year
- Minimalist Context Management for message-based GPTs ☆22 Updated last year
- Documentation effort for the BookCorpus dataset ☆34 Updated 3 years ago
- With sequence-learn, you can build models for named entity recognition as quickly as if you were building a sklearn classifier. ☆22 Updated 2 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization. ☆44 Updated 10 months ago
- Python code for building a GPT-3 based technical blog post optimizer. ☆84 Updated 2 years ago
- With embedders, you can easily convert your texts into sentence- or token-level embeddings within a few lines of code. Use cases for this… ☆21 Updated last year
- [Added T5 support to TRLX] A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆47 Updated 2 years ago
- 🔎 A Prodigy plugin for evaluating spaCy pipelines ☆13 Updated last year
- Open source library for few shot NLP ☆78 Updated last year
- Aim-spaCy integration ☆34 Updated last year
- Run semantic queries over your Twitter history ☆39 Updated 2 years ago
- A text analysis library for relevance and subtheme detection ☆16 Updated last month
- Source code and data for Like a Good Nearest Neighbor ☆28 Updated 2 months ago
- Summarize the top 30 most popular arXiv papers on Reddit, Hacker News and Hugging Face in the last 30 days. And post them to Slack, Twitt… ☆17 Updated last month
- Codebase topic modeling using GNNs (node aggregation and clustering) ☆61 Updated last year
- Vespa application making an index of the CORD-19 dataset. ☆39 Updated 2 months ago