networkdynamics / seldonite
A News Article Collection Library
☆22 · Updated 2 years ago
Alternatives and similar repositories for seldonite
Users who are interested in seldonite are comparing it to the libraries listed below.
- Tools for interactive visual exploration of semantic embeddings. ☆38 · Updated last year
- Various Jupyter notebooks about Common Crawl data ☆58 · Updated 6 months ago
- News API - fetch news from CommonCrawl, parse with NewsPlease, enrich with pre-trained machine-learning models, to structured searchable … ☆29 · Updated 3 years ago
- ☆67 · Updated last year
- LLM plugin for clustering embeddings ☆82 · Updated last year
- Next-generation Punkt sentence boundary detection with zero dependencies ☆18 · Updated 2 months ago
- RaKUn 2.0 - A fast keyword detection algorithm ☆68 · Updated 2 months ago
- Tools to construct and process Common Crawl webgraphs ☆98 · Updated this week
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆39 · Updated 6 years ago
- A dataset for pretraining language models targeted for legal tasks. ☆138 · Updated 3 years ago
- 💫 SpaCy wrapper for ConceptNet 💫 ☆95 · Updated 2 years ago
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆148 · Updated 9 months ago
- Production-grade embedding generation, for any length of text, for transformer models. ☆23 · Updated 3 months ago
- ☆55 · Updated last year
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 · Updated 2 years ago
- Common crawl extractor ☆80 · Updated last year
- Completion After Prompt Probability. Make your LLM make a choice ☆80 · Updated 11 months ago
- This repository contains an easy and intuitive approach to use SetFit in combination with spaCy. ☆80 · Updated 2 years ago
- TextReducer - A Tool for Summarization and Information Extraction ☆88 · Updated last year
- Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and processing large text collections with ML and for ML ☆63 · Updated 8 months ago
- This repository serves as a collection of scrapers procuring and structuring various legal datasets ☆18 · Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- 📚 Datasets and models for instruction-tuning ☆239 · Updated 2 years ago
- An integration of Qdrant ANN vector database backend with txtai ☆26 · Updated last year
- A TextTiling-based algorithm for text segmentation (aka topic segmentation) that uses neural sentence encoders, as well as extractive sum… ☆49 · Updated 2 years ago
- Efficient few-shot learning with cross-encoders. ☆60 · Updated last year
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆45 · Updated last year
- Open Access PDF harvester, metadata aggregator and full-text ingester ☆63 · Updated last year
- LLM prompt language based on Jinja. Banks provides tools and functions to build prompts text and chat messages from generic blueprints. I… ☆116 · Updated 2 months ago
- Writing Blog Posts with Generative Feedback Loops! ☆50 · Updated last year