Webhose / free-news-datasets
Weekly free datasets from global news sites
☆32 · Updated this week
Alternatives and similar repositories for free-news-datasets
Users who are interested in free-news-datasets are comparing it to the repositories listed below.
- Automated Document Intelligence Workflow ☆30 · Updated 3 weeks ago
- Tools to construct and process Common Crawl webgraphs ☆102 · Updated last week
- Common crawl extractor ☆83 · Updated last year
- A pipeline using LLMs for Knowledge Engineering, combining knowledge probing and Wikidata entity mapping. ☆37 · Updated 11 months ago
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆25 · Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 · Updated last year
- The Official NewsCatcher News API V2 SDK for Python ☆20 · Updated last year
- ☆55 · Updated last year
- Blazing fast fuzzy text search for Python. ☆50 · Updated 7 months ago
- Powerful topic model visualization in Python ☆136 · Updated 8 months ago
- Visualize any repo or codebase into diagram or animation ☆20 · Updated last year
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆49 · Updated last year
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆23 · Updated 5 years ago
- A basic tool that extracts the structure from the PDF files of scientific articles. ☆76 · Updated 3 years ago
- Small python package to measure OCR quality and other related metrics. ☆25 · Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆57 · Updated 9 months ago
- Tools for interactive visual exploration of semantic embeddings. ☆39 · Updated last year
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆82 · Updated last year
- 👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆24 · Updated 2 years ago
- Text Anonymization app with Streamlit and Spacy ☆25 · Updated 4 years ago
- Newsfeed based on GDELT Project ☆30 · Updated last year
- LLM query engine to retrieve augmented responses from json files. ☆16 · Updated 2 years ago
- Email Datasets can be found here ☆73 · Updated 5 years ago
- A public repo that contains integrations for Argilla and LlamaIndex. ☆17 · Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆52 · Updated 8 months ago
- Ricgraph - Research in context graph ☆29 · Updated last week
- 🦦 weasel: A small and easy workflow system ☆88 · Updated 3 weeks ago
- Pytorch implementation of a BiLSTM model for the Wikification project. ☆19 · Updated 5 years ago
- Complex data extraction and orchestration framework designed for processing unstructured documents. It integrates AI-powered document pip… ☆77 · Updated this week
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆54 · Updated 5 months ago