Webhose / free-news-datasets
Weekly free datasets from global news sites
☆30 Updated 2 weeks ago
Alternatives and similar repositories for free-news-datasets
Users who are interested in free-news-datasets are comparing it to the repositories listed below.
- Common crawl extractor ☆80 Updated last year
- Automated Document Intelligence Workflow ☆28 Updated 10 months ago
- A pipeline using LLMs for Knowledge Engineering, combining knowledge probing and Wikidata entity mapping. ☆37 Updated 10 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆52 Updated 7 months ago
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆25 Updated last year
- Chrome Extension for exploring Hugging Face datasets 🔎 ☆49 Updated last year
- The Official NewsCatcher News API V2 SDK for Python ☆20 Updated last year
- Tools to construct and process Common Crawl webgraphs ☆101 Updated this week
- Newsfeed based on GDELT Project ☆30 Updated last year
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 Updated last year
- Simulate human behavior with mass LLMs ☆26 Updated last year
- Your hub for neuro-symbolic AI: Explore links, papers, and articles with a focus on AI cognition. Contribute and stay updated. ☆22 Updated last year
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆24 Updated 8 months ago
- Entity resolution, also known as Data Matching or Record linkage is the task of finding a data set that refer to the same or similar real… ☆31 Updated 7 months ago
- Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Bui… ☆15 Updated last year
- ☆28 Updated last year
- Statistics of Common Crawl monthly archives mined from URL index files ☆195 Updated this week
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆80 Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆57 Updated 7 months ago
- A curated list of my GitHub stars! ☆38 Updated 2 weeks ago
- ☆12 Updated this week
- 👩🤝🤖 A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆24 Updated 2 years ago
- Visualize any repo or codebase into diagram or animation ☆20 Updated last year
- A place where I experiment with AI and share with a world ☆24 Updated last year
- Mahabharata text compiled from multiple sources, split into chunks, parsed into CSV files with metadata. Named entities recognised and in… ☆36 Updated last year
- Easiest way to build custom agents, in a no-code notion style editor, using simple macros. ☆35 Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆25 Updated 11 months ago
- Scripts to load the GDELT data set into MongoDB ☆14 Updated 2 years ago
- Query language for blending SQL and LLMs across structured + unstructured data, with type constraints. ☆115 Updated 3 weeks ago
- ☆20 Updated last week