Webhose / free-news-datasets
Weekly free datasets from global news sites
☆27 · Updated last week
Alternatives and similar repositories for free-news-datasets
Users who are interested in free-news-datasets are comparing it to the repositories listed below:
- Common crawl extractor ☆80 · Updated last year
- Chrome Extension for exploring Hugging Face datasets ☆48 · Updated last year
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆25 · Updated last year
- VerifAI initiative to build open-source easy-to-deploy generative question-answering engine that can reference and verify answers for cor… ☆76 · Updated 7 months ago
- Professional Wargaming LLM Toolbox ☆16 · Updated 2 months ago
- Various Jupyter notebooks about Common Crawl data ☆58 · Updated 6 months ago
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆25 · Updated 10 months ago
- Statistics of Common Crawl monthly archives mined from URL index files ☆193 · Updated this week
- Tools to construct and process Common Crawl webgraphs ☆98 · Updated last week
- Pivotal Token Search ☆126 · Updated 2 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 · Updated 11 months ago
- Visualize any repo or codebase into diagram or animation ☆20 · Updated 11 months ago
- Build knowledge bases for RAG ☆28 · Updated 3 months ago
- Email Datasets can be found here ☆71 · Updated 5 years ago
- The Official NewsCatcher News API V2 SDK for Python ☆20 · Updated last year
- Create workflows with LLMs ☆54 · Updated last year
- An OpenBB agent Slack bot that is ready to answer any financial question ☆12 · Updated last year
- Official Repo for CRMArena and CRMArena-Pro ☆118 · Updated 3 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆56 · Updated 6 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆80 · Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 · Updated last year
- LLM-powered autonomous agent with hierarchical task management ☆52 · Updated 2 years ago
- A curated list of datasets for large language models (LLMs), RLHF and related resources (continually updated) ☆24 · Updated 2 years ago
- A cog model for the all-mpnet-base-v2 sentence-transformers embedding model. ☆15 · Updated last year
- ☆40 · Updated 9 months ago
- LLM-based mutation testing ☆11 · Updated 8 months ago
- ☆11 · Updated 11 months ago
- Voyage AI Official Python Library ☆78 · Updated 3 weeks ago
- Automated Qualitative Analysis of LLMs (ICLR 2025) ☆47 · Updated 3 months ago
- Setu is a comprehensive pipeline designed to clean, filter, and deduplicate diverse data sources including Web, PDF, and Speech data. Bui… ☆15 · Updated last year