Webhose / free-news-datasets
Weekly free datasets from global news sites
☆23 · Updated this week
Alternatives and similar repositories for free-news-datasets
Users who are interested in free-news-datasets are comparing it to the repositories listed below.
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆75 · Updated 9 months ago
- Chrome Extension for exploring Hugging Face datasets ☆50 · Updated 10 months ago
- Common crawl extractor ☆78 · Updated last year
- Tools to construct and process Common Crawl webgraphs ☆92 · Updated last week
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆25 · Updated last year
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆54 · Updated 5 months ago
- This repository is designed for deploying and managing server processes that handle embeddings using the Infinity Embedding model or Larg… ☆23 · Updated 5 months ago
- LLM plugin for clustering embeddings ☆80 · Updated last year
- VerifAI initiative to build open-source easy-to-deploy generative question-answering engine that can reference and verify answers for cor… ☆75 · Updated 5 months ago
- ☆24 · Updated 2 years ago
- Automated Qualitative Analysis of LLMs (ICLR 2025) ☆41 · Updated last month
- Automated Document Intelligence Workflow ☆25 · Updated 7 months ago
- GraphER: A Structure-aware Text-to-Graph Model for Entity and Relation Extraction ☆77 · Updated last year
- Statistics of Common Crawl monthly archives mined from URL index files ☆184 · Updated 2 weeks ago
- An autonomous Mall assistant that can answer user queries using tools. Powered by LLMs. ☆14 · Updated last year
- ☆40 · Updated 7 months ago
- Code, datasets, and checkpoints for the paper "CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval an… ☆30 · Updated 10 months ago
- A pipeline using LLMs for Knowledge Engineering, combining knowledge probing and Wikidata entity mapping. ☆37 · Updated 7 months ago
- Pivotal Token Search ☆119 · Updated 3 weeks ago
- Python package that adds IntelligentGraph capabilities to RDFLib RDF graph package ☆55 · Updated last year
- ☆11 · Updated 9 months ago
- Reward Model framework for LLM RLHF ☆61 · Updated 2 years ago
- LLM-powered autonomous agent with hierarchical task management ☆50 · Updated 2 years ago
- Universal text classifier for generative models ☆24 · Updated last year
- ☆67 · Updated last year
- LLM plugin for models hosted by Anyscale Endpoints ☆35 · Updated last year
- Easiest way to build custom agents, in a no-code notion style editor, using simple macros. ☆34 · Updated 9 months ago
- Tutorial and template for a semantic search app powered by the Atlas Embedding Database, Langchain, OpenAI and FastAPI ☆115 · Updated last year
- Flow Chart Image-to-Code Generation ☆33 · Updated 2 years ago
- Various Jupyter notebooks about Common Crawl data ☆55 · Updated 4 months ago