AndyTheFactory/newspaper4k

📰 Newspaper4k, a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.

☆1,127

Alternatives and similar repositories for newspaper4k

Users who are interested in newspaper4k are comparing it to the libraries listed below.

johnbumgarner / newspaper3_usage_overview
This repository provides usage examples for the Python module Newspaper3k.
☆152 · Jan 2, 2024 · Updated 2 years ago
codelucas / newspaper
newspaper3k is a news, full-text, and article metadata extraction library in Python 3.
☆15,121 · Updated this week
adbar / trafilatura
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XM…
☆6,334 · Updated this week
fhamborg / news-please
news-please - an integrated web crawler and information extractor for news that just works
☆2,472 · Apr 14, 2026 · Updated 3 months ago
ranahaani / GNews
A happy and lightweight Python package that provides an API to search for articles on Google News and returns a JSON response.
☆986 · Jun 25, 2026 · Updated 3 weeks ago
SSujitX / google-news-url-decoder
A Python script to decode Google News article URLs.
☆294 · Apr 26, 2025 · Updated last year
salmon-donate / salmon-donate
An open-source, self-hosted, and non-custodial solution for receiving cryptocurrency donations.
☆36 · Jul 1, 2025 · Updated last year
ScrapeGraphAI / Scrapegraph-ai
Python scraper based on AI
☆28,610 · Updated this week
johnbumgarner / newshound
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…
☆34 · Mar 14, 2023 · Updated 3 years ago
upnorthmedia / websiteGPT
Converts all website content into a text file for uploading to a custom GPT
☆36 · Jan 18, 2025 · Updated last year
scrapinghub / article-extraction-benchmark
Article extraction benchmark: dataset and evaluation scripts
☆376 · May 29, 2026 · Updated last month
flairNLP / fundus
A very simple news crawler with a funny name
☆468 · Updated this week
buriy / python-readability
Fast Python port of arc90's readability tool, updated to match latest readability.js!
☆2,894 · Jan 26, 2026 · Updated 5 months ago
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆74,835 · Updated this week
goose3 / goose3
A Python 3 compatible version of goose http://goose3.readthedocs.io/en/latest/index.html
☆912 · Updated this week
apify / crawlee-python
Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Dow…
☆9,353 · Updated this week
BerriAI / litellm
The fastest, litest AI Gateway. Rust core with Python SDK. Call 100+ LLM APIs in OpenAI (or native) format with cost tracking, guardrails…
☆54,606 · Updated this week
daijro / browserforge
🎭 Intelligent browser header & fingerprint generator
☆1,180 · Feb 26, 2026 · Updated 4 months ago
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,750 · Updated this week
autoscrape-labs / pydoll
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
☆6,970 · Jul 17, 2026 · Updated last week
hamodywe / telegram-scraper-TeleGraphite
A fast and reliable Telegram channel scraper that fetches posts and exports them to JSON.
☆277 · Apr 15, 2025 · Updated last year
dylannalex / doc2image
Turn any document into ready-to-use AI image prompts.
☆54 · Sep 3, 2025 · Updated 10 months ago
jina-ai / reader
Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/
☆11,725 · May 22, 2026 · Updated 2 months ago
agno-agi / agno
Build, run, and manage agent platforms.
☆41,409 · Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,813 · Updated this week
mwalkerr / BookmarkCanvas
IntelliJ Plugin that offers an infinite canvas to organize code bookmarks
☆18 · May 31, 2025 · Updated last year
easypro-tech / brs-xss
BRS-XSS (MIT license) is a modular Python CLI scanner for XSS vulnerabilities. Features context-aware payloads, WAF evasion, DOM analysis v…
☆33 · Jan 12, 2026 · Updated 6 months ago
567-labs / instructor
Structured outputs for LLMs
☆13,610 · Jul 13, 2026 · Updated last week
google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…
☆37,796 · Jul 2, 2026 · Updated 3 weeks ago
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,346 · Updated this week
mozilla / readability
A standalone version of the readability lib
☆11,356 · Jul 9, 2026 · Updated 2 weeks ago
pb-kh / brofile
Brofile is a utility app that gives you better link-handling abilities (works on my machine)
☆46 · Jun 4, 2025 · Updated last year
vladkens / twscrape
Python library and CLI for X/Twitter scraping with multi-account rotation and built-in rate-limit handling.
☆2,614 · Updated this week
ai-belov / source-to-llm
A CLI tool that bundles source code files into a single context for LLM prompts
☆21 · Jan 9, 2025 · Updated last year
daijro / hrequests
🚀 Web scraping for humans
☆1,015 · Dec 1, 2024 · Updated last year
namuan / py-mcp-manager
Simple MCP Manager Desktop Application
☆45 · Aug 26, 2025 · Updated 10 months ago
tansihmittal / textbehindvideo
Text Behind Video. Enjoy, it is completely free.
☆31 · Feb 15, 2025 · Updated last year
zyndai / hector-rag
Hector RAG is a modular RAG framework built on PostgreSQL, offering advanced retrieval methods and fusion techniques for AI-driven applic…
☆60 · Feb 24, 2025 · Updated last year
crewAIInc / crewAI
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work t…
☆56,080 · Updated this week