simonw / nicar-2025-scraping
Cutting-edge web scraping techniques workshop at NICAR 2025
☆367 Updated 9 months ago
Alternatives and similar repositories for nicar-2025-scraping
Users who are interested in nicar-2025-scraping are comparing it to the libraries listed below.
- Template repository for setting up a new git scraper ☆121 Updated last month
- Tools for LIL's data preservation project ☆125 Updated 2 months ago
- Integrate LLM in any pipeline - fit/predict pattern, JSON driven flows, and built in concurrency support. ☆606 Updated 9 months ago
- Data from the Bloomberg News analysis on streamers and podcasters on YouTube ☆25 Updated 10 months ago
- Mapping the French Culinary Universe ☆50 Updated 9 months ago
- CleverBee - The Open Source Deep Researcher Tool ☆308 Updated 6 months ago
- CLI tool for stripping tags from HTML ☆348 Updated 9 months ago
- Free travel times between U.S. Census geographies ☆162 Updated 8 months ago
- Examples and guides for using the VLM Run API ☆300 Updated last week
- https://verdad.app ☆83 Updated last week
- An SDK for working with LLMs and AI Agents from Apache Airflow, based on Pydantic AI ☆506 Updated 2 months ago
- AI Dataset Generator – Create realistic datasets for demos, learning, and dashboards ☆737 Updated 2 months ago
- Import unstructured data (text and images) into structured tables ☆162 Updated last week
- WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections. ☆266 Updated 10 months ago
- Spegel - Reflect the web through AI ☆329 Updated 4 months ago
- Tools to build your own "taskmaster" ☆161 Updated 3 months ago
- Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework ☆341 Updated last year
- Multimodal RAG to search and interact locally with technical documents of any kind ☆279 Updated last month
- Vibe coded Tower Defense type of game made for a game jam ☆365 Updated 5 months ago
- Turn docstrings into LLM-functions ☆513 Updated last month
- OpenAI's Structured Outputs with Logprobs ☆199 Updated 6 months ago
- LLM plugin providing access to models running on an Ollama server ☆343 Updated last month
- Parallel thinking for LLMs. Confidence‑gated, strategy‑driven, offline‑friendly ☆274 Updated 2 months ago
- A Twitter, Mastodon, and BlueSky bot that shares new interactive, graphic, and data vis stories from newsrooms around the world ☆58 Updated this week
- This project collects GPU benchmarks from various cloud providers and compares them to fixed per token costs. Use our tool for efficient … ☆222 Updated 11 months ago
- clean & curate your data with LLMs. ☆490 Updated last year
- Python Script for Structuring data from SEC Form D filings using DuckDB and Python with a display layer using Evidence ☆28 Updated last year
- Research projects ☆217 Updated this week
- Count and truncate text based on tokens ☆379 Updated last year
- Transcribe PDFs with local LLMs ☆734 Updated 2 months ago