simonw / nicar-2025-scraping
Cutting-edge web scraping techniques workshop at NICAR 2025
☆337 Updated last month
Alternatives and similar repositories for nicar-2025-scraping:
Users who are interested in nicar-2025-scraping are comparing it to the repositories listed below
- Data from the Bloomberg News analysis on streamers and podcasters on YouTube ☆22 Updated 3 months ago
- Integrate LLM in any pipeline - fit/predict pattern, JSON driven flows, and built-in concurrency support. ☆586 Updated last month
- Examples and guides for using the VLM Run API ☆274 Updated last month
- Template repository for setting up a new git scraper ☆98 Updated 2 months ago
- A command-line book tracking tool ☆129 Updated last week
- LLM plugin providing access to models running on an Ollama server ☆282 Updated 2 weeks ago
- Free travel times between U.S. Census geographies ☆142 Updated last month
- Markdown Blog with GitHub Pages. Easy setup! ☆151 Updated 3 months ago
- An SDK for working with LLMs and AI Agents from Apache Airflow, based on Pydantic AI ☆354 Updated this week
- A playbook for effectively prompting post-trained LLMs ☆861 Updated 3 months ago
- CLI tool for stripping tags from HTML ☆315 Updated last month
- ☆440 Updated this week
- This project collects GPU benchmarks from various cloud providers and compares them to fixed per token costs. Use our tool for efficient … ☆220 Updated 4 months ago
- ai for jq ☆240 Updated 7 months ago
- WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections. ☆247 Updated 2 months ago
- Fully open-source command-line AI assistant inspired by OpenAI Codex, supporting local language models. ☆292 Updated this week
- Visualise your CSV files in seconds without sending your data anywhere ☆505 Updated last month
- Uses an LLM to generate ffmpeg commands ☆474 Updated 3 months ago
- Tools for LIL's data preservation project ☆121 Updated 2 months ago
- OCR Benchmark ☆464 Updated last week
- Weave your codebase into a single, navigable Markdown document ☆420 Updated last month
- Import unstructured data (text and images) into structured tables ☆149 Updated last week
- Fully neural approach for text chunking ☆319 Updated last week
- Build, Improve Performance, and Productionize your LLM Application with an Integrated Framework ☆339 Updated 5 months ago
- Python framework for building efficient data pipelines. It promotes modularity and collaboration, enabling the creation of complex pipeli… ☆635 Updated last week
- ☆279 Updated 4 months ago
- MapMatrix - A React application for synchronized multi-view map comparison. Mostly generated by AI. ☆173 Updated 4 months ago
- Build a RAG dataset for your domain in just a few lines of code, using your XML sitemap ☆47 Updated 8 months ago
- https://verdad.app ☆82 Updated 3 months ago
- Documentation and code for Hack the MontyHome device for extended applications. ☆231 Updated 5 months ago