icaropires / pdf2dataset
Converts a whole directory of PDF documents, whether a large or a small volume, into a dataset (a pandas DataFrame), with error tracking and a choice of features to extract.
☆20 · Updated 3 months ago
Alternatives and similar repositories for pdf2dataset:
Users interested in pdf2dataset are comparing it to the libraries listed below.
- Data platform for LLMs: load, index, retrieve, and sync any unstructured data ☆19 · Updated 6 months ago
- 📃 A contract clause summarization system using an LLM and a vector database ☆16 · Updated 2 months ago
- This repo walks you through how to use transfer learning to fine-tune an LLM (large language model) using UK Supreme Court case law as the… ☆35 · Updated last year
- Trained BERT and Word2Vec legal clause classifiers for spaCy using the Atticus Project's Open Source Contract Label Corpus ☆14 · Updated 4 years ago
- Example LangGraph flow that performs "competitor analysis" on the web ☆28 · Updated 10 months ago
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution ☆57 · Updated last week
- A Model Context Protocol server for Python code analysis with Claude. It currently works, but emits a warning. ☆13 · Updated 4 months ago
- A multi-agent business consultant app on Streamlit, implemented using CrewAI ☆16 · Updated 9 months ago
- 📚 Build knowledge bases for RAG ☆17 · Updated last week
- Building a Chain of Thought RAG Model with DSPy, Qdrant and Ollama ☆31 · Updated last year
- 🚀 Template Haystack Search Application with Streamlit ☆27 · Updated 3 months ago
- In this project I designed a knowledge graph focused on Napoleon's history. I built a RAG application using this data and improved the ou… ☆42 · Updated 7 months ago
- GPT-3.5-turbo + Harvard's Caselaw Access Project ☆19 · Updated last year
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆50 · Updated last month
- Winning entry for the Streamlit LLM Hackathon, October 2023 ☆15 · Updated last year
- ☆28 · Updated last year
- Streamlit RAG chatbot app to screen PubMed, the biomedical scientific paper database ☆17 · Updated 2 months ago
- CrewAI AgentOps: Monitor your AI Agents ☆17 · Updated 9 months ago
- A Python package that provides a custom Streamlit connection to query data from Weaviate, the AI-native vector database ☆54 · Updated 8 months ago
- Agentic RAG with LangChain, Qdrant and CrewAI ☆58 · Updated 11 months ago
- Design Patterns for Multi-Agent Frameworks like AutoGen, LangGraph, TaskWeaver, CrewAI, etc. ☆45 · Updated 10 months ago
- An open-source version of GitHub Copilot Workspace ☆11 · Updated 10 months ago
- WhisperAnywhere: Effortless speech-to-text everywhere on your Mac. Use a hotkey to dictate in any app, powered by Whisper AI and Groq API… ☆19 · Updated 5 months ago
- Add voice input capability to Claude.ai using Transformers.js and Groq API ☆27 · Updated 8 months ago
- Modelling the Big Five Personality Inventory using Machine Learning algorithms ☆22 · Updated 5 months ago
- 🤖 UI for gpt-all-star: https://github.com/kyaukyuai/gpt-all-star ☆25 · Updated 2 months ago
- A focused web crawler that uses Machine Learning to fetch more relevant results ☆13 · Updated 6 years ago
- The code for the Sales Dashboard demo ☆16 · Updated 8 months ago
- LangChain Baby AGI integrated as a Web App using Databutton ☆16 · Updated last year
- AI_Powered_Dev_Search_Engine ☆12 · Updated last year