A complete search engine built on a 75 GB Wikipedia corpus with subsecond search latency. Results are Wikipedia pages ranked by TF-IDF relevance to the given search terms. From optimized code to the k-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.
☆19 · Oct 16, 2019 · Updated 6 years ago
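The two techniques named in the description, a k-way merge of sorted index runs and TF-IDF ranking, can be sketched roughly as below. This is a minimal illustration, not the repository's actual code: the `(term, doc_id, tf)` record layout, the function names `merge_runs` and `search`, and the `tf * log(N / df)` weighting are all assumptions chosen for clarity.

```python
import heapq
import math
from collections import defaultdict
from itertools import groupby


def merge_runs(runs):
    """K-way merge of sorted runs of (term, doc_id, tf) records.

    Each run is assumed (hypothetically) to be one sorted partial index,
    e.g. one spilled to disk while parsing a slice of the corpus.
    Returns a dict mapping term -> list of (doc_id, tf) postings.
    """
    index = {}
    # heapq.merge lazily merges already-sorted iterables using a heap.
    merged = heapq.merge(*runs)
    for term, group in groupby(merged, key=lambda rec: rec[0]):
        index[term] = [(doc_id, tf) for _, doc_id, tf in group]
    return index


def search(index, num_docs, query):
    """Rank doc ids by a simple TF-IDF score: sum of tf * log(N / df)."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue  # unknown term contributes nothing
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Toy data: three sorted runs covering three documents.
run_a = [("engine", 1, 2), ("search", 1, 3), ("wiki", 2, 1)]
run_b = [("engine", 3, 1), ("search", 2, 1)]
run_c = [("merge", 3, 4), ("wiki", 3, 2)]

index = merge_runs([run_a, run_b, run_c])
ranked = search(index, num_docs=3, query="search engine")
print(ranked)
```

On a corpus of this size the runs would presumably be read from disk rather than held in lists; `heapq.merge` accepts any sorted iterables, so file-backed generators would fit the same shape.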
Alternatives and similar repositories for Wikipedia-Search-Engine
Users that are interested in Wikipedia-Search-Engine are comparing it to the libraries listed below.
- Playground for pyspark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing (including incorrectly formatted strings) web … ☆19 · Feb 21, 2022 · Updated 4 years ago
- Multi-class classification model for predicting the types of crimes in Toronto ☆14 · Mar 4, 2024 · Updated 2 years ago
- Data warehouse implementation for an e-commerce website “Infibeam” that sells digital and consumer electronics. ☆22 · Jan 28, 2018 · Updated 8 years ago
- Big Data webapp using Chicago street congestion, crashes, red light violations, and speed camera violations ☆44 · Jan 9, 2021 · Updated 5 years ago
- Project - Data Processing and Analysis in Python Course ☆39 · Oct 10, 2018 · Updated 7 years ago
- A Deep Learning project to detect and recognise Devanagari (Hindi) text from Handwritten Text Documents. ☆11 · Jul 19, 2019 · Updated 6 years ago
- A BigQuery adapter for Harlequin, a SQL IDE for the terminal. ☆10 · Jan 19, 2025 · Updated last year
- Python Essentials for AWS Cloud Developers, published by Packt. ☆10 · Apr 27, 2023 · Updated 2 years ago
- Aadhar Data Extraction is a computer vision-based tool that uses the YOLO model for object detection and the EasyOCR library for optical … ☆11 · Apr 17, 2023 · Updated 2 years ago
- A web app built with Streamlit that allows you to chat with your databases using a GPT model. ☆15 · Aug 13, 2023 · Updated 2 years ago
- Ideas, discussions and prototypes for social feed algorithms that give users what they are interested in but also spark new ideas. ☆17 · Jan 6, 2023 · Updated 3 years ago
- Fonts for writing Indian Classical Music ☆21 · May 14, 2025 · Updated 10 months ago
- Streamlit Dashboard over Superstore Data stored in Postgres Docker container. With SQLAlchemy + Plotly Express ☆13 · Oct 16, 2024 · Updated last year
- This is a guided certification project, as part of the Data Science for Social Good initiative ☆18 · Mar 9, 2020 · Updated 6 years ago
- This repository contains code to build an MVP search engine with a Google-like interface. ☆17 · Jan 13, 2026 · Updated 2 months ago
- Case Studies and Projects in Machine Learning/EDA/DL ☆24 · Jun 18, 2024 · Updated last year
- weighted category-balanced dataset builder for LLM fine-tuning ☆16 · Feb 21, 2026 · Updated last month
- This Online Voting System is a web-based application built using the MERN stack, which includes MongoDB, Express.js, React.js, and Node.j… ☆21 · Aug 2, 2024 · Updated last year
- my zsh configuration ☆13 · Jun 26, 2025 · Updated 9 months ago
- This repository contains Data Analysis on Black Friday Sales Data using various Regression ML algorithms ☆20 · Apr 8, 2025 · Updated 11 months ago
- ☆17 · Feb 11, 2022 · Updated 4 years ago
- Tutorial for creating a simple storage contract using Ethers. ☆21 · Aug 28, 2022 · Updated 3 years ago
- A simple Dash and Plotly dashboard to review and compare federal economic data ☆13 · Feb 1, 2022 · Updated 4 years ago
- Implementation of a system capable of encryption and decryption of multimedia data (Text, Images, Videos, Audio etc.) using a hybrid mode… ☆22 · Feb 7, 2024 · Updated 2 years ago
- ☆35 · Sep 14, 2025 · Updated 6 months ago
- ☆15 · Jul 31, 2022 · Updated 3 years ago
- Builds a search engine on the 43 GB Wikipedia data dump from 2013. The search results return in rea… ☆25 · May 23, 2014 · Updated 11 years ago
- A program written in C++ that emulates a bogus CPU ☆22 · May 6, 2024 · Updated last year
- This dataset contains hotel booking information. We performed exploratory data analysis in Python to gain insight from the data. ☆13 · Apr 12, 2020 · Updated 5 years ago
- Tesla/Nasdaq USD Prediction with Artificial Intelligence RNN Neural Network ☆13 · Apr 11, 2022 · Updated 3 years ago
- Complete PySpark Guide for beginners... I prepared this notebook for my students. ☆19 · Sep 18, 2019 · Updated 6 years ago
- MLgenerator is a web app which helps you generate machine learning starter code with ease. ☆34 · Feb 5, 2021 · Updated 5 years ago
- Snowflake Data Engineering in Action ☆39 · Oct 18, 2024 · Updated last year
- This repo is for the LinkedIn Learning course: Advanced AI: Transformers for Computer Vision ☆38 · Jun 21, 2025 · Updated 9 months ago
- ☆24 · May 21, 2024 · Updated last year
- This is a Messenger app made with React, styled with Material UI, and deployed with Firebase. 💭🖥️ ☆18 · Apr 10, 2022 · Updated 3 years ago
- A repo to help you get your Llamafile up and running quickly ☆56 · Nov 5, 2024 · Updated last year
- MachineHack is an online platform for Machine Learning competitions. We host the toughest business problems that can now find solutions in Ma… ☆19 · Oct 25, 2023 · Updated 2 years ago
- This repository contains an Apache Flink application for real-time sales analytics built using Docker Compose to orchestrate the necessar… ☆49 · Dec 4, 2023 · Updated 2 years ago