rushitjasani / Wikipedia-Search-Engine
A complete search engine built on a 75 GB Wikipedia corpus, with subsecond search latency. Results are Wikipedia pages ranked by TF-IDF relevance to the given search terms. From optimized code to a k-way merge sort, the project addresses latency, indexing, and big-data challenges.
☆17 · Updated 5 years ago
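The description names two techniques, a k-way merge and TF-IDF ranking, without showing them. The following is a minimal sketch of how they might fit together, not the repository's actual code. The function names `merge_index_runs` and `search`, the `(term, doc_id, tf)` tuple layout, and the log-scaled weighting are all assumptions made for illustration; a real index of this size would stream sorted runs from disk rather than hold them in memory.

```python
import heapq
import math
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def merge_index_runs(runs):
    """K-way merge of sorted runs of (term, doc_id, tf) tuples.

    Hypothetical helper: each run stands in for one sorted partial
    index; here the runs are plain in-memory lists.
    """
    merged = {}
    # heapq.merge lazily merges already-sorted inputs using a heap.
    stream = heapq.merge(*runs)
    for term, group in groupby(stream, key=itemgetter(0)):
        merged[term] = [(doc_id, tf) for _, doc_id, tf in group]
    return merged


def search(index, query_terms, num_docs, top_k=10):
    """Rank documents by a simple TF-IDF score (one common variant)."""
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, [])
        if not postings:
            continue
        # Rarer terms get a higher inverse document frequency.
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings:
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]


runs = [
    [("merge", 1, 2), ("wiki", 1, 1)],
    [("merge", 2, 1), ("sort", 2, 3)],
    [("wiki", 3, 4)],
]
index = merge_index_runs(runs)
results = search(index, ["sort", "merge"], num_docs=3)
```

Merging sorted runs keeps memory use bounded by the number of runs rather than the corpus size, which is the usual reason to reach for a k-way merge when building a large index.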
Related projects
Alternatives and complementary repositories for Wikipedia-Search-Engine
- Big data web app using Chicago data on street congestion, crashes, red light violations, and speed camera violations ☆38 · Updated 3 years ago
- Data warehouse implementation for the e-commerce website “Infibeam”, which sells digital and consumer electronics. ☆18 · Updated 6 years ago
- This project implements various recommendation models on the Hadoop framework and compares their performance. ☆24 · Updated 6 years ago
- Playground for pyspark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing (including incorrectly formatted strings) web … ☆16 · Updated 2 years ago
- Four big datasets joined into a single table for final data analysis. Fraud detection taking into consideration different key feat… ☆44 · Updated 4 years ago
- Multi-class classification model for predicting the types of crimes in Toronto ☆13 · Updated 8 months ago
- Cyber Security for Big Data and IoT using Machine Learning ☆14 · Updated 5 years ago
- Big data projects implemented by Maniram Yadav ☆50 · Updated 6 years ago
- Data cleaning, pre-processing, and analytics on health care data using Spark and Python. ☆45 · Updated last year
- Predict your medical insurance cost! ☆77 · Updated 2 months ago
- Hi everyone, glad to see your interest in this repo, and welcome. We will be working on an end-to-end data science project, which is "Loan Pred… ☆36 · Updated last year
- A machine learning web application used to predict the chance of heart disease, built with Flask and deployed on Heroku. ☆25 · Updated 6 months ago
- Customers in the telecom industry can choose from a variety of service providers and actively switch from one to the next. With the help … ☆67 · Updated 2 years ago
- ☆54 · Updated 10 months ago
- This project is based on unsupervised learning ☆16 · Updated 5 years ago
- Data science virtual internship program by British Airways through Forage! ☆33 · Updated last year
- Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala. ☆94 · Updated 3 years ago
- A content-based movie recommender system using cosine similarity ☆156 · Updated 3 months ago
- Worked on various OTT datasets. Used the Query Editor for data cleaning and preprocessing. Displayed the visuals using a variety of plots,… ☆37 · Updated last year
- Lending Club Data Loan Default Prediction ☆51 · Updated last year
- Helping a digital music store optimize its business practices using PostgreSQL ☆23 · Updated 4 years ago
- Data Science Capstone Project Using Python and Tableau 10 ☆47 · Updated last year
- customer-satisfaction ☆130 · Updated 3 months ago
- This repo contains data science code snippets ☆81 · Updated last month
- ☆44 · Updated 2 years ago
- ☆27 · Updated last year
- My Graduate Capstone Project - This is a Product Recommendation System for a Local Wholesaler in India, using Python and Machine Learning… ☆29 · Updated 3 years ago
- I am using a Confluent Kafka cluster to produce and consume scraped data. In this project, I've created a real-time data pipeline that uti… ☆28 · Updated last year
- IBM Data Engineering Courses from Coursera ☆68 · Updated last year
- Medical data extraction from medical documents such as prescriptions and patient details documents, using Python and regex ☆19 · Updated 2 years ago