rushitjasani / Wikipedia-Search-Engine
A complete search engine built on top of a 75 GB Wikipedia corpus, with subsecond search latency. Results are Wikipedia pages ranked by TF-IDF relevance to the given search terms. From optimized code to a k-way merge sort algorithm, this project addresses latency, indexing, and big data challenges.
☆17 Updated 5 years ago
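The description mentions a k-way merge sort in the context of indexing. A common use of that technique when a corpus is too large for memory is to write sorted partial indexes and then merge them in one pass. The following is a minimal sketch of that idea, not code from the repository: the function name `k_way_merge`, the `(term, postings)` tuple layout, and the sample data are all assumptions made for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def k_way_merge(runs):
    """Merge sorted runs of (term, doc_ids) pairs into one sorted stream.

    Each run is assumed to be sorted by term (hypothetical layout).
    Postings for the same term found in several runs are combined.
    """
    # heapq.merge lazily merges k sorted iterables using a heap of size k.
    merged = heapq.merge(*runs, key=itemgetter(0))
    # Equal terms are adjacent after the merge, so groupby can combine them.
    for term, group in groupby(merged, key=itemgetter(0)):
        postings = []
        for _, doc_ids in group:
            postings.extend(doc_ids)
        yield term, sorted(postings)


# Three small in-memory "runs" standing in for partial index files.
runs = [
    [("apple", [1, 4]), ("zebra", [2])],
    [("apple", [7]), ("mango", [5])],
    [("mango", [9]), ("zebra", [8])],
]
merged_index = list(k_way_merge(runs))
```

Because `heapq.merge` consumes its inputs lazily, the same pattern would also work with generators that read partial index files line by line, keeping only one entry per run in memory at a time.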
Related projects
Alternatives and complementary repositories for Wikipedia-Search-Engine
- Big Data webapp using Chicago street congestion, crashes, red light violations, and speed camera violations ☆38 Updated 3 years ago
- Cyber Security for Big Data and IoT using Machine Learning ☆14 Updated 5 years ago
- Playground for pyspark (RDDs, DStreams) and Apache Airflow. Based on the example of parsing (including incorrectly formatted strings) web … ☆16 Updated 2 years ago
- Data warehouse implementation for an e-commerce website “Infibeam” that sells digital and consumer electronics. ☆18 Updated 6 years ago
- This project's aim was to implement various recommendation models on the Hadoop framework and to compare their performance. ☆24 Updated 6 years ago
- Four different big datasets joined into a single table for final data analysis. Fraud detection by taking into consideration different key feat… ☆44 Updated 4 years ago
- ☆54 Updated 10 months ago
- Big data projects implemented by Maniram yadav