Fake News Detection - Feature extraction using vectorizers such as Count Vectorizer, TF-IDF Vectorizer, and Hash Vectorizer, followed by an ensemble model that classifies whether a news article is fake or not.
☆20 · Feb 21, 2020 · Updated 6 years ago
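The feature-extraction and ensemble steps named in the description above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the repository most likely used scikit-learn's `CountVectorizer`, `TfidfVectorizer` and `HashingVectorizer`. The token pattern and the smoothed IDF formula, ln((1 + n) / (1 + df)) + 1 followed by L2 normalisation, mirror scikit-learn's defaults. The hashing sketch swaps in `zlib.crc32` for scikit-learn's MurmurHash3. All function names here are hypothetical.

```python
import math
import re
import zlib
from collections import Counter

# Hypothetical helpers; token pattern mirrors scikit-learn's default (2+ word chars).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def count_vectorize(docs):
    """Bag-of-words counts over a vocabulary learned from `docs`."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    index = {tok: i for i, tok in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0] * len(vocab)
        for tok, n in Counter(tokenize(doc)).items():
            row[index[tok]] = n
        rows.append(row)
    return vocab, rows


def tfidf_vectorize(docs):
    """Counts reweighted by smoothed IDF, then L2-normalised per document."""
    vocab, counts = count_vectorize(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return vocab, rows


def hash_vectorize(docs, n_features=16):
    """Hashing trick: no stored vocabulary; each token maps to a fixed bucket."""
    rows = []
    for doc in docs:
        row = [0] * n_features
        for tok in tokenize(doc):
            # crc32 is deterministic across runs, unlike Python's built-in hash().
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


def majority_vote(*predictions):
    """Hard-voting ensemble: the most common label per sample wins."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]
```

In use, each vectorizer's output would feed a separate classifier, and `majority_vote` combines their per-article labels (for example 1 = fake, 0 = real). The hashing variant trades interpretability (no vocabulary to look up) for bounded memory, which is why it is often paired with the other two.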
Alternatives and similar repositories for Big_Data_Project
Users that are interested in Big_Data_Project are comparing it to the libraries listed below
- Cloudera_Material: Study Material to help people preparing for Cloudera CCA Spark and Hadoop Developer Exam (CCA175). Feel free to collab… ☆41 · Apr 21, 2020 · Updated 5 years ago
- SqlDeep is a collection of Microsoft SQL Server database administration scripts originally developed by the SqlDeep team. ☆12 · Dec 3, 2025 · Updated 3 months ago
- Contains SentryOne-published Advisory Conditions as well as S1-team-submitted conditions for the SentryOne platform. ☆14 · Nov 22, 2023 · Updated 2 years ago
- Article cosine similarity calculation implemented in Python 3 ☆10 · Sep 28, 2017 · Updated 8 years ago
- A collection of data analysis projects done using PySpark via Jupyter notebooks. ☆10 · Oct 8, 2022 · Updated 3 years ago
- Latent Dirichlet Allocation and Dynamic Topic Modeling ☆10 · Aug 11, 2021 · Updated 4 years ago
- Supplementary code for "News Frame Analysis: An Inductive Mixed-method Computational Approach" http://dx.doi.org/10.1080/19312458.2019.16… ☆15 · Nov 13, 2020 · Updated 5 years ago
- Dynamic Topic Modelling Tutorial Files ☆13 · May 12, 2015 · Updated 10 years ago
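- Functionality: given a newly input piece of text, compare its similarity against the existing data and return the top 10 most similar texts. Main techniques: jieba Chinese word segmentation, gensim, TF-IDF term importance, and cosine similarity. ☆11 · Jul 30, 2020 · Updated 5 years ago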
- Utilities to synchronize server-level objects (currently just logins) across availability groups. ☆14 · Jan 28, 2021 · Updated 5 years ago
- Drop-in replacement for SQL Server's sp_help procedure. ☆14 · Nov 28, 2025 · Updated 3 months ago
- Demo for the calculation of the Semantic Brand Score (Basic Version) ☆13 · Sep 1, 2020 · Updated 5 years ago
- A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake developme… ☆1,826 · Aug 26, 2022 · Updated 3 years ago
- Public transport departures ☆14 · Apr 3, 2017 · Updated 8 years ago
- Example of writing a backtesting framework from scratch ☆15 · Apr 8, 2021 · Updated 4 years ago
- ☆20 · Aug 17, 2019 · Updated 6 years ago
- Streamlit, but better. ☆16 · Feb 5, 2024 · Updated 2 years ago
- Create many types of interfaces from functions ☆18 · May 10, 2023 · Updated 2 years ago
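- Visual comparison of word vectors trained with word2vec and word vectors generated by BERT ☆15 · Jun 29, 2020 · Updated 5 years ago
- Uses the open-source Bert-as-Service pretrained model to generate document feature vectors, clusters COVID-19 literature with k-means, visualizes the data with t-SNE, generates topic keywords for each cluster via LDA, and draws Bokeh plots that support searching and filtering the data by cluster and keyword. ☆19 · Aug 3, 2020 · Updated 5 years ago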
- Hopefully an up-to-date fork of SQL Power Doc, targeting newer PS versions and .NET levels. Maybe too ambitious. This repository was cloned from ke… ☆23 · Sep 30, 2023 · Updated 2 years ago
- Full-coverage crawler for Zhihu answers, columns, and comments ☆17 · Mar 11, 2023 · Updated 2 years ago
- Demoing how to use Matrix and Each definitions in Azure DevOps YAML pipelines. ☆19 · Nov 16, 2023 · Updated 2 years ago
- WordBias: Visualizing Intersectional Social biases encoded in Word Embeddings ☆23 · Aug 18, 2025 · Updated 6 months ago
- A lightweight benchmark utility for PySpark ☆20 · Jan 25, 2020 · Updated 6 years ago
- TXT text corpus data cleaning: 1> merge TXT files; 2> filter out noise strings; 3> mask person names, place names, and organization names; 4> convert all other encodings to UTF-8 ☆19 · Oct 14, 2022 · Updated 3 years ago
- Build a lab environment for testing out dbatools ☆26 · Nov 5, 2022 · Updated 3 years ago
- Learning from multiple companies in Silicon Valley: Netflix, Facebook, Google, startups ☆18 · Sep 17, 2018 · Updated 7 years ago
- This package consists of functionalities for dynamic topic modelling and its visualization ☆26 · May 16, 2020 · Updated 5 years ago
- Python library to backtest trading strategies, plot charts (via Chartesians), seamlessly download market data, analyse market patterns et… ☆27 · Mar 10, 2025 · Updated 11 months ago
- Lightweight Zhihu crawler supporting questions, collections, and this month's most popular content ☆24 · Dec 19, 2018 · Updated 7 years ago
- A collection of scripts for gathering metrics from SQL Server's underlying DMOs. ☆35 · Apr 8, 2021 · Updated 4 years ago
- Uses Lightly and Label Studio to perform a complete active learning (AL) workflow, from dataset scraping to usage ☆38 · Oct 22, 2025 · Updated 4 months ago
- Hands-on tutorial on adversarial examples 😈. With Streamlit app ❤️. ☆31 · Jun 17, 2022 · Updated 3 years ago
- A lightweight ORM for MongoDB using Pydantic models. ☆32 · Dec 31, 2025 · Updated 2 months ago
- Scripts and stored procedures for Microsoft SQL Server database administrators ☆48 · Updated this week
- A real-time event pipeline built around the Kafka ecosystem for the Chicago Transit Authority. ☆32 · Aug 14, 2023 · Updated 2 years ago
- Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow ☆163 · Jun 16, 2020 · Updated 5 years ago
- Text similarity computation based on TF-IDF and cosine similarity ☆36 · Aug 29, 2018 · Updated 7 years ago
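Several of the listed projects (the Python 3 article-similarity tool, the top-10 retrieval tool, and the TF-IDF text-similarity project) describe the same technique: weight terms by TF-IDF, then rank stored texts by cosine similarity to a new input. The following is a minimal stdlib sketch of that idea, not any of those projects' code. It assumes the texts are already tokenized; for Chinese text the token lists would come from a segmenter such as `jieba.lcut`, which this sketch does not call. The function names are hypothetical, and the smoothed IDF formula is one common choice rather than the one those projects necessarily use.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Sparse TF-IDF vectors (dict term -> weight) using smoothed IDF."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = [
        {t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_similar(query_tokens, corpus_tokens, k=10):
    """Return (index, score) pairs for the k corpus texts most similar to the query."""
    vectors, idf = tfidf_vectors(corpus_tokens)
    # Query terms unseen in the corpus have no IDF, so they are dropped.
    query = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    scores = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Sparse dictionaries keep memory proportional to the distinct terms in each text rather than to the full vocabulary, which matters when the stored corpus is large. The default `k=10` matches the top-10 behaviour one of the entries describes.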