Ds

Project 7: Bank Customer Churn Prediction

Applied the Synthetic Minority Oversampling Technique (SMOTE) to address class imbalance in the customer dataset by synthesizing new minority-class examples from existing ones. Built an XGBoost model that achieved an AUC score above 93% in predicting customer churn. Found that the most important feature affecting churn was the total transaction count over the past 12 months. AutoEDA of the customer dataset using Pandas Profiling. Link to Google Colaboratory notebook with explanation.
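The oversampling idea can be illustrated with a small hand-rolled sketch. This is not the project's code and not a library implementation: the function `smote_sketch`, the toy data, and the choice of `k` are all assumptions made for illustration. It only shows the core step of SMOTE, which is to interpolate between a minority sample and one of its nearest minority neighbours.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating towards neighbours.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)   # random minority sample per new row
    pick = rng.integers(0, k, size=n_new)   # one of its k nearest neighbours
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy data (assumed): 20 majority rows (label 0) and 5 minority rows (label 1).
rng = np.random.default_rng(42)
X_major = rng.normal(0.0, 1.0, size=(20, 2))
X_minor = rng.normal(3.0, 0.5, size=(5, 2))

synthetic = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor), k=3)
X_bal = np.vstack([X_major, X_minor, synthetic])
y_bal = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(synthetic)))
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data keeps its original class balance.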

Project 5: Fake News Detection

Performed text preprocessing, including removal of noisy text and stopwords. Converted the text into numerical features with Tokenizer, turned the generated tokens into sequences, and applied padding so that every input sequence had the same length. Built the model with the Keras Sequential API, including an Embedding layer and an LSTM layer. Used GloVe embeddings to build the embedding matrix for the Embedding layer. Achieved an F1 score of nearly 1.00 in classifying news as real or fake.
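The tokenize, sequence, and pad steps can be sketched in plain Python. This is a stand-in written for illustration, not the Keras Tokenizer itself: the helper names (`fit_word_index`, `texts_to_sequences`, `pad`), the sample headlines, and the decision to pad and truncate on the left are all assumptions, since the project summary does not state those details.

```python
from collections import Counter

def fit_word_index(texts):
    """Map each word to an integer; index 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Most frequent words get the smallest indices; ties broken alphabetically.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}

def texts_to_sequences(texts, word_index):
    """Replace each known word with its integer index."""
    return [[word_index[w] for w in t.lower().split() if w in word_index]
            for t in texts]

def pad(sequences, maxlen):
    """Left-pad with zeros (and keep the last maxlen tokens) to a fixed length."""
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]
        padded.append([0] * (maxlen - len(seq)) + seq)
    return padded

# Hypothetical headlines used only to exercise the helpers.
texts = ["breaking news shocking claim", "officials confirm the claim", "news"]
word_index = fit_word_index(texts)
sequences = texts_to_sequences(texts, word_index)
padded = pad(sequences, maxlen=3)
print(padded)  # [[2, 6, 1], [4, 7, 1], [0, 0, 2]]
```

Every row of `padded` has the same length, which is what an Embedding layer followed by an LSTM layer expects as input.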

Project 4: Penguins Species Classification

Followed a tutorial by Data Professor and modified the program so that predictions from an uploaded file work correctly. Built a random forest model for predicting penguin species and saved it to a pickle file. Read the pickle file back in and predicted penguin species from user input (sliders or file upload). Provided an example of the expected upload file format and displayed the predictions with their predicted probabilities, for either the user input or each row of the uploaded file.
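The train, pickle, reload, and predict cycle might look roughly like the sketch below. It assumes scikit-learn's `RandomForestClassifier` and uses synthetic measurements with placeholder labels (`species_a`, `species_b`, `species_c`) rather than the real penguin data; the feature choice and the in-memory `pickle.dumps` round trip are also assumptions made to keep the example self-contained.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: two numeric features per row, three placeholder labels.
centers = np.array([[39.0, 3700.0], [48.0, 3750.0], [47.0, 5000.0]])
X = np.vstack([c + rng.normal(0, [1.0, 100.0], size=(30, 2)) for c in centers])
y = np.repeat(["species_a", "species_b", "species_c"], 30)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Serialize and restore the fitted model.
# pickle.dump(model, file) would write the same bytes to a file on disk.
blob = pickle.dumps(model)
loaded = pickle.loads(blob)

# A single row of "user input", as a slider-based form might provide.
user_input = np.array([[47.5, 4950.0]])
pred = loaded.predict(user_input)[0]
proba = dict(zip(loaded.classes_, loaded.predict_proba(user_input)[0]))
print(pred, proba)
```

For an uploaded file, the same `predict` and `predict_proba` calls can be made on all rows at once, giving one prediction and one probability vector per row.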

Project 3: IMDB Movie Reviews Sentiment Analysis

Performed text preprocessing on the review texts, including removal of special characters and stopwords as well as stemming. Converted the text into numerical features using TF-IDF. Built Multinomial Naive Bayes and Logistic Regression models to predict positive and negative sentiment, achieving an F1 score of around 0.85. Plotted two word clouds to show the common words in positive and negative reviews respectively.
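A minimal sketch of the TF-IDF plus classifier setup follows, assuming scikit-learn. The six toy reviews and their labels are invented for illustration, stemming is left out for brevity, and the built-in English stopword list stands in for whatever stopword list the project used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy reviews (assumed), far smaller than a real review dataset.
reviews = [
    "great film wonderful acting",
    "wonderful story great cast",
    "loved it great fun",
    "terrible film boring plot",
    "boring story awful acting",
    "hated it awful mess",
]
labels = ["positive", "positive", "positive",
          "negative", "negative", "negative"]

# Each pipeline turns raw text into TF-IDF features, then fits a classifier.
models = {
    "naive_bayes": make_pipeline(
        TfidfVectorizer(stop_words="english"), MultinomialNB()),
    "logistic_regression": make_pipeline(
        TfidfVectorizer(stop_words="english"), LogisticRegression()),
}
for model in models.values():
    model.fit(reviews, labels)

new_reviews = ["great wonderful film", "boring awful plot"]
predictions = {name: list(model.predict(new_reviews))
               for name, model in models.items()}
print(predictions)
```

On a real dataset, the two models would be compared on a held-out test split, for example with `sklearn.metrics.f1_score`.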