-My data science projects, done in my free time at university-
Project 4: Penguins Species Classification
Followed a tutorial by Data Professor and modified the program so that predictions from an uploaded file work correctly. Built a random forest model to predict penguin species and saved it to a pickle file. Loaded the pickle file and predicted penguin species from user input (sliders or file upload). Provided an example of the expected upload file format and displayed the predictions with their predicted probabilities, either for the user input or for each row of the uploaded file.
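The file-upload path is the part most likely to break, since an uploaded CSV can be missing columns or contain non-numeric values. Below is a minimal sketch of validating an upload before its rows reach the model, using only the standard library. The column names and the `parse_upload` helper are assumptions for illustration, not the app's actual code, and the `io.StringIO` object stands in for the uploaded file.

```python
import csv
import io

# Assumed input columns (hypothetical; the app's real example file may differ).
NUMERIC_COLUMNS = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
CATEGORICAL_COLUMNS = ["island", "sex"]
EXPECTED_COLUMNS = CATEGORICAL_COLUMNS + NUMERIC_COLUMNS


def parse_upload(file_obj):
    """Read an uploaded CSV and return a list of validated row dicts."""
    reader = csv.DictReader(file_obj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Uploaded file is missing columns: {missing}")

    rows = []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        row = {c: (raw[c] or "").strip() for c in CATEGORICAL_COLUMNS}
        for c in NUMERIC_COLUMNS:
            try:
                row[c] = float(raw[c])
            except (TypeError, ValueError):
                raise ValueError(f"Line {line_no}: {c!r} is not a number: {raw[c]!r}")
        rows.append(row)
    return rows


# Simulated upload with two made-up rows in the assumed format.
example_file = io.StringIO(
    "island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex\n"
    "Biscoe,43.2,14.5,208,4450,female\n"
    "Dream,39.5,17.8,188,3300,male\n"
)
rows = parse_upload(example_file)
print(len(rows), rows[0]["island"], rows[0]["body_mass_g"])
```

Failing early with a clear message makes it easier to tell a user which column or line of their file needs fixing, rather than surfacing an error from inside the model.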
Project 3: IMDB Movie Reviews Sentiment Analysis
Preprocessed the review texts by removing special characters and stopwords and applying stemming. Used TF-IDF to convert the text into numerical features. Built Multinomial Naive Bayes and Logistic Regression models to predict positive and negative sentiment, achieving an F1 score of around 0.85. Plotted two word clouds showing the most common words in positive and negative reviews respectively.
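To make the preprocessing and TF-IDF steps concrete, here is a small pure-Python sketch. It is not the project's code: the stopword list is a tiny illustrative one, stemming is left out, and the weighting is the textbook form tf × log(N / df). Library implementations typically add smoothing and normalization, so their numbers will differ.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "is", "was", "this", "it", "of", "to"}


def preprocess(text):
    """Lowercase, replace non-letter characters with spaces, and drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for empty documents
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Two made-up reviews for illustration.
reviews = [
    "This movie was great, great acting!",
    "The plot was boring and the acting was bad.",
]
for vec in tfidf(reviews):
    print(sorted(vec.items()))
```

A word that appears in every review ("acting" here) gets a weight of zero, while a word repeated within a single review ("great") gets the highest weight, which is the behaviour that makes TF-IDF useful as classifier input.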
Project 2: President Trump's Lies Dataset
Scraped all of President Trump's lies in 2017 from this website. Used the requests library to fetch the website's HTML source. Used the BeautifulSoup library to parse the HTML and match the specific HTML tag containing the details of each lie. Stored the details (the date, the lie, its explanation, and the URL linked to that explanation) in a CSV file. All President Trump's Lies in 2017 (Source: The New York Times). Link to Notebook
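The parse-and-store steps can be sketched as follows. The project itself used requests and BeautifulSoup; this sketch substitutes the standard library's `html.parser` and an inline HTML string so that it runs offline. The markup in `SAMPLE_HTML`, the `short-desc` class name, the placeholder statements, and the `RecordParser` class are all assumptions for illustration and may not match the real page.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for the downloaded page source.
SAMPLE_HTML = """
<p><span class="short-desc"><strong>Jan. 21 </strong>"First statement." <a href="https://example.com/one">(First explanation.)</a></span></p>
<p><span class="short-desc"><strong>Jan. 23 </strong>"Second statement." <a href="https://example.com/two">(Second explanation.)</a></span></p>
"""


class RecordParser(HTMLParser):
    """Collect date, statement, explanation, and URL from each record span."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None outside a record
        self._field = "lie"   # field that incoming text is appended to
        self._depth = 0       # nesting depth of <span> tags inside the record

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            if tag == "span" and attrs.get("class") == "short-desc":
                self._current = {"date": "", "lie": "", "explanation": "", "url": ""}
                self._field, self._depth = "lie", 1
            return
        if tag == "span":
            self._depth += 1
        elif tag == "strong":
            self._field = "date"
        elif tag == "a":
            self._field = "explanation"
            self._current["url"] = attrs.get("href") or ""

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag in ("strong", "a"):
            self._field = "lie"
        elif tag == "span":
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace, then store the finished record.
                clean = {k: " ".join(v.split()) for k, v in self._current.items()}
                self.records.append(clean)
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[self._field] += data


parser = RecordParser()
parser.feed(SAMPLE_HTML)
parser.close()

out = io.StringIO()  # stands in for an open CSV file
writer = csv.DictWriter(out, fieldnames=["date", "lie", "explanation", "url"])
writer.writeheader()
writer.writerows(parser.records)
print(out.getvalue())
```

With BeautifulSoup the matching step is shorter, since one call can select every tag with a given class, but the overall flow is the same: fetch the HTML, pick out the tag holding each record, split it into fields, and write one CSV row per record.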