My Portfolio

A showcase of my projects and abilities

Project 1
Data Cleaning & Exploratory Analysis

The datasets are from Pymaceuticals Inc., a burgeoning pharmaceutical company that specializes in anti-cancer pharmaceuticals. In its most recent efforts, it began screening for potential treatments for squamous cell carcinoma (SCC), a commonly occurring form of skin cancer. In this study, 250 mice identified with SCC tumor growth were treated with a variety of drug regimens. Over the course of 45 days, tumor development was observed and measured. The purpose of this study was to compare the performance of Pymaceuticals Inc.'s drug of interest, Capomulin, against nine (9) other treatment regimens.
A summary of the action items executed is as follows:

  • Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in, garbage-out. A common practice during this task is to correct, impute, or remove erroneous values.
  • Construct data: Derive new attributes (if any) that will be helpful.
  • Integrate data: Create new data sets by combining data from multiple sources.
  • Explore data: Dig deeper into the data. Query it, visualize it using figures (histograms, bar plots, pie charts, boxplots, scatter plots), and identify relationships among the data using tables.
  • Summary statistics: Mean, median, variance, standard deviation, maximum, minimum, quartiles, interquartile range.
  • Correlation and regression.
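
As a rough illustration of the cleaning, summary-statistics, and correlation/regression steps above, here is a minimal standard-library sketch. It is not the project's actual code: the record layout, the sample values, and the helper names (drop_duplicates, summarize, linear_fit) are all assumptions made up for this example.

```python
import statistics

# Hypothetical records: (mouse_id, timepoint_days, weight_g, tumor_volume_mm3).
# The values are invented for illustration; they are not the study's data.
records = [
    ("m01", 0, 20.0, 45.0),
    ("m01", 0, 20.0, 45.0),   # exact duplicate row, removed during cleaning
    ("m01", 45, 20.0, 38.0),
    ("m02", 45, 22.0, 40.0),
    ("m03", 45, 24.0, 42.0),
    ("m04", 45, 26.0, 44.0),
]


def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def summarize(values):
    """Compute the summary statistics named in the list for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),   # sample variance
        "std_dev": statistics.stdev(values),       # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }


def linear_fit(xs, ys):
    """Least-squares slope and intercept plus Pearson r for paired values."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


clean = drop_duplicates(records)
final = [row for row in clean if row[1] == 45]   # keep the last timepoint only
weights = [row[2] for row in final]
volumes = [row[3] for row in final]

stats = summarize(volumes)
slope, intercept, r = linear_fit(weights, volumes)
print(stats)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

With real data, a library such as pandas would likely handle these steps more conveniently; the plain-Python version is used here only to keep the sketch self-contained.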

Project 2
Supervised Machine Learning

The cost of hospital readmission accounts for a large portion of hospital inpatient services spending. Diabetes is not only one of the top ten leading causes of death in the world, but also the most expensive chronic disease in the United States. Hospitalized patients with diabetes are at higher risk of readmission than those without diabetes. Therefore, reducing readmission rates for diabetic patients has great potential to reduce medical costs significantly. The objective of this study is to predict the likelihood of a diabetic patient being readmitted. The original dataset was obtained from the Center for Machine Learning and Intelligent Systems at the University of California, Irvine. It was collected from 130 hospitals in the U.S. over a period of 10 years.
A summary of the action items executed is as follows:

  • Data preparation and exploration: clean, construct, and format data as necessary (e.g., convert string values to numeric values so that you can perform mathematical operations), and explore the data.
  • Modeling: select modeling techniques (decide which algorithms to use, e.g., regression or random forest), generate the test design (split the data into training and test sets), build the models, and assess them. Generally, multiple models compete against each other, and the data scientist needs to interpret the model results based on domain knowledge, the pre-defined success criteria, and the test design.
  • Correct for imbalance in the data using the concept of undersampling and repeat the modeling process.
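
The test design and undersampling steps above can be sketched roughly as follows. This is a standard-library illustration under stated assumptions, not the project's code: the toy data, the 75/25 split, and the function names train_test_split and undersample are hypothetical. Only the training set is undersampled here, so the test set keeps the original class balance.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def undersample(rows, label_index=-1, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_index], []).append(row)
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (feature, readmitted_label) with 90 negatives and
# 10 positives. These are not values from the hospital dataset.
data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]

train, test = train_test_split(data)
balanced_train = undersample(train)

print("before:", Counter(label for _, label in train))
print("after: ", Counter(label for _, label in balanced_train))
```

A model would then be fit on balanced_train and evaluated on the untouched test set, and the results compared with the model trained on the imbalanced data.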

Project 3
Unsupervised Machine Learning

A prominent investment bank is interested in offering a new cryptocurrency investment portfolio to its customers. The bank, however, is lost in the vast universe of cryptocurrencies. It needs a report on which cryptocurrencies are on the trading market and on whether they can be grouped to create a classification system for this new investment. Raw data has been obtained and must be processed before it is fit to the machine learning models. Since there is no known classification system, this requires unsupervised learning. Several clustering algorithms are used to explore whether the cryptocurrencies can be grouped together with other similar cryptocurrencies. Data visualization is used to share the findings with the investment bank.
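
The text does not say which clustering algorithms were used. As one common example, here is a minimal k-means sketch in plain Python. The two-feature points are hypothetical stand-ins for already scaled cryptocurrency features, and the function names are made up for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Basic k-means: returns (labels, centroids).

    Starts from k randomly chosen points, then alternates between assigning
    each point to its nearest centroid and moving each centroid to the mean
    of its members, stopping once the assignments no longer change.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break   # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:   # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical, already scaled feature pairs; not real market data.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (8.0, 8.0), (8.2, 7.9), (7.8, 8.1),
]

labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice the number of clusters k is itself unknown, so a run like this would typically be repeated for several values of k and the results compared.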

Project 4
Python Challenge - Bank

A Python script that analyzes a company's financial records and calculates each of the following:

  • The total number of months included in the dataset.
  • The net total amount of "Profit/Losses" over the entire period.
  • The average of the changes in "Profit/Losses" over the entire period.
  • The greatest increase in profits (date and amount) over the entire period.
  • The greatest decrease in profits (date and amount) over the entire period.
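
One way the calculations listed above might look is sketched below. It assumes a two-column CSV with the headers "Date" and "Profit/Losses"; only the second header appears in the description, so the "Date" header, the sample rows, and the function name analyze_budget are assumptions.

```python
import csv
import io

# Hypothetical sample in the assumed two-column layout; not real company data.
SAMPLE_CSV = """Date,Profit/Losses
Jan-10,1000
Feb-10,1500
Mar-10,900
Apr-10,1200
"""


def analyze_budget(csv_file):
    """Return the summary figures for an open CSV file object."""
    rows = [
        (row["Date"], int(row["Profit/Losses"]))
        for row in csv.DictReader(csv_file)
    ]
    # Month-over-month changes, each paired with the month it occurred in.
    changes = [
        (rows[i][0], rows[i][1] - rows[i - 1][1])
        for i in range(1, len(rows))
    ]
    return {
        "total_months": len(rows),
        "net_total": sum(amount for _, amount in rows),
        "average_change": (
            sum(change for _, change in changes) / len(changes) if changes else 0.0
        ),
        # (date, amount) pairs; None when there are fewer than two months.
        "greatest_increase": max(changes, key=lambda item: item[1], default=None),
        "greatest_decrease": min(changes, key=lambda item: item[1], default=None),
    }


summary = analyze_budget(io.StringIO(SAMPLE_CSV))
for name, value in summary.items():
    print(f"{name}: {value}")
```

To run it against a real file, the io.StringIO object would be replaced with a file opened via open(path, newline="").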

Project 5
Python Challenge - Election Polls

A Python script that analyzes election data and calculates each of the following:

  • The total number of votes cast.
  • A complete list of candidates who received votes.
  • The percentage of votes each candidate won.
  • The total number of votes each candidate won.
  • The winner of the election based on popular vote.
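
A vote tally along these lines can be sketched with collections.Counter. The CSV layout, the "Ballot ID" and "Candidate" headers, the sample ballots, and the function name tally_votes are assumptions for illustration; the description does not specify the file format.

```python
import csv
import io
from collections import Counter

# Hypothetical ballots in an assumed layout; not real election data.
SAMPLE_CSV = """Ballot ID,Candidate
1,Alice
2,Bob
3,Alice
4,Carol
5,Alice
"""


def tally_votes(csv_file):
    """Return (total_votes, per_candidate_results, winner).

    per_candidate_results maps each candidate to (votes, percentage).
    """
    counts = Counter(row["Candidate"] for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    results = {
        name: (votes, 100 * votes / total) for name, votes in counts.items()
    }
    # The popular-vote winner is the candidate with the most ballots.
    winner = counts.most_common(1)[0][0] if counts else None
    return total, results, winner


total, results, winner = tally_votes(io.StringIO(SAMPLE_CSV))
print(f"Total votes: {total}")
for name, (votes, percent) in results.items():
    print(f"{name}: {percent:.1f}% ({votes})")
print(f"Winner: {winner}")
```

Counting in a single pass keeps memory use proportional to the number of candidates rather than the number of ballots.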
