Kaggle: Predictions of Titanic Survivors

CASE

“What sorts of people were more likely to survive?” during the Titanic tregedy.

  • Kaggle notebook: Here

WHY

This is an individual project from Kaggle that I worked on to practice my data science skills.

I built a predictive model to predict which passenger was more likely to survive during the sinking of Titanic.

The whole code & work can be found from Kaggle notebook : Here

WORKFLOW

  • Collection of data from Kaggle Titanic dataset consisting of Train & Test data
  • Exploratory data analysis
  • Cleaning & Wrangling data
  • Model & Prediction
  • Evaluation

EXPLORATORY DATA ANALYSIS

CLEANING DATA

  • correcting by dropping
  • completing by dealing with missing value
  • converting it to numeric 
  • creating new feature
data_train['Age'].fillna(value=data_train['Age'].mean(), inplace=True)
data_test['Age'].fillna(value=data_test['Age'].mean(), inplace=True)

MODEL & PREDICTION

# Random Forest Classifier 

RF = RandomForestClassifier() 
RF.fit(X_train, y_train) 

y_pred = RF.predict(X_test) 

acc_RF = round(RF.score(X_train, y_train) * 100, 2) 
acc_RF 
# Feature importance 

importances = pd.Series(data=RF.feature_importances_, index=X_train.columns) 
importances_sorted = importances.sort_values() 

importances_sorted.plot(kind='barh', color='lightblue')
plt.title('Features Importances')
plt.show() 

EVALUATION

Based on scores above, we can sort the scores of all the models.

While both Decision Tree and Random Forest have the same scores (86.87), we decide to use Random Forest as they correct for Decision tree’s habiot of overfitting to the training set.

The whole code & work can be found from Kaggle notebook : Here