Coding a Random Forest
The Data
For the Random Forest example, we will reuse the winequality_red
data set.
Coding a Random Forest: General Steps
- Load the random forest package
- Read in the data
- Identify the target feature
- Divide the data into a training set and a test set:
  a. Choose the sample size
  b. Randomly select rows
  c. Separate the data
- Fit the random forest model
- Apply the model to the test data
- Compute and display the feature importances
1. Load the Random Forest Package
from sklearn.ensemble import RandomForestClassifier
2, 3, 4. Read in the Data, Identify the Target Feature, and Split the Data
# Repeat steps 2-4 from the Decision Tree example
import pandas as pd
. . .
train_data, test_data, train_target, test_target = model_selection.train_test_split(wine_data, wine_target, test_size=test_size, random_state=seed)
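The ellipsis stands for the data-preparation code carried over from the Decision Tree example. For reference, a minimal sketch of steps 2-4 is shown below; it assumes a semicolon-delimited winequality-red.csv file with quality as the target column, and the file path, test_size, and seed values are illustrative assumptions rather than the course's exact settings.
import pandas as pd
from sklearn import model_selection
# Read in the data (file name and delimiter are assumptions; adjust to your copy)
wine = pd.read_csv('winequality-red.csv', sep=';')
# Identify the target feature and separate it from the predictors
wine_target = wine['quality']
wine_data = wine.drop('quality', axis=1)
# Choose the sample size and randomly split the rows
test_size = 0.30   # illustrative value
seed = 7           # illustrative value
train_data, test_data, train_target, test_target = model_selection.train_test_split(
    wine_data, wine_target, test_size=test_size, random_state=seed)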
5. Fit the Random Forest Model
model = RandomForestClassifier()
model.fit(train_data, train_target)
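The default constructor is fine for a first run, but you will often want to control the size of the forest and make the results reproducible. A minimal sketch with common hyperparameters follows; the values are assumptions, not tuned settings.
# Common hyperparameters (values are illustrative)
model = RandomForestClassifier(
    n_estimators=100,    # number of trees in the forest
    max_depth=None,      # let each tree grow until its leaves are pure
    random_state=seed)   # fix the seed so repeated runs build the same forest
model.fit(train_data, train_target)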
6. Apply the Model to the Test Data
forest_results = model.predict(test_data)
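To see how well the fitted forest generalizes, compare its predictions with the true test labels. The sketch below uses scikit-learn's accuracy_score; any classification metric could be substituted.
from sklearn.metrics import accuracy_score
# Fraction of test rows whose quality class was predicted correctly
accuracy = accuracy_score(test_target, forest_results)
print("Test accuracy: %.3f" % accuracy)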
7. Compute Feature Importance
importances = model.feature_importances_
8. List Feature Importance
import numpy as np
# Sort the feature indices from most to least important
indices = np.argsort(importances)[::-1]
print("Feature ranking:")
col_names = list(train_data.columns.values)
for f in range(len(indices)):
    feature = col_names[indices[f]]
    space = ' ' * (20 - len(feature))   # pad the name so the values line up
    print("%d.\t %s %s (%f)" % \
        (f + 1, feature, space, importances[indices[f]]))
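If you prefer a graphical display of the same ranking, a simple bar chart can be drawn with matplotlib. This sketch assumes the importances, indices, and col_names variables defined above.
import matplotlib.pyplot as plt
# Bar chart of the features in ranked order (most important first)
ranked_names = [col_names[i] for i in indices]
plt.figure(figsize=(8, 4))
plt.bar(range(len(indices)), importances[indices])
plt.xticks(range(len(indices)), ranked_names, rotation=90)
plt.ylabel("Importance")
plt.title("Random Forest Feature Importances")
plt.tight_layout()
plt.show()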
Activity: Random Forest Program
Make sure that you can run the Random Forest code: 02_Random_Forest.ipynb