Coding a Random Forest
The Data
For the Random Forest example, we will reuse the winequality_red
data set.
Coding a Random Forest: General Steps
- Load the random forest package
- Read in the data
- Identify the target feature
- Divide the data into a training set and a test set:
  a. Choose the sample size
  b. Randomly select rows
  c. Separate the data
- Fit the random forest model
- Apply the model to the test data
- Compute and display the feature importances
1. Load the Random Forest Package
from sklearn.ensemble import RandomForestClassifier
2, 3, 4. Read in the Data, Identify the Target Feature, and Split the Data
# Repeat steps 2-4 from the Decision Tree example
import pandas as pd
. . .
train_data, test_data, train_target, test_target = model_selection.train_test_split(wine_data, wine_target, test_size=test_size, random_state=seed)
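The ellipsis stands for the data-preparation code carried over from the Decision Tree example. For reference, a minimal sketch of steps 2-4 is shown below; it assumes a semicolon-delimited winequality-red.csv file with quality as the target column, and the file path, test_size, and seed values are illustrative assumptions rather than the course's exact settings.
import pandas as pd
from sklearn import model_selection
# Read in the data (file name and delimiter are assumptions; adjust to your copy)
wine = pd.read_csv('winequality-red.csv', sep=';')
# Identify the target feature and separate it from the predictors
wine_target = wine['quality']
wine_data = wine.drop('quality', axis=1)
# Choose the sample size and randomly split the rows
test_size = 0.30   # illustrative value
seed = 7           # illustrative value
train_data, test_data, train_target, test_target = model_selection.train_test_split(
    wine_data, wine_target, test_size=test_size, random_state=seed)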
5. Fit the Random Forest Model
model = RandomForestClassifier()
model.fit(train_data, train_target)
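The default constructor is fine for a first run, but you will often want to control the size of the forest and make the results reproducible. A minimal sketch with common hyperparameters follows; the values are assumptions, not tuned settings.
# Common hyperparameters (values are illustrative)
model = RandomForestClassifier(
    n_estimators=100,    # number of trees in the forest
    max_depth=None,      # let each tree grow until its leaves are pure
    random_state=seed)   # fix the seed so repeated runs build the same forest
model.fit(train_data, train_target)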
6. Apply the Model to the Test Data
forest_results = model.predict(test_data)
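To see how well the fitted forest generalizes, compare its predictions with the true test labels. The sketch below uses scikit-learn's accuracy_score; any classification metric could be substituted.
from sklearn.metrics import accuracy_score
# Fraction of test rows whose quality class was predicted correctly
accuracy = accuracy_score(test_target, forest_results)
print("Test accuracy: %.3f" % accuracy)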
7. Compute Feature Importance
importances = model.feature_importances_
8. List Feature Importance
import numpy as np
# Sort the feature indices from most to least important
indices = np.argsort(importances)[::-1]
print("Feature ranking:")
col_names = list(train_data.columns.values)
for f in range(len(indices)):
    feature = col_names[indices[f]]
    space = ' ' * (20 - len(feature))   # pad the name so the values line up
    print("%d.\t %s %s (%f)" % \
        (f + 1, feature, space, importances[indices[f]]))
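If you prefer a graphical display of the same ranking, a simple bar chart can be drawn with matplotlib. This sketch assumes the importances, indices, and col_names variables defined above.
import matplotlib.pyplot as plt
# Bar chart of the features in ranked order (most important first)
ranked_names = [col_names[i] for i in indices]
plt.figure(figsize=(8, 4))
plt.bar(range(len(indices)), importances[indices])
plt.xticks(range(len(indices)), ranked_names, rotation=90)
plt.ylabel("Importance")
plt.title("Random Forest Feature Importances")
plt.tight_layout()
plt.show()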
Activity: Random Forest Program
Make sure that you can run the Random Forest code: 02_Random_Forest.ipynb