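Random forest is a non-linear regression algorithm that builds decision trees on random samples of the data, gets a prediction from each tree, and combines those predictions into a final result. For regression, as used here, the final result is the average of the trees' predictions; majority voting applies when random forest is used for classification.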
It is a type of ensemble learning, which means combining many models into one; here, the models being combined are decision trees.
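To make the "combining many trees" idea concrete, the sketch below fits a small forest and compares its prediction with a manual average of the individual trees' predictions. It assumes scikit-learn's `RandomForestRegressor` and its `estimators_` attribute, and reuses the level and salary values from the dataset in this post; the names `levels`, `salaries`, `forest`, and `query` are illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Position levels 1..10 and the matching salaries from this post's dataset
levels = np.arange(1, 11).reshape(-1, 1)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000])

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(levels, salaries)

query = np.array([[6.5]])

# Each fitted decision tree is stored in forest.estimators_
tree_predictions = np.array(
    [tree.predict(query)[0] for tree in forest.estimators_]
)

# Average the per-tree predictions by hand and compare with the forest
manual_average = tree_predictions.mean()
forest_prediction = forest.predict(query)[0]

print(tree_predictions)
print(manual_average, forest_prediction)
```

The two printed values on the last line should agree, since the regressor's prediction is the mean over its trees.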
In this example, we will use random forest regression to predict the salary of an employee at position level 6.5.
In [6]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
In [9]:
df = pd.DataFrame({
    'Position': ['Business Analyst', 'Junior Consultant', 'Senior Consultant', 'Manager', 'Country Manager', 'Region Manager', 'Partner', 'Senior Partner', 'c-level', 'CEO'],
    'Level': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'Salary': [45000, 50000, 60000, 80000, 110000, 150000, 200000, 300000, 500000, 1000000],
})
x = df.iloc[:, 1:2].values  # feature matrix (Level), kept 2-D for scikit-learn
y = df.iloc[:, 2].values    # target vector (Salary)
print(df)
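The feature column is selected with the slice `1:2` rather than the integer `1` because scikit-learn estimators expect a 2-D feature array. A minimal sketch of the difference, using a small made-up frame (`demo`) with the same column layout:

```python
import pandas as pd

# Small illustrative frame with the same column layout as df
demo = pd.DataFrame({'Position': ['A', 'B', 'C'],
                     'Level': [1, 2, 3],
                     'Salary': [45000, 50000, 60000]})

features_2d = demo.iloc[:, 1:2].values  # slice keeps the column axis
features_1d = demo.iloc[:, 1].values    # integer index drops it
target = demo.iloc[:, 2].values         # 1-D is fine for the target

print(features_2d.shape)  # (3, 1)
print(features_1d.shape)  # (3,)
print(target.shape)       # (3,)
```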
In [10]:
from sklearn.ensemble import RandomForestRegressor
# Build a forest of 10 trees; random_state makes the result reproducible
regressor = RandomForestRegressor(n_estimators=10, random_state=0)
regressor.fit(x, y)
In [11]:
y_pred = regressor.predict([[6.5]])  # predict() expects a 2-D input
print(y_pred)
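The predicted salary depends on how many trees the forest contains. As a rough sketch, the loop below refits the model with a few different `n_estimators` values and prints the prediction for level 6.5. The tree counts 100 and 300 are arbitrary choices for illustration, not values from the original notebook, and the variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Same level/salary values as the dataset in this post
levels = np.arange(1, 11).reshape(-1, 1)
salaries = np.array([45000, 50000, 60000, 80000, 110000,
                     150000, 200000, 300000, 500000, 1000000])

predictions = {}
for n_trees in (10, 100, 300):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(levels, salaries)
    # Store the single predicted salary for position level 6.5
    predictions[n_trees] = float(model.predict([[6.5]])[0])

for n_trees, value in predictions.items():
    print(n_trees, round(value, 2))
```

Because each tree predicts an average of training salaries, every prediction stays within the range of salaries seen during training.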
In [14]:
# Dense grid of levels so the step-shaped prediction curve is visible
X_grid = np.arange(x.min(), x.max(), 0.01)
X_grid = X_grid.reshape(-1, 1)
plt.scatter(x, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Random Forest Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()