Machine Learning Predictions

Purpose

This blog chronicles my journey in machine learning, with a specific focus on its application to real estate and demographics. The project uses demographic data and real estate trends to forecast future property values across the United States.

Housing Data

To develop a predictive model for housing prices, I have gathered data from the US Census and Zillow. This data serves as the foundation for my machine learning model, which includes the following features:

Total number of births per year: This can indicate population growth trends, which may affect housing demand.
Average household size: Larger households might prefer larger homes, influencing housing prices.
Median mortgage loan financed per State: Reflects the borrowing capacity and credit health of residents.
Median household income per State: Higher incomes can increase purchasing power, potentially raising housing prices.
Median housing cost per month per State: Offers insight into the affordability of housing in different regions.
Median real estate taxes per State: Tax rates can impact the overall cost of homeownership.
Number of occupied housing units per State: Indicates the supply and demand dynamics in the housing market.
Zillow’s Home Value Index per State: Provides a snapshot of current home values, which is crucial for price prediction.
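
One way to assemble features like these into a single training table is to join the Census and Zillow data on state and year. The sketch below is only an illustration: the column names (state, year, median_income, household_size) and every value in it are made-up placeholders, not the actual dataset. Only the target column name, Price, comes from the code later in this post.

```python
import pandas as pd

# Hypothetical Census-style features (placeholder values, not real data).
census = pd.DataFrame({
    "state": ["TX", "TX", "OH", "OH"],
    "year": [2018, 2019, 2018, 2019],
    "median_income": [60000, 62000, 55000, 56000],
    "household_size": [2.8, 2.8, 2.4, 2.4],
})

# Hypothetical Zillow-style target (placeholder values, not real data).
zillow = pd.DataFrame({
    "state": ["TX", "TX", "OH"],
    "year": [2018, 2019, 2018],
    "Price": [200000, 210000, 150000],
})

# An inner join keeps only the state/year pairs present in both sources,
# so every remaining row has both features and a target.
df = census.merge(zillow, on=["state", "year"], how="inner")

# The state label is an identifier here, not a numeric feature, so drop it.
df = df.drop(columns=["state"])
print(df.shape)
```

Whether the year stays in as a feature, or the state is encoded rather than dropped, is a modelling choice that this sketch does not try to settle.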

Explore the data

Deep Learning & Neural Networks

Using TensorFlow and Keras, I applied deep learning to the housing data to make predictions. The workflow followed these steps:

Data Preprocessing: Normalizing and transforming data to a format suitable for neural network training.
Model Definition: Using Keras to define the architecture of the neural network with layers, neurons, activation functions, etc.
Model Compilation: Specifying the loss function and optimizer to guide the training process.
Model Training: Feeding the preprocessed data into the model and adjusting the weights through backpropagation.
Model Evaluation: Assessing the model’s performance with validation data and metrics.
Prediction: Using the trained model to predict housing prices based on new input data.
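
The preprocessing step is worth a closer look, because the scaling statistics should come from the training rows only and then be reused unchanged on any other data. Here is a minimal NumPy sketch of min-max normalization under that assumption. The function names fit_min_max and apply_min_max and the toy arrays are hypothetical, chosen for illustration.

```python
import numpy as np

def fit_min_max(train):
    """Return per-column minimum and range computed from the training rows only."""
    col_min = train.min(axis=0)
    col_range = train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero.
    col_range[col_range == 0] = 1.0
    return col_min, col_range

def apply_min_max(data, col_min, col_range):
    """Scale data with statistics learned earlier: (x - min) / (max - min)."""
    return (data - col_min) / col_range

# Toy feature matrices (placeholder values).
train = np.array([[1.0, 100.0], [3.0, 300.0], [5.0, 200.0]])
test = np.array([[2.0, 400.0]])

col_min, col_range = fit_min_max(train)
train_scaled = apply_min_max(train, col_min, col_range)
test_scaled = apply_min_max(test, col_min, col_range)
```

Training features land in the range 0 to 1, while unseen rows can fall outside it (the second test feature here scales to 1.5). That is expected and is the price of not letting test data influence the scaling.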

Python code

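from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow import keras
from tensorflow.keras import layers

# df is assumed to be a pandas DataFrame that is already loaded,
# with the target in a 'Price' column and the features in the other columns.
X = df.drop('Price', axis=1).values
y = df['Price'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=101)

# Fit the scaler on the training split only, then reuse it on the test split.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = keras.Sequential()

# Five hidden layers of 10 units each, with dropout after every layer.
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dropout(0.2))

model.add(layers.Dense(1))  # single output: the predicted price

model.compile(optimizer='adam', loss='mse')

# Stop training once val_loss has not improved for 25 consecutive epochs.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=25)

# Note: the test split also serves as the validation set that drives early stopping.
model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test), epochs=360, callbacks=[early_stop])

predictions = model.predict(X_test)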

With TensorFlow and Keras, I trained a deep learning model that predicts housing prices, which can be useful to both potential buyers and sellers in the real estate market.

Evaluate the model

The metrics below quantify how closely the model’s predictions align with the actual values. Because the evaluation covers data from all states, it includes the seasonality of the housing market and very broadly generalized inputs. Even so, for housing prices ranging from $100,000 to over $1,000,000, the model’s typical error (RMSE) was about $66,000 and its mean absolute error was about $46,000. That is encouraging considering the complexity and variability of the housing market. The model provides a good starting point, and further refinement and feature engineering could improve its predictive performance.

Results

Metric	Description	Value
Root Mean Squared Error (RMSE)	The typical size of the model’s prediction error, with larger errors weighted more heavily.	$66,474
Mean Absolute Error (MAE)	On average, the model’s predictions are about $45,797 away from the actual values.	$45,797
Explained Variance Score	How much of the variation in the target variable, housing prices, the model accounts for.	85.4%
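
For reference, the three metrics in the table can be computed directly from the actual and predicted prices. The sketch below uses plain NumPy with the standard definitions. The helper name regression_metrics and the sample numbers are hypothetical, and this is not necessarily the code that produced the figures above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, explained variance) for two arrays of prices."""
    # ravel() flattens column-shaped predictions, such as the (n, 1) array
    # that a Keras model returns, so they line up with the 1-D targets.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    errors = y_true - y_pred

    rmse = np.sqrt(np.mean(errors ** 2))   # penalizes large misses more heavily
    mae = np.mean(np.abs(errors))          # average miss in dollars
    # Share of the target's variance not left over in the errors.
    explained_variance = 1.0 - np.var(errors) / np.var(y_true)
    return rmse, mae, explained_variance

# Made-up prices for illustration only.
y_true = [100000, 200000, 300000, 400000]
y_pred = [110000, 190000, 310000, 390000]

rmse, mae, evs = regression_metrics(y_true, y_pred)
print(f"RMSE: ${rmse:,.0f}  MAE: ${mae:,.0f}  Explained variance: {evs:.1%}")
```

RMSE is always at least as large as MAE, and the gap between them widens when a few predictions miss by a lot. That pattern fits the reported results, where RMSE ($66,474) sits well above MAE ($45,797).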

Advanced Home Value Forecasting