In the Week 4 Portfolio Milestone, you examined the housing.training.csv dataset. Now examine the housing.testing.csv dataset and perform the same tasks as in the Week 4 Portfolio Milestone.
Using R, calculate the summary statistics (minimum, maximum, mean, median, and standard deviation) and create a histogram of sale price for each dataset. Compare the results with those from the housing.training.csv dataset and describe the similarities and/or differences.
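The summary-statistics step might be sketched as follows. This is only an illustration under stated assumptions: the data frame is a small simulated stand-in rather than the real file, the helper name price_summary is hypothetical, and the column name SalePrice is taken from the regression task in this assignment.

```r
# Simulated stand-in so the sketch runs on its own (NOT the real data).
# In practice: testing <- read.csv("housing.testing.csv")
set.seed(1)
testing <- data.frame(SalePrice = round(rnorm(50, mean = 180000, sd = 40000)))

# Hypothetical helper returning the five statistics named in the task
price_summary <- function(x) {
  c(min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE))
}

stats_testing <- price_summary(testing$SalePrice)
print(stats_testing)

# Histogram of sale price for this dataset
hist(testing$SalePrice,
     main = "Sale price (testing)",
     xlab = "SalePrice")
```

Applying the same helper to the training data gives two vectors that can be compared side by side, for example with rbind(training = ..., testing = ...).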
Combine the two datasets housing.training.csv and housing.testing.csv. This can be done in R with a row-binding function such as rbind(); note that combine() is not part of base R and requires an add-on package. Create a histogram of sale prices for the combined dataset and compare it with the histograms from the training and testing datasets. Describe the similarities and differences.
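One way to combine the two datasets is base R's rbind(), which stacks data frames that share the same column names. The sketch below assumes that condition holds; the toy values, the LotArea column, and the added source column are illustrative assumptions, not part of the assignment data.

```r
# Toy stand-ins with made-up values (NOT the real data).
# In practice: training <- read.csv("housing.training.csv"), and likewise for testing.
training <- data.frame(LotArea   = c(8000, 9500, 11000),
                       SalePrice = c(200000, 180000, 220000))
testing  <- data.frame(LotArea   = c(9000, 14000),
                       SalePrice = c(140000, 250000))

# Optional label so each row's origin stays traceable after stacking
training$source <- "training"
testing$source  <- "testing"

# rbind() requires both data frames to have the same column names
combined <- rbind(training, testing)

# Three histograms side by side for visual comparison
par(mfrow = c(1, 3))
hist(training$SalePrice, main = "Training", xlab = "SalePrice")
hist(testing$SalePrice,  main = "Testing",  xlab = "SalePrice")
hist(combined$SalePrice, main = "Combined", xlab = "SalePrice")
```

If the two files do not have identical columns, rbind() stops with an error, so it is worth checking names(training) against names(testing) first.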
Using only the dataset housing.training.csv, fit a linear regression model using all the explanatory variables and SalePrice as the response variable.
What are the significant factors? How do these variables relate to the sale price? Interpret your estimated model.
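A minimal sketch of the model fit, assuming a simulated training set: the predictor names (LotArea, OverallQual, Noise) and the 0.05 significance cutoff are assumptions made for illustration, and the real dataset will have its own variables. The formula SalePrice ~ . regresses SalePrice on every other column.

```r
# Simulated training data so the sketch runs on its own (NOT the real data).
# In practice: training <- read.csv("housing.training.csv")
set.seed(42)
n <- 100
training <- data.frame(
  LotArea     = runif(n, 5000, 15000),
  OverallQual = sample(1:10, n, replace = TRUE),
  Noise       = rnorm(n)                 # predictor unrelated to price
)
training$SalePrice <- 50000 + 5 * training$LotArea +
  15000 * training$OverallQual + rnorm(n, sd = 10000)

# "." means: use all remaining columns as explanatory variables
fit <- lm(SalePrice ~ ., data = training)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients
print(coefs)

# Rows whose p-value falls below the assumed 0.05 cutoff
significant <- coefs[coefs[, "Pr(>|t|)"] < 0.05, , drop = FALSE]
print(significant)
```

The sign and size of each estimate in the Estimate column indicate how the sale price is expected to change per unit increase in that variable, holding the others fixed.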
Remove all rows with missing values (NA) from the housing.testing.csv dataset; the function complete.cases() can be used for this. Using only the first 20 rows from housing.testing.csv, predict the sale price with the model fitted on the training data. The R function predict() can perform this task. You should have 20 predicted sale prices.
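The cleaning and prediction steps could look roughly like this. The data are simulated, the helper make_toy and the predictor names are hypothetical, and the sketch follows the order given in the task: drop incomplete rows first, then take the first 20 rows.

```r
# Hypothetical generator for simulated data (NOT the real data)
set.seed(7)
make_toy <- function(n) {
  d <- data.frame(LotArea     = runif(n, 5000, 15000),
                  OverallQual = sample(1:10, n, replace = TRUE))
  d$SalePrice <- 50000 + 5 * d$LotArea + 15000 * d$OverallQual +
    rnorm(n, sd = 10000)
  d
}
training <- make_toy(100)
testing  <- make_toy(40)
testing$LotArea[c(3, 10)] <- NA          # inject missing values for the demo

fit <- lm(SalePrice ~ ., data = training)

# Keep only rows with no NA in any column
testing_complete <- testing[complete.cases(testing), ]

# First 20 rows of the cleaned data
first20 <- head(testing_complete, 20)

# Predicted sale prices from the fitted model
predicted <- predict(fit, newdata = first20)
print(predicted)
```

predict() needs newdata to contain every predictor used in the fit, under the same column names.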
Compare the predicted sale prices to the actual sale prices from the housing.testing.csv dataset (the first 20 rows). How good is your prediction?
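The task does not prescribe how to measure prediction quality. As one possible approach, the sketch below computes the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE); the choice of these metrics is an assumption, and the two short vectors are made-up placeholders for the 20 actual and predicted prices.

```r
# Made-up placeholder values (NOT real results).
# In practice: actual <- first20$SalePrice and predicted <- predict(fit, newdata = first20)
actual    <- c(200000, 150000, 310000, 180000)
predicted <- c(210000, 140000, 300000, 185000)

errors <- predicted - actual

rmse <- sqrt(mean(errors^2))               # penalizes large misses more heavily
mae  <- mean(abs(errors))                  # average miss in price units
mape <- mean(abs(errors) / actual) * 100   # average miss as a percentage

# Side-by-side table of actual, predicted, and error
comparison <- data.frame(actual, predicted, error = errors)
print(comparison)
print(c(RMSE = rmse, MAE = mae, MAPE = mape))

# Points near the dashed 45-degree line indicate accurate predictions
plot(actual, predicted, xlab = "Actual SalePrice", ylab = "Predicted SalePrice")
abline(0, 1, lty = 2)
```

Because RMSE and MAE are expressed in the same units as the sale price, they can be read against the typical price level to judge whether the errors are large or small.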