Choose one of the following two prompts to respond to. Then make two replies in your two follow-up posts. Use the discussion topic as a place to ask questions, speculate about answers, and share insights. Be sure to embed and cite your references for any supporting images.
Option 1:
Think of a problem dealing with two possibly related variables (Y and X) that you may be interested in. Share your problem and discuss why a regression analysis could be appropriate for this problem.
Specifically, what statistical questions are you asking? Why would you want to predict the value of Y? What if you wanted to predict a value of Y that’s beyond the highest value of X (for example if X is time and you want to forecast Y in the future)?
You should describe the data collection process that you are proposing but you do not need to collect any data.
Option 2:
Give an example of a problem dealing with two possibly related variables (Y and X) for which a linear regression model would not be appropriate. For example, the relationship could be curved instead of linear, or there may be no significant correlation at all.
What is the impact of using a linear regression model in this case? What options, other than linear regression, can you see? You do not need to collect any data.
For your response to a classmate (two responses required, one in each option), examine your classmate’s problem to assess the appropriateness and accuracy of using a linear regression model. Discuss the meaning of the statistical question of the estimate and how it affects the predicted values of Y for that analysis.
Reply 1
Let x represent the temperature recorded in Dallas, TX.
Let y represent the number of hot beverages sold by a local independent coffee shop.
The statistical question I want to study is the relationship between the temperature and the number of hot beverages sold. Specifically, how much does the number of hot beverages sold change for each one-degree change in temperature (i.e., the slope of the regression line)?
For example, if the regression equation was found to be:
y = 4x + 40
I would expect sales of hot beverages to increase by 4, on average, for each additional degree of temperature, starting from a predicted 40 sales at 0 degrees (the intercept). Obviously, there are limits to this model: the number of beverages sold (y) cannot be negative, and Texas rarely experiences negative temperatures. Also, for values of x greater than 100 I would expect actual hot beverage sales to drop, which a straight line cannot capture; the true relationship would then be curved.
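To make the roles of the slope and intercept concrete, here is a minimal sketch that evaluates the example equation y = 4x + 40. The function name `predict_sales` and the chosen temperatures are only for illustration; the equation itself is the hypothetical one above, not a fitted result.

```python
# Hypothetical example line from the post: y = 4x + 40
SLOPE = 4
INTERCEPT = 40


def predict_sales(temp):
    """Predicted hot beverages sold at temperature `temp` under the example line."""
    return SLOPE * temp + INTERCEPT


# Change in predicted sales for a one-degree increase (this equals the slope).
one_degree_change = predict_sales(61) - predict_sales(60)

# Prediction at x = 0 (this equals the intercept).
sales_at_zero = predict_sales(0)

# Extrapolating to x = 110: the straight line keeps rising,
# even if real sales would be expected to fall in that heat.
extrapolated = predict_sales(110)

print(one_degree_change, sales_at_zero, extrapolated)
```

The last value shows why extrapolating beyond the observed temperatures is risky: the line has no way of bending downward on its own.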
To collect data, I could randomly sample a local coffee shop (independent, non-chain), ask them how many hot beverages they sold each day in the past week, and record the temperature for each of those days.
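Once a week of (temperature, sales) pairs is recorded, the regression line could be estimated with ordinary least squares. The sketch below is only an illustration: the numbers are made-up placeholders constructed to lie exactly on the example line y = 4x + 40, not real observations, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Placeholder data for seven days (NOT real measurements):
# temperatures, and sales generated from the example line y = 4x + 40.
temps = [50, 55, 60, 65, 70, 75, 80]
sales = [4 * t + 40 for t in temps]

slope, intercept = fit_line(temps, sales)
print(slope, intercept)
```

With real data the points would scatter around the line, so the estimated slope and intercept would carry sampling error, and a scatter plot would help check whether a straight line is reasonable at all.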