STAT100 Assignment 2: Bike sharing
Total: 50 marks
Weight: 10%
Bike sharing systems are a new generation of traditional bike rentals and are now commonplace in capital cities throughout the world. Through these systems, users are able to rent a bike from any rental station and return it to another station in the city. Users may register for a monthly or annual subscription with the bike sharing system or use bicycles on a casual fee-per-use basis. It is estimated that there are over 500 bike sharing programs around the world. There is great interest in these systems because of the important role they play in traffic, environmental and health issues.
Data on the daily usage of bicycles were collected during 2011 and 2012 from a bike sharing system in Washington, DC, USA. A simple random sample of 100 days from this two-year period will be used for this assignment. The variables in the Bike dataset are:
· Date – day/month/year
· Season – winter, spring, summer and autumn
· Year – 2011 or 2012
· Temperature – maximum daily temperature (in degrees Celsius)
· Ambient_Temperature – the “real feel” maximum daily temperature (in degrees Celsius)
· Humidity – average daily percentage humidity
· Windspeed – average daily wind speed (in km/h)
· Registered – the number of registered subscription users for the day
· Casual – the number of casual fee-per-use users for the day
· Total – the total number of registered and casual users for the day
You have been asked by the bike sharing company to use appropriate statistical methods to answer the following questions:
(1) What is the relationship between the number of casual users and the number of registered users? Use a simple linear regression to describe this relationship and test whether it is significant. [15 marks]
(2) Choose an appropriate statistical test to answer: Is there a difference in the total number of users between 2011 and 2012? If so, in which year was bike use higher and is this difference significant? [15 marks]
(3) Choose an appropriate statistical test to answer: Does the total number of users vary across the four seasons? Which season/s differ from the other seasons? [15 marks]
Overall presentation [5 marks] – See below for more details.
For EACH of the three questions listed above you need to:
(i) State the appropriate null and alternative hypotheses for the question listed. Provide both the written form and mathematical notation where appropriate.
(ii) Provide a useful and well-presented graphic of the data being examined by this question¹. Make sure that this plot relates to the hypotheses that you’ve listed in (i). In addition, provide both a title for the graphic as well as a brief description of the plot in relation to the hypotheses listed in (i).
(iii) Identify which statistical test is the most appropriate to use. Verify and provide adequate support that the conditions are suitable for undertaking the proposed statistical test.
(iv) Calculate and present all relevant test statistics. If you include RStudio output in this section, please follow the presentation guidelines below.
(v) Briefly discuss and interpret your findings for this question. Be sure to refer to the hypotheses that were under investigation as well as any relevant calculated values from your RStudio output, such as test statistics, summary statistics, p-values, degrees of freedom, and confidence intervals.
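As a rough illustration of step (iv) for a two-group comparison such as question (2), the sketch below computes Welch's two-sample t statistic and its approximate degrees of freedom by hand. It is written in Python using only the standard library rather than RStudio. The function name `welch_t` is hypothetical, the daily totals are made up and are not taken from the Bike dataset, and the choice of Welch's test is an assumption: the conditions checked in step (iii) determine which test is actually appropriate.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group: sample variance / group size
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # Difference in sample means divided by its estimated standard error
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Made-up daily totals for illustration only (NOT values from the Bike dataset)
totals_2011 = [3100, 2800, 3500, 4000, 3300]
totals_2012 = [5200, 4800, 6100, 5600, 5900]

t_stat, df = welch_t(totals_2011, totals_2012)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

A negative t statistic here means the first sample has the smaller mean. In R, `t.test()` reports the same two quantities together with a p-value and a confidence interval, and it uses Welch's version by default.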
· Place your name and student number at the top of the first page of your assignment.
· Only include relevant output from RStudio.
· Give a concise interpretation of your results and an informative conclusion in plain English. Use complete sentences when writing up your results. Spell check and proof-read your work.
· Correct notation is required – use the equation editor in Word to insert symbols.
· Where algebraic symbols are required, accompany them with an explanation of what they stand for.
· Set out algebraic expressions for probabilities or equations formally and exactly. Link items with ‘=’ only if they are equal.
· Label (e.g., Figure 1) and explain all plots – what information is available from the plots? Figure labels and captions go below each figure.
· Make sure that any tables and R output have a title and that columns are aligned. Table labels and captions go above each table.
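As an illustration of the notation guidelines above, the hypotheses for a comparison of two group means might be set out as follows. The symbols are assumptions chosen for this example: μ₍2011₎ and μ₍2012₎ (written `\mu_{2011}` and `\mu_{2012}` below) denote the population mean daily total number of users in 2011 and in 2012 respectively.

```latex
H_0 : \mu_{2011} = \mu_{2012}
\qquad \text{versus} \qquad
H_A : \mu_{2011} \neq \mu_{2012}
```

In words, the null hypothesis states that the mean daily total is the same in both years, and the alternative states that it differs.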
¹ If appropriate, the graphic produced in (ii) for each question may be referred to when verifying the statistical test conditions in (iii).