Describe the difference between a sample and a population and explain why is important to understand the difference of these terms? Which of these is used in this task? (1 marks)
Construct a frequency table of the salaries, using a maximum of 10 groups, and enter your values in the table below. (5 marks; 1 per class boundary and 3 for frequency)
Lower Class boundary Upper class boundary Frequency (f)
5000 14999
Calculate the mean, median and mode of the data set. Which measure of central location best describes the data in this task and why? (3 marks)
Use the frequency distribution to draw a rough sketch/ graph and describe the shape of the distribution in appropriate terms – these might include “approximately normal,” “symmetric,” “skewed” (identify in which direction). How would you expect the mean and median values to relate to each other? (2 marks)
Calculate the range, the interquartile range, the variance, and the standard deviation. (2 marks)
Perform calculations in order to provide responses to the questions below:
How many people were working at the company? (1 mark)
What is the average salary at this company? (1 mark)
What percentage of employees that earn less than £55,000? (1 marks)
How many employees earn more than £55,000? (1 mark)
The manager decided to deduct all the salaries by 5%. Calculate measures of central location and dispersion of the new values (mean, median, mode, quartile 1 and 3, Interquartile range, range, variance, standard deviation and standard error including the highest and the lowest salaries (10 marks)
A rival company published a summary of their financial information indicating that the mean salary of their employees is £75,000 with a standard deviation of £40,000. Compare the co-efficient of variation for both companies. (3 marks)
Explain results from all of your calculations above and conclude your analysis. Your answers should include (500 words) (10 marks)
The purpose of the descriptive statistics.
The results of your analysis of the questions above.
The interpretation of your results/methods including any suggestions where appropriate.
Conclusion – a summary of what you have achieved.
Question 2 – Regression Analysis
Weighting – 30% of total marks
The ACE is situated by the harbour in the historic town of Smallville. The ACE is a family run pub providing bed and breakfast accommodation and delicious home-made bar meals. Sarah Essen, marketing manager, needs to investigate the company advertising expenditure and its sales revenue. Sarah believes that the company sales revenue depends on the advertising expenditure as many small companies run advertising to support the selling function. The data of advertising expenditure and weekly sales from last year has been collected and entered on the worksheet entitled ACE.
State, giving a brief explanation, which variable would be the dependent variable and which variable would be independent variable. (2 mark)
Draw a scatter plot of this data on graph paper and comment on the association between ‘Advertising Expenditure’ and ‘Weekly Sales’ as indicated by the scatter plot. (3 marks)
Calculate average of ‘Advertising Expenditure’ and ‘Weekly Sales’, the Correlation coefficient (R) and Coefficient of Determination (R2) for this data. What does the value of your coefficient tell you about the association between the two variables? (6 marks)
State the regression equation for this task. (5 marks)
Using the equation that you achieved from above estimate the sale revenue if
the advertising expenditure were £1,000 (2 marks)
the advertising expenditure were £1,500 (2 marks)
Draw conclusions from the results that you achieved from the questions above. Your answers should include (300 – 500 words): (10 marks)
The purpose of regression analysis.
The results of your analysis for each question.
The interpretation of your results/methods including any suggestions where appropriate. Your interpretation should include the interpretation of correlation between two variables including the strength, direction, and an explanation of the effect.
Conclusion – a summary of what you have achieved.
Question 3 – Time Series and Forecasting (Additive Model)
Weighting – 30% of Total Marks
The manager of the electrical department at Arnold’s store asked a statistician to perform a time series analysis on the quarterly sales figures of the numbers of TVs sold over the past 3 years. The statistician was called away on urgent business and you have been asked to finish the analysis. A copy of the data is shown on ARNOLD worksheet. You have been told that the aim is to forecast the next 4 quarters’ sales.
Complete the calculations that are required to estimate the sale forecast for Quarter 1, 2, 3 and 4 of Year 4 including the values of sales forecast for the quarter 1, 2, 3 and 4 of year 4. The table should include, Moving Average, Trend Centered Moving Average, Seasonal variation, adjusted seasonal values, extended trend and sale forecast. (22 marks)
Conclude the results that you achieved from the questions above. Your answers should include: (100 words) (8 marks)
The purpose of the forecasting.
Conclusion – a summary of what you have achieved.
FORMULAE SHEETS
Descriptive Statistics
Excel Function
ABSOLUTE Returns the absolute value of a number. The absolute value of a number is the number without its sign. ABS(number)
COUNT (n) Counts the number of cells that contain numbers and numbers within the list of arguments. Empty cells, logical values, text, or error values in the array or reference are ignored. COUNT(A1:A100)
FREQUENCY Calculates how often values occur within a range of values, and then returns a vertical array of numbers. FREQUENCY(data_array,bins_array)
MAXIMUM Returns the largest value in a set of values. MAX(A1:A100)
MINIMUM Returns the smallest number in a set of values. MIN(A1:A100)
TOTAL (SUM) Adds all the numbers in a range of cells. SUM(A1:A100)
Ungrouped data
Coding formula Excel Function
MEAN ()
sigma – meaning ‘the sum of’, is the symbol for mean Returns the arithmetic mean of the arguments. AVERAGE(A1:A100)
MODE Returns the most frequently occurring, or repetitive, value in an array or range of data. MODE(A1:A100)
MEDIAN (Q2) Returns the median of the given numbers. MEDIAN(A1:A100) or
QUARTILE.INC(A1:A100,2)
LOWER QUARTILE (Q1) Returns the lower quartile of a data set. QUARTILE (A1:A100,1)
Or
QUARTILE.INC(A1:A100,1)
UPPER QUARTILE (Q3) Returns the upper quartile of a data set. QUARTILE (A1:A100,3)
Or
QUARTILE.INC(A1:A100,3)
RANGE Max – Min Max(A1:A100)-Min(A1:A100)
STANDARD DEVIATION (Population) Calculates standard deviation based on the entire population given as arguments.
STDEV.P(A1:A100)
STANDARD DEVIATION
(Sample) Estimates standard deviation based on a sample. STDEV.S(A1:A100)
VARIANCE (Population) Calculates variance based on the entire population. VAR.P(A1:A100)
VARIANCE (Sample) Calculates variance based on the entire sample VAR.S(A1:A100)
MEAN ABSOLUTE DEVIATION (MAD) Returns the average of the absolute deviations of data points from their mean. AVEDEV(A1:A100)
Regression Analysis
Coding formula Original Data Excel Function
Co-efficient of Correlation PEARSON(array1,array2)
Co-efficient of determination
RSQ(known_y’s,known_x’s)
Intercept
INTERCEPT(known_y’s,known_x’s)
Slope SLOPE(known_y’s,known_x’s)
Time Series Analysis and forecasting
Coding formula Original Data Excel Function
Intercept INTERCEPT(known_y’s,known_x’s)
Slope SLOPE(known_y’s,known_x’s)
Time Series
(Additive Model) Y = T + S + I
T is the Trend
S is the Seasonal
I is Irregular Component None
Forecast Y value
(Additive Model) Forecast Y value = Extended Trend value + Adjusted Seasonal value None