Data Analytics
Problem 1
Indicate whether the following statements are true or false:
- A sample should not contain more than 100 observations; otherwise, it will be called a population.
- True
- False
- The difference between the midpoints of two consecutive classes is equal to the number of classes.
- True
- False
- The line segments in a cumulative frequency polygon can be either increasing or decreasing depending on the given data.
- True
- False
- The variance is considered the most accurate measure of dispersion for distribution comparison because it is calculated using the squared values.
- True
- False
- In a group of 70 scores, if the largest score is increased by 20 points, the mean of the scores will increase by 3.5 points.
- True
- False
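The last statement can be checked numerically. The Python sketch below uses 70 hypothetical scores (30 through 99); any 70 values give the same shift, because raising one score by 20 points raises the total by 20 and therefore raises the mean by 20/70, about 0.29 points.

```python
from statistics import mean

# Hypothetical data: 70 scores (30, 31, ..., 99); any 70 values would do.
scores = list(range(30, 100))

before = mean(scores)

# Increase the largest score by 20 points.
raised = sorted(scores)
raised[-1] += 20
after = mean(raised)

shift = after - before
print(f"mean before: {before:.3f}")
print(f"mean after:  {after:.3f}")
print(f"shift: {shift:.3f} (20 / 70 = {20 / 70:.3f})")
```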
Problem 2 (15%)
Choose the best answer:
- Which of the following represents a sample?
- Number of cups of coffee served at Starbucks Marbella
- Total registered voters in Spain
- All the Colombians working abroad
- None of the above
- Fifty mice were chosen from a shelter containing 500 animals to test a new … What is the sample?
- The 50 selected mice
- The 500 animals in the shelter
- The 550 animals
- All the mice in the shelter
- Which of the following is a discrete variable?
- Depth of the pool measured in meters
- Number of newborn kittens
- Number of hours spent on social media
- None of the above
- The amount of “dollars” stuck in non-US banks is a:
- Quantitative discrete variable
- Qualitative discrete variable
- Quantitative continuous variable
- Qualitative continuous variable
- Identify the scale of measurement for the following categorization of clothing: hat, shirt, shoes, pants.
- Nominal level of data
- Ordinal level of data
- Ratio level of data
- Interval level of data
- As part of a test preparation course, students are asked to take a practice version of the Graduate Record Examination (GRE). This is a standardized test, and scores can range from 200 to 800. The appropriate scale of measurement is:
- Nominal
- Ordinal
- Interval
- Ratio
- Children in elementary school are evaluated and classified as non-readers (0), beginning readers (1), grade-level readers (2), or advanced readers (3). The classification is done to place them in reading groups. The appropriate scale of measurement is:
- Ratio
- Nominal
- Interval
- Ordinal
Problem 3 (25%)
A sample of 20 women was asked about the symptoms they experienced after taking the COVID-19 vaccine. Their responses are listed below:
Headaches | Stroke | Fever | Nausea | Tiredness | Nausea |
Headaches | Tiredness | Cough | Fever | Tiredness | Cough |
Skin Rash | Tiredness | Cough | Fever | Nausea | Tiredness |
Cough | Headaches |
- “Symptoms” is a __________ variable, so it should be organized into a __________.
- Qualitative, frequency distribution
- Qualitative, frequency table
- Quantitative, frequency distribution
- Quantitative, frequency table
- Based on the above data, the relative frequency of “tiredness” is:
- 0.4
- 0.5
- 0.2
- 0.25
- If two more women were added to the survey and both had a stroke after taking the vaccine, the relative frequency of this symptom would be:
- 0.1
- 0.15
- 0.136
- 0.09
- Based on the above data, the angle that corresponds to the “Fever” category is:
- 15
- 54
- 8
- 58
- The best graphical presentation for this data is:
- Bar Graph
- Histogram
- Frequency polygon
- Cumulative histogram or cumulative frequency polygon
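The counts behind Problem 3 can be checked with a short Python sketch using `collections.Counter`. It builds the frequency table for the qualitative variable, derives each category's relative frequency (f / n) and pie-chart angle (relative frequency × 360°), and then recomputes the stroke share after two more respondents who both reported a stroke are added. The table layout is only an illustration.

```python
from collections import Counter

# The 20 responses listed in Problem 3.
responses = [
    "Headaches", "Stroke", "Fever", "Nausea", "Tiredness", "Nausea",
    "Headaches", "Tiredness", "Cough", "Fever", "Tiredness", "Cough",
    "Skin Rash", "Tiredness", "Cough", "Fever", "Nausea", "Tiredness",
    "Cough", "Headaches",
]

# Frequency table for a qualitative variable: one row per category.
freq = Counter(responses)
n = len(responses)

print(f"{'Symptom':<10} {'f':>3} {'rel. f':>7} {'angle':>6}")
for symptom, f in freq.most_common():
    rel = f / n        # relative frequency
    angle = rel * 360  # pie-chart sector angle in degrees
    print(f"{symptom:<10} {f:>3} {rel:>7.3f} {angle:>6.1f}")

# Two more respondents, both reporting a stroke.
updated = Counter(responses + ["Stroke", "Stroke"])
stroke_rel = updated["Stroke"] / (n + 2)
print(f"Stroke relative frequency with {n + 2} respondents: {stroke_rel:.3f}")
```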
Problem 4 (25%)
The raw data below represent the hourly rate (in USD) of a sample of 20 doctors in Paris. These data need to be organized into a frequency distribution.
113 189 186 174 103 125 41 81 47 156 37 89
90 141 126 28 58 172 75 61
- What class interval (width) do you suggest?
- 5
- 30
- 33
- 32
- The relative frequency of doctors who earn between 160 USD and 193 USD per hour is:
- 0.2
- 20%
- 0.1
- 0.25
- The percentage of doctors who earn less than 127 USD per hour is:
- 10%
- 20%
- 70%
- 80%
- The percentage of doctors who earn more than 160 USD per hour is:
- 80%
- 20%
- 10%
- 16
- The first point of a cumulative frequency polygon that represents this data is:
- X = 61 and Y = 5
- X = 28 and Y = 5
- X = 28 and Y = 0
- X = 5 and Y = 0
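A minimal Python sketch of one way to build the Problem 4 frequency distribution. Two choices here are assumptions, since the problem does not state a method: the number of classes k is the smallest k with 2^k ≥ n (the "2 to the k" rule), and classes are treated as half-open intervals [lower, lower + width) starting at the minimum rate. Under those assumptions, it prints each class with its midpoint, frequency, relative frequency, and cumulative frequency, the shares of doctors below 127 USD and above 160 USD, and the points of the cumulative frequency polygon.

```python
import math

# Hourly rates (USD) of the 20 doctors in Problem 4.
rates = [113, 189, 186, 174, 103, 125, 41, 81, 47, 156, 37, 89,
         90, 141, 126, 28, 58, 172, 75, 61]
n = len(rates)

# Assumption: k is the smallest number of classes with 2**k >= n.
k = 1
while 2 ** k < n:
    k += 1

# Class width: range / k, rounded up to a whole dollar.
width = math.ceil((max(rates) - min(rates)) / k)

# Assumption: half-open classes [lower, lower + width) starting at the minimum.
lowers = [min(rates) + i * width for i in range(k)]
freqs = [sum(lo <= r < lo + width for r in rates) for lo in lowers]

cumulative = []
running = 0
print(f"{'class':<11} {'mid':>6} {'f':>3} {'rel. f':>7} {'cum. f':>7}")
for lo, f in zip(lowers, freqs):
    running += f
    cumulative.append(running)
    mid = lo + width / 2  # midpoint of [lo, lo + width)
    print(f"{lo:>3}-{lo + width:<7} {mid:>6.1f} {f:>3} {f / n:>7.2f} {running:>7}")

# Shares of doctors below 127 USD and above 160 USD per hour.
below_127 = sum(r < 127 for r in rates) / n
above_160 = sum(r > 160 for r in rates) / n
print(f"below 127: {below_127:.0%}, above 160: {above_160:.0%}")

# Cumulative frequency polygon: start at the first lower boundary with 0,
# then pair each upper boundary with its cumulative frequency.
points = [(lowers[0], 0)] + [(lo + width, c) for lo, c in zip(lowers, cumulative)]
print("polygon points:", points)
```

In the output, consecutive midpoints differ by the class width (33), not by the number of classes.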
Problem 5 (30%)
The numbers that follow represent the number of paint gallons (in thousands) produced each month by a sample of 10 companies.
7 20 10 4 18 12 7 14 6 22
- The mean number of paint gallons is:
- 7
- 12
- 120
- 33
- The mode of this distribution is:
- 15
- 2
- 7
- There is no mode
- The median of this distribution is:
- 10
- 11
- 12
- 15
- The distribution of data for the number of paint gallons produced is:
- Positively skewed
- Negatively skewed
- Symmetrical
- Cannot be determined
- The range is:
- 26
- 18
- 15
- 29
- The variance of this distribution is:
- 35.8
- 5.98
- 39.78
- 6.31
- The standard deviation of this distribution is:
- 35.8
- 5.98
- 39.78
- 6.31
- Which of the dispersion measures is considered the most accurate for distribution comparison?
- The range, because it is the simplest
- The standard deviation, because it includes all values
- The variance, because it is calculated using the squared values
- All measures are equally accurate
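A short Python sketch with the standard `statistics` module reproduces the Problem 5 quantities. The problem does not say whether the variance and standard deviation are sample or population measures, so both are computed: `variance` and `stdev` divide by n - 1, while `pvariance` and `pstdev` divide by n. The skewness check uses Pearson's second coefficient, 3(mean - median)/s, as one common rule of thumb rather than the only possible test.

```python
import statistics

# Monthly production (thousands of gallons) of the 10 companies in Problem 5.
gallons = [7, 20, 10, 4, 18, 12, 7, 14, 6, 22]

mean = statistics.mean(gallons)
median = statistics.median(gallons)
modes = statistics.multimode(gallons)  # all values tied for the highest count
data_range = max(gallons) - min(gallons)

# Sample measures divide by n - 1; population measures divide by n.
s2, s = statistics.variance(gallons), statistics.stdev(gallons)
p2, p = statistics.pvariance(gallons), statistics.pstdev(gallons)

# Pearson's second skewness coefficient: 3 * (mean - median) / s.
pearson = 3 * (mean - median) / s

print(f"mean = {mean}, median = {median}, mode(s) = {modes}, range = {data_range}")
print(f"sample variance = {s2:.2f}, sample SD = {s:.2f}")
print(f"population variance = {p2:.2f}, population SD = {p:.2f}")
print(f"Pearson skewness = {pearson:.2f} (> 0 suggests positive skew)")
```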