STAT1060 – Assignment 2 Semester 2, 2020

STATISTICAL ANALYSIS TO SUPPORT DECISION MAKING

Total marks: 46, Weight: 20%

Due: November 8, 11:59 PM (Sunday of Week 12)

Submission instructions and general marking criteria

• Submit a copy in Word or PDF format via Turnitin.

• Assignments submitted by other means (e.g. email) or forms (scanned copy, Excel document) will attract no marks.

• Late Submission Penalty: As detailed in the Course Outline.

• It is expected that Excel is used to assist with calculations and preparation of appropriate graphs. All relevant Excel output should be included with your assignment. However, raw computer output without explanatory text is not acceptable. Answers must be written in clear English sentences clearly linked to appropriate supporting computer output. Only the parts of the Excel output that are relevant to answering the question should be included; output that is not relevant should be left out.

• You will need to demonstrate understanding of types of data, the use of graphs to explore distributions of variables and relationships between variables, and of statistical tests of such relationships. Marks will be awarded based on the quality of your assessment of the data and how clearly that assessment is communicated.

• The assessment requires you to apply concepts from Weeks 5 to 10 to specific scenarios and data sets, to choose the correct analysis for each, and to write up the results of a statistical analysis.

Question 1 (6 Marks)

Car buyers in the city of Newcastle were asked by a car dealer to rate their level of satisfaction with the service that they received. The four possible ratings were: Excellent (E), Good (G), Satisfactory (S) and Unsatisfactory (U). Data showing the level of satisfaction with the service for December 2019 is provided in the file Question1.xlsx. Column A contains the level of satisfaction scores.

(i) What type of variable is “level of satisfaction” (Continuous, Discrete, Ordinal or Nominal)? Be sure to justify your answer. [2 Marks]

(ii) Name the appropriate graphical display to use to display ‘level of satisfaction’, based on the variable type you identified in part (i). [1 Mark]

(iii) Use Excel to create the appropriate graph to display the provided data. [1 Mark]

(iv) Comment on the key aspects from this graph. [2 Marks]

Question 2 (4 Marks)

Commuting times of students who travel by bus to the University of Newcastle Callaghan campus are known to be normally distributed with a mean of 25 minutes and standard deviation 5 minutes. Use the empirical rule of the normal distribution to answer the following questions. Shade in (roughly) the corresponding area under the curve for each part (i-iv).

(i) What is the probability that the student’s commuting time is less than 20 minutes to reach campus? [1 Mark]

(ii) What is the probability that the student’s commuting time is more than 35 minutes to reach campus? [1 Mark]

(iii) What is the probability that the student’s commuting time is between 15 and 30 minutes to reach campus? [1 Mark]

(iv) Find the bus commuting time that corresponds (approximately) to the 97.5th percentile of the distribution (i.e. find the commuting time above which only 2.5% of the distribution appears). [1 Mark]
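The empirical rule used in this question can be sketched in Python as follows. This is an illustrative sketch only: the names `EMPIRICAL`, `prob_below` and `prob_between`, and the example values (mean 100, standard deviation 15), are assumptions chosen for demonstration and are not the commuting-time figures above.

```python
# Empirical (68-95-99.7) rule: approximate share of a normal
# distribution lying within k standard deviations of the mean.
EMPIRICAL = {0: 0.0, 1: 0.68, 2: 0.95, 3: 0.997}


def prob_below(x, mean, sd):
    """Approximate P(X < x) when x is a whole number of sds from the mean."""
    k = (x - mean) / sd
    if k != int(k) or abs(k) > 3:
        raise ValueError("empirical rule needs x at 0, 1, 2 or 3 sds from the mean")
    half_band = EMPIRICAL[abs(int(k))] / 2  # area between the mean and x
    return 0.5 + half_band if k >= 0 else 0.5 - half_band


def prob_between(lo, hi, mean, sd):
    """Approximate P(lo < X < hi) as a difference of two lower-tail areas."""
    return prob_below(hi, mean, sd) - prob_below(lo, mean, sd)


# Hypothetical example: mean 100, standard deviation 15.
print(round(prob_below(85, 100, 15), 4))         # one sd below the mean
print(round(1 - prob_below(130, 100, 15), 4))    # upper tail, two sds above
print(round(prob_between(70, 115, 100, 15), 4))  # from -2 sd to +1 sd
```

The rule only gives areas at whole numbers of standard deviations, which is why the helper rejects any other value instead of interpolating.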

Variance                        7.7202    0
Observations                    100       2
Hypothesized Mean Difference    0
df                              99
t Stat                          4.7618
P(T<=t) one-tail                0.0000
t Critical one-tail             1.6604
P(T<=t) two-tail                0.0001
t Critical two-tail             1.9842

Question 4 (14 Marks)

The University of Newcastle marketing services are interested in mobile usage technology. A study was undertaken in which a random sample of students enrolled at the University of Newcastle in 2019 and 2020 were invited to participate in a project about the daily usage of smartphones. There were 50 randomly selected students in 2019 and 50 randomly selected students in 2020 who participated in the study. Data is provided in the Excel file Question4.xlsx. Column A contains student smartphone daily usage in 2019 and Column B contains student smartphone daily usage in 2020 (minutes per day).

(i) Use Excel to construct a histogram of student smartphone daily usage in 2019. How would you describe the distribution, including the shape? Include a histogram with your response. [2 Marks]

(ii) Use Excel to construct a histogram of student smartphone daily usage in 2020. How would you describe the shape? Include a histogram with your response. [2 Marks]

(iii) Use Excel to find the mean, median, standard deviation and interquartile range of student smartphone daily usage in 2019. Repeat for the smartphone daily usage in 2020. [2 Marks]

(iv) Using the information from parts (i)-(iii), give a brief statement comparing smartphone daily usage in 2019 with smartphone daily usage in 2020. [2 Marks]

(v) Explain, using support from a hypothesis test, if the average smartphone daily usage in 2019 differs from the average smartphone daily usage in 2020. Perform the hypothesis test at the 5% significance level. Assume that data in 2019 and 2020 were taken from different students.

Be sure to include the following in your answer:

• the null and alternative hypotheses

• the p-value

• conclusion

(Hint): Research Question: Was the average smartphone usage the same in 2019 and 2020?

[6 Marks]
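The summary statistics in part (iii) and the test statistic behind part (v) can be sketched in Python as follows. This is a sketch under stated assumptions, not the required Excel workflow: the function names `summarise` and `welch_t` and the sample values are hypothetical, and the unequal-variance (Welch) form of the two-sample t-test is assumed here because it is the Excel tool named in Figure 2. The p-value itself would still come from the t distribution (for example, from Excel's t-Test output).

```python
import math
import statistics


def summarise(sample):
    """Mean, median, sample standard deviation and interquartile range."""
    # method="inclusive" is intended to match Excel's QUARTILE.INC interpolation.
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    return {
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "sd": statistics.stdev(sample),  # n - 1 divisor
        "iqr": q3 - q1,
    }


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite df) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    se_a = statistics.variance(sample_a) / n_a  # squared standard errors
    se_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical daily-usage values in minutes (not the Question4.xlsx data).
usage_a = [120, 135, 150, 160, 175, 180, 195, 210, 240]
usage_b = [140, 150, 175, 190, 200, 215, 230, 250, 295]

summary_a = summarise(usage_a)
summary_b = summarise(usage_b)
t_stat, df = welch_t(usage_a, usage_b)

print(summary_a)
print(summary_b)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```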

Question 5 (6 Marks)

A JB Hi-Fi franchise in Newcastle set a discount pricing strategy with the aim of increasing the sales of smartphones. The difference* (with promotion and without promotion) in sales of new smartphones is defined as:

Difference* = Sales on days with promotion – Sales on matched days without promotion

Test, at the 5% significance level, whether the mean difference in sales of new smartphones (between days with promotion and days without promotion) differs from zero. Comment on the result. Use Figure 2 to answer the research question. Note: Both sets of days’ measurements were taken from the same JB Hi-Fi franchise.

Figure 2: t-Test: Two-Sample Assuming Unequal Variances

                                Difference    Dummy
Mean                            14.4873       0
Variance                        102.1922      0
Observations                    50            2
Hypothesized Mean Difference    0
df                              49
t Stat                          10.1336
P(T<=t) one-tail                0.0000
t Critical one-tail             1.6766
P(T<=t) two-tail                0.0001
t Critical two-tail             2.0096
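As a check on how the "t Stat" row in Figure 2 arises, the following Python sketch recomputes it from the Mean, Variance and Observations rows. The variable names are illustrative assumptions; the numbers are those shown in the figure.

```python
import math

# Values read from the "Difference" column of Figure 2.
mean_diff = 14.4873
var_diff = 102.1922
n = 50

# The dummy column has variance 0, so it adds nothing to the standard error
# and the two-sample formula reduces to a one-sample t statistic.
std_error = math.sqrt(var_diff / n)
t_stat = (mean_diff - 0) / std_error  # hypothesized mean difference is 0

print(round(t_stat, 4))  # compare with the "t Stat" row of Figure 2
```

Pairing the differences with a zero-variance dummy column appears to be a way of obtaining a test on the mean of a single column from Excel's two-sample tool.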

Question 6 (10 Marks)

A sample of 450 students was selected from the University of Newcastle to determine if there is a relationship between smoking status and student’s diet (Vegetarian and non-Vegetarian). Use Figure 3 to answer the following probability questions (i) – (iv).

Figure 3: Observed Frequencies

              Vegetarian    Non-vegetarian    Total
Smoker        18            25                43
Non-smoker    97            310
Total         115           335

(i) What is the probability that a randomly selected student is vegetarian? [1 Mark]

(ii) What is the probability of a randomly selected student being “Vegetarian and a Non-smoker”? [1 Mark]

(iii) What is the probability of a student being “Vegetarian or a Smoker”? [1 Mark]

(iv) What is the probability that a randomly selected student is a Non-smoker given that the student is Non-vegetarian? [1 Mark]

Figure 4: Multiple bar chart for Smoking status and Diet of students

Figure 5: Expected Frequencies

              Vegetarian    Non-vegetarian    Total
Smoker        10.99         32.01             43
Non-smoker    104.01        ***
Total         115           335

(v) In Figure 5, calculate the expected count for the empty cell corresponding to the Non-vegetarians who are non-smokers. Note that this cell corresponds to the observed count of 310 in Figure 3. [1 Mark]

(vi) Conduct an appropriate hypothesis test at the 5% significance level to determine if there is a statistically significant relationship between Smoking status and student’s Diet. Be sure to report at least the P-value and use this to answer the research question. The p-value for the test statistic is given as 0.010. [5 Marks]
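The expected counts in Figure 5 follow the usual rule for a test of independence: expected count = (row total × column total) / grand total. The Python sketch below applies that rule to the three cells Figure 5 already shows. The function name `expected_count` is an assumption for illustration, and the Non-smoker row total is derived here from the stated sample size of 450.

```python
def expected_count(row_total, col_total, grand_total):
    """Expected cell count under independence of the row and column variables."""
    return row_total * col_total / grand_total


grand_total = 450                   # sample size stated in the question
smoker_total = 43                   # row total from Figure 3
veg_total, nonveg_total = 115, 335  # column totals from Figure 3
nonsmoker_total = grand_total - smoker_total  # derived from the two values above

# Reproduce the three cells that Figure 5 already shows.
smoker_veg = expected_count(smoker_total, veg_total, grand_total)
smoker_nonveg = expected_count(smoker_total, nonveg_total, grand_total)
nonsmoker_veg = expected_count(nonsmoker_total, veg_total, grand_total)

print(round(smoker_veg, 2), round(smoker_nonveg, 2), round(nonsmoker_veg, 2))
```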