Data Analytics
BDA 201 – Introduction to Business Analytics – Online
For this project, we partnered with Capital Health to provide an opportunity for Rider University students to work on real-life analytics problems using a real-life dataset. The data collected from Capital Health for their Hospital on the Deborah campus can be downloaded from Canvas Files. For each patient admitted during the period Mar 2021 – Feb 2022, this dataset includes:
- Account number,
- Admission and discharge dates,
- Gender (Female or Male),
- Race,
- Zip code, and
- Diagnosis code (ICD-10).
The Hospital would like to investigate several questions so that it can manage the resources available for treating its patients and reach out to high-risk groups for preventative screening procedures. In particular, the manager would like to identify the two most frequent diagnoses so that the appropriate resources can be arranged. The manager also wants to investigate whether the two most common diagnoses are connected to the gender, race, or residence (i.e., zip code) of the patients, so that the Hospital can allocate more medical staff and/or open new facilities to accommodate patients.
To respond to the manager, answer the following questions using the appropriate plots or tables for this dataset. In addition, justify each answer by analyzing the plot or table that you used for the analysis.
- Create frequency and relative frequency distributions of the admissions with respect to gender.
- Create frequency and relative frequency distributions of the admissions with respect to race. (See the following page for race coding.)
- Create frequency and relative frequency distributions of the admissions with respect to the day of the week. Which day of the week has the most admissions?
- Create frequency and relative frequency distributions of the admissions with respect to the diagnosis, including only diagnoses with at least 50 admissions. What are the two most frequent diagnoses for the Hospital on the Deborah campus?
- Create frequency and relative frequency distributions of the admissions with respect to the residence ZIP code, including only ZIP codes with at least 50 admissions. Which ZIP code has the most admissions?
- Is there any relationship between the two most frequent diagnoses and zip code, race, or gender?
To answer this question, create contingency tables for the frequency and relative frequency, then plot the corresponding bar charts:
- For the two most frequent diagnoses and gender.
- For the two most frequent diagnoses and the top zip codes (at least 50 admissions).
- For the two most frequent diagnoses and race.
- Based on your findings in the questions above, explain how the manager can improve the resources available for treating patients.
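The tables requested above can be built in several tools; the following is a minimal sketch in Python with pandas, offered only as one possible approach. The column names (`admission_date`, `gender`, `diagnosis`) are assumptions and may not match the headers in the Canvas file, the rows are toy values invented for illustration and are not Capital Health data, and the threshold of 2 stands in for the 50 admissions used in the assignment. The helper `freq_table` is hypothetical, and its choice to compute relative frequency over all admissions before filtering is also an assumption.

```python
import pandas as pd

# Toy rows invented purely for illustration; NOT the Capital Health data.
# Column names are assumptions, not the actual headers of the Canvas file.
df = pd.DataFrame({
    "admission_date": pd.to_datetime([
        "2021-03-01", "2021-03-01", "2021-03-02",
        "2021-03-08", "2021-03-09", "2021-03-10",
    ]),
    "gender": ["Female", "Male", "Female", "Female", "Male", "Female"],
    "diagnosis": ["A", "A", "B", "A", "B", "C"],
})


def freq_table(series, min_count=1):
    """Frequency and relative frequency of each value, most frequent first.

    Relative frequency is computed over all admissions, before the
    min_count filter is applied (an assumption about the intended base).
    """
    counts = series.value_counts()
    table = pd.DataFrame({
        "frequency": counts,
        "relative_frequency": counts / counts.sum(),
    })
    return table[table["frequency"] >= min_count]


# Distribution of admissions by gender.
gender_table = freq_table(df["gender"])

# Distribution by day of the week, derived from the admission date.
weekday_table = freq_table(df["admission_date"].dt.day_name())
busiest_day = weekday_table.index[0]

# Diagnoses above the threshold (2 here; 50 in the assignment).
diagnosis_table = freq_table(df["diagnosis"], min_count=2)
top_two = diagnosis_table.index[:2].tolist()

# Contingency tables for the top two diagnoses against gender:
# raw counts, and relative frequencies within each diagnosis (rows sum to 1).
subset = df[df["diagnosis"].isin(top_two)]
contingency = pd.crosstab(subset["diagnosis"], subset["gender"])
contingency_rel = pd.crosstab(
    subset["diagnosis"], subset["gender"], normalize="index"
)

print(gender_table)
print(weekday_table)
print(contingency)
print(contingency_rel)
```

The same pattern would apply to race and zip code by swapping the column passed to `freq_table` and `pd.crosstab`. If matplotlib is installed, calling `.plot(kind="bar")` on a contingency table is one way to draw the bar charts.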