You should use RapidMiner to complete Part A. Remember that the ID column is used only for identification and should not be included as a predictor variable in your machine learning model.
Assume you are a manager at a local outpatient clinic.
You would like to offer patients at risk for obesity and hypercholesterolemia early interventions to help improve their health. Use the data in the Cholesterol.xlsx file and use machine learning to identify four groups of similar patients, so you can offer tailored interventions to each group. In the gender column, 0 denotes Female and 1 denotes Male.
1a. What is the average cholesterol for the group with the highest overall cholesterol?
1b. Does this group have more males or females?
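As a rough illustration of the idea behind question 1 (grouping similar patients into four clusters and then profiling the cluster with the highest cholesterol), here is a minimal k-means sketch in plain Python. It is not the RapidMiner workflow the assignment asks for. The rows are invented toy data, not values from Cholesterol.xlsx, and the column layout (ID, weight, cholesterol, gender) is an assumption. Note that the ID value is dropped before clustering, as the instructions require.

```python
from statistics import mean

# Toy rows invented for illustration; NOT taken from Cholesterol.xlsx.
# Assumed column order: ID, weight, cholesterol, gender (0 = Female, 1 = Male).
rows = [
    (1, 100, 110, 0), (2, 105, 115, 0), (3, 110, 112, 1),
    (4, 140, 150, 1), (5, 145, 155, 0), (6, 150, 152, 0),
    (7, 170, 190, 1), (8, 175, 195, 1), (9, 180, 192, 0),
    (10, 200, 230, 1), (11, 205, 235, 1), (12, 210, 232, 0),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    # Deterministic farthest-point initialisation (an illustrative choice):
    # start from the first point, then repeatedly add the point that is
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(mean(col) for col in zip(*members))
    return labels, centroids

features = [row[1:] for row in rows]  # drop the ID column before clustering
labels, centroids = kmeans(features, k=4)

# Profile the cluster whose centroid has the highest cholesterol (feature index 1).
top = max(range(4), key=lambda j: centroids[j][1])
top_rows = [r for r, lab in zip(rows, labels) if lab == top]
avg_chol = mean(r[2] for r in top_rows)
males = sum(r[3] for r in top_rows)
females = len(top_rows) - males

print("Average cholesterol in highest-cholesterol cluster:", avg_chol)
print("Males:", males, "Females:", females)
```

The features are left unscaled here for simplicity. Whether to normalize them before clustering is a modeling choice that can change the groups, so check what your course expects.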
2. You have many patients who still have not signed up to use the patient portal, which allows patients to manage several healthcare-related tasks online, such as medication refills, scheduling appointments, and messaging their physician.
To increase patient portal adoption, you would like to send a targeted mail campaign to the group of current non-users who are most likely to sign up for the patient portal. Pick an appropriate machine learning algorithm, use the Training.xlsx file to train your model, and predict portal adoption for the patients listed in Scoring.xlsx.
2a. What machine learning algorithm would be appropriate: linear regression or logistic regression?
2b. What is the predicted patient portal adoption status for Patient ID 993? If targeted in the marketing campaign, is she likely to become an adopter or remain a non-adopter?
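The train-then-score pattern in question 2 can be sketched as follows, assuming the outcome is a yes/no label (adopter or non-adopter) and using a small logistic regression fitted by gradient descent. This is only an illustration in plain Python, not the RapidMiner process the assignment requires. All rows are invented toy data, not values from Training.xlsx or Scoring.xlsx, and the predictor names (age in decades, visits last year) and patient IDs are hypothetical. The ID is kept only to label the output and is never used as a predictor.

```python
import math

# Toy training rows invented for illustration; NOT taken from Training.xlsx.
# Assumed columns: ID, age_in_decades, visits_last_year, adopter (1 = yes, 0 = no).
training = [
    (1, 2.5, 6, 1), (2, 3.0, 5, 1), (3, 3.5, 7, 1), (4, 4.0, 4, 1),
    (5, 6.0, 1, 0), (6, 6.5, 2, 0), (7, 7.0, 0, 0), (8, 5.5, 1, 0),
]
# Hypothetical scoring rows (no outcome column): ID, age_in_decades, visits_last_year.
scoring = [(101, 3.0, 6), (102, 6.5, 1)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    """Predicted probability of adoption; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.05, epochs=10000):
    """Fit weights by batch gradient descent on the average log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

X_train = [row[1:3] for row in training]  # predictors only; ID excluded
y_train = [row[3] for row in training]
weights = fit_logistic(X_train, y_train)

# Score the new patients: probability above 0.5 is labeled "adopter".
predictions = {}
for patient_id, *x in scoring:
    p = predict_proba(weights, x)
    predictions[patient_id] = ("adopter" if p > 0.5 else "non-adopter", p)
    print(patient_id, predictions[patient_id][0], round(p, 3))
```

The 0.5 cutoff is an assumed default. For a mail campaign, a different threshold may be appropriate depending on the cost of each mailing relative to the value of a new adopter.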