Assignment 4: Online News Sharing @ Mashable
The dataset D5.2 (described in C5.2) contains information on online news articles published by Mashable (www.mashable.com). Some of these articles contain videos. Our primary question in this assignment is whether including at least one video in an article leads to the article being shared more on social media. Accordingly, the key outcome is “shares”, the number of social media shares for each article. The treatment indicator is a variable that equals 1 if the number of videos included in an article (num_videos) is non-zero, and 0 otherwise.
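As a minimal sketch of the setup described above, the snippet below builds the treatment indicator from num_videos and regresses shares on it. The rows are hypothetical toy values, not taken from D5.2, and the helper names (treatment_indicator, ols_on_binary) are illustrative assumptions. With a single 0/1 regressor, the OLS slope equals the difference in mean shares between the two groups.

```python
from statistics import mean

# Hypothetical toy rows; the real D5.2 values are not reproduced here.
articles = [
    {"num_videos": 0, "shares": 500},
    {"num_videos": 2, "shares": 1200},
    {"num_videos": 0, "shares": 700},
    {"num_videos": 1, "shares": 900},
    {"num_videos": 0, "shares": 600},
]


def treatment_indicator(num_videos):
    """Return 1 if the article includes at least one video, else 0."""
    return 1 if num_videos != 0 else 0


def ols_on_binary(treat, outcome):
    """OLS of outcome on an intercept and one regressor.

    Returns (intercept, slope) from the usual closed-form formulas.
    """
    t_bar = mean(treat)
    y_bar = mean(outcome)
    cov = sum((t - t_bar) * (y - y_bar) for t, y in zip(treat, outcome))
    var = sum((t - t_bar) ** 2 for t in treat)
    slope = cov / var
    intercept = y_bar - slope * t_bar
    return intercept, slope


treat = [treatment_indicator(a["num_videos"]) for a in articles]
shares = [a["shares"] for a in articles]
intercept, slope = ols_on_binary(treat, shares)

# The intercept is the mean of the untreated group; the slope is the
# treated-minus-untreated difference in mean shares.
print(f"intercept = {intercept:.1f}, slope = {slope:.1f}")
```

On the real data, the same regression would normally be run with a statistics package so that standard errors are reported alongside the coefficient.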
- Task 1:
- [2 points] Based on linear regression results, is the treatment associated with a larger or smaller number of shares on average? For simplicity, base your answer on a regression of the outcome on the treatment indicator (i.e., do not include other covariates).
- Task 2:
- [2 points] Evaluate the propensity score overlap between treated and non-treated subsamples.
- [3 points] Create a matched sample based on propensity scores from a logistic regression, in a way that accounts for overlap considerations.
- [2 points] Assess the matched sample in terms of covariate balance. In your judgement, has the matching procedure been successful?
- Task 3:
- [2 points] Based on your analysis above, provide a matching ATE estimate. Do videos increase the number of shares? By how much? For simplicity, base your answer on a regression of the outcome on the treatment indicator (i.e., do not include other covariates).
- [2 points] Provide a rationale that explains the disparity between the estimates of 3.a and 1.a (i.e., your rationale must describe some form of behavior that explains why one estimate is larger than the other).
- [2 points] Suppose that the unconfoundedness assumption holds: what could then be the “fudge factor” (discussed in class) in this case? Explain.
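The workflow in Tasks 2 and 3 (overlap check, matching, balance check, matched comparison) can be sketched as follows. This is only an illustration under stated assumptions: the propensity scores are hypothetical numbers given directly rather than fitted by logistic regression, x stands for one arbitrary covariate, the caliper of 0.05 is an arbitrary choice, and the function names are not from the course material. The sketch matches each treated article to its nearest control with replacement, which is one of several possible matching schemes.

```python
from statistics import mean, variance

# Hypothetical units: d = treatment indicator, ps = propensity score
# (assumed already estimated), x = one covariate, y = shares.
units = [
    {"d": 1, "ps": 0.95, "x": 9, "y": 1500},
    {"d": 1, "ps": 0.70, "x": 7, "y": 1000},
    {"d": 1, "ps": 0.50, "x": 5, "y": 800},
    {"d": 1, "ps": 0.40, "x": 4, "y": 700},
    {"d": 0, "ps": 0.72, "x": 7, "y": 900},
    {"d": 0, "ps": 0.52, "x": 5, "y": 750},
    {"d": 0, "ps": 0.41, "x": 4, "y": 600},
    {"d": 0, "ps": 0.05, "x": 1, "y": 300},
]


def common_support(rows):
    """Keep rows whose score lies in the range covered by both groups."""
    treated = [r["ps"] for r in rows if r["d"] == 1]
    control = [r["ps"] for r in rows if r["d"] == 0]
    lo = max(min(treated), min(control))
    hi = min(max(treated), max(control))
    return [r for r in rows if lo <= r["ps"] <= hi]


def match_nearest(rows, caliper):
    """Pair each treated row with its nearest control (with replacement).

    Pairs whose score gap exceeds the caliper are discarded.
    """
    controls = [r for r in rows if r["d"] == 0]
    pairs = []
    for t in (r for r in rows if r["d"] == 1):
        best = min(controls, key=lambda c: abs(c["ps"] - t["ps"]))
        if abs(best["ps"] - t["ps"]) <= caliper:
            pairs.append((t, best))
    return pairs


def standardized_diff(x_treated, x_control):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((variance(x_treated) + variance(x_control)) / 2) ** 0.5
    return (mean(x_treated) - mean(x_control)) / pooled_sd


supported = common_support(units)
pairs = match_nearest(supported, caliper=0.05)

# Balance on x before and after matching.
smd_before = standardized_diff(
    [u["x"] for u in units if u["d"] == 1],
    [u["x"] for u in units if u["d"] == 0],
)
smd_after = standardized_diff(
    [t["x"] for t, _ in pairs], [c["x"] for _, c in pairs]
)

# Mean within-pair difference in shares on the matched sample.
effect = mean(t["y"] - c["y"] for t, c in pairs)
print(len(supported), len(pairs), smd_before, smd_after, effect)
```

Because every treated unit is matched to a control and not the reverse, the matched difference in this sketch is weighted toward the treated articles; how that relates to the estimand asked for in Task 3 depends on the procedure covered in class.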