I’ve been looking at the data available on cycling accidents in the Bay Area (of which I am a statistic). This data comes from the California Highway Patrol (CHP). I’ve learned the hard way to start a data analysis with a small subset of the data (otherwise you sit around waiting for computations to finish), so I’ve started by looking at cycling accidents in Sonoma County.
I haven’t figured out why some bars in this plot have gaps between them and others don’t. I’ll keep futzing with the binwidth parameter.
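One possible cause, assuming the plotted variable only takes whole-number values (hour of day, say): when the bin width isn't a whole number, some bins can't contain any value and show up as gaps. Here's a minimal sketch of that effect in plain Python. The `bin_counts` helper and the `hours` list are made up for illustration and are not the CHP data or the plotting library's actual binning code.

```python
import math
from collections import Counter


def bin_counts(values, binwidth, origin=0.0):
    """Count values in half-open bins [origin + k*binwidth, origin + (k+1)*binwidth).

    Returns the counts for every bin from the lowest occupied one to the
    highest occupied one, including empty bins in between.
    """
    counts = Counter(math.floor((v - origin) / binwidth) for v in values)
    lo, hi = min(counts), max(counts)
    return [counts.get(k, 0) for k in range(lo, hi + 1)]


# Hypothetical integer-valued data, NOT taken from the accident records.
hours = [7, 8, 8, 9, 9, 9, 10, 10, 11]

# One bin per integer: every bin is occupied, so the bars touch.
print(bin_counts(hours, binwidth=1))

# A width of 0.75 leaves the bin [8.25, 9.0) with no integer in it,
# so one bar is missing while the other bars still sit next to each other.
print(bin_counts(hours, binwidth=0.75))
```

If that is what's going on, picking a bin width that is a whole multiple of the data's spacing should make the gaps go away.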
I haven’t done much modeling yet, but I did fit a logistic regression of fatalities on time of day, day of week, road conditions, lighting conditions, and violation category. So far, the most statistically significant predictor of a fatality is alcohol being involved in the accident. Remember, kiddos, don’t drink and drive.
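To make the idea concrete, here is a rough sketch of a logistic regression with a single yes/no predictor (alcohol involved or not), fit by Newton's method in plain Python. This is not the model described above, which has several predictors. The counts are invented for illustration, and `fit_logistic` is a hypothetical helper rather than a library function.

```python
import math


def fit_logistic(rows, iterations=25):
    """Fit P(fatal) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton's method.

    rows: grouped observations as (x, n_fatal, n_total) tuples.
    Returns the intercept b0 and the slope b1 (a log odds ratio).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, fatal, total in rows:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = fatal - total * p      # observed minus expected fatalities
            w = total * p * (1.0 - p)      # weight for the information matrix
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system H * step = g and take the Newton step.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented counts, NOT the CHP data: (alcohol involved?, fatal, total).
rows = [
    (0, 4, 200),   # no alcohol: 4 fatal out of 200 accidents
    (1, 5, 50),    # alcohol:    5 fatal out of 50 accidents
]

b0, b1 = fit_logistic(rows)
odds_ratio = math.exp(b1)  # how much alcohol multiplies the odds of a fatality
print(f"intercept={b0:.3f}  slope={b1:.3f}  odds ratio={odds_ratio:.2f}")
```

With only one binary predictor, the fitted odds ratio is just the odds ratio you'd compute by hand from the 2x2 table, which makes a handy sanity check before adding more predictors.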