Sharon Machlis
Executive Editor, Data & Analytics

R in 5 lines or less: Data breach quick analysis

how-to
Mar 26, 2015 · 2 mins
Analytics, Business Intelligence, R Language

It would take a lot more than 5 lines of code to do serious analysis of complex security data; in fact, there’s an entire book on the topic, Data-Driven Security. Here, though, we’re just looking at a list of security breaches announced in 2014, compiled by Privacy Rights Clearinghouse, and focusing on two categories: the type of breach and the type of victim. Are there any interesting stories to tell from these two categories?

The video below shows some easy ways of viewing item counts and generating a few visualizations from the data. If you’d like to try this yourself, the 5 lines of code are after the video. Breach data can be downloaded from the Privacy Rights Clearinghouse website. I did a bit of manual cleaning of that data and kept only the columns I named DatePublic, Organization, Entity, Type, City, State, InfoSource and Year. (Unfortunately, data about the number of people affected by each breach was not in a useful format for analysis.)
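The cleanup described above was done by hand, but the same idea can be sketched in R. This is only an illustration under assumptions: the `raw` data frame, its extra `Notes` column, the sample values, and the month/day/year date format are all invented stand-ins, not the actual layout of the Privacy Rights Clearinghouse download.

```r
# Hypothetical stand-in for the raw download; real column names and values may differ.
raw <- data.frame(
  DatePublic   = c("1/5/2014", "2/10/2014"),
  Organization = c("Example Corp", "Sample Clinic"),
  Entity       = c("BSR", "MED"),
  Type         = c("HACK", "PORT"),
  City         = c("Springfield", "Riverton"),
  State        = c("IL", "UT"),
  InfoSource   = c("Media", "Government agency"),
  Notes        = c("free text", "free text"),   # an extra column we do not need
  stringsAsFactors = FALSE
)

# Keep only the columns used in the analysis
keep <- c("DatePublic", "Organization", "Entity", "Type", "City", "State", "InfoSource")
breaches_clean <- raw[, keep]

# Derive Year from the date (assumes month/day/year text dates)
dates <- as.Date(breaches_clean$DatePublic, format = "%m/%d/%Y")
breaches_clean$Year <- as.integer(format(dates, "%Y"))

str(breaches_clean)
```

Once the real columns are selected this way, `write.csv(breaches_clean, "breaches_2014.csv", row.names = FALSE)` would produce a file that the code in this article can read.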

If you don’t already have GGally on your system, remember to download and install it first with install.packages("GGally").


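# Read the cleaned breach list; keep text columns as character strings
breaches = read.csv("breaches_2014.csv", header = TRUE, stringsAsFactors = FALSE)
# Count breaches by type
table(breaches$Type)
# GGally supplies ggfluctuation2()
library(GGally)
# Fluctuation plot of the Entity-by-Type cross-tabulation
ggfluctuation2(table(breaches$Entity, breaches$Type))
# Stacked bars: one bar per breach type, segments colored by entity
barplot(table(breaches$Entity, breaches$Type), legend.text = TRUE, col = terrain.colors(7),
        main = "2014 breaches, via Privacy Rights Clearinghouse")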
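To go one small step past raw counts, the tables can be sorted and turned into shares, and the cross-tabulation can be drawn with base R alone. The sketch below runs on a tiny invented data frame (`toy`), not the real breach data, so the numbers mean nothing. `mosaicplot()` is offered here as a base-graphics alternative for viewing the cross-tabulation, in case `ggfluctuation2()` is not available in your installed version of GGally.

```r
# Toy stand-in for the breaches data frame; values are invented for illustration.
toy <- data.frame(
  Entity = c("MED", "MED", "BSR", "GOV", "MED", "BSR"),
  Type   = c("HACK", "PORT", "HACK", "DISC", "PORT", "HACK"),
  stringsAsFactors = FALSE
)

# Breach types ordered from most to least frequent
type_counts <- sort(table(toy$Type), decreasing = TRUE)

# The same counts as a share of all rows
type_share <- round(prop.table(type_counts), 2)

# Entity-by-Type cross-tabulation, as used in the plots above
cross <- table(toy$Entity, toy$Type)

# Base R mosaic plot: tile area reflects the count in each Entity/Type cell
mosaicplot(cross, main = "Toy data: entity by breach type", color = TRUE)

print(type_counts)
print(type_share)
```

With the real file, replacing `toy` with `breaches` should give the same views for the 2014 data.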

New to R? Check out my Beginner’s Guide to R free PDF download.

See more examples of R in 5 lines or less.