dplyr vs data.table

In the world of data science in R, the battle between dplyr and data.table is real. Here I compare their performance against base R commands for some common tasks. Which will win on speed and simplicity?

Make random datasets

library(data.table)  # needed for data.table() below

set.seed(71)
size1 <- 4*10^6
size2 <- size1 * 0.1
df1 <- data.frame(id=paste0("SERVICE", 1:size1), value=rnorm(size1), stringsAsFactors=FALSE)
df2 <- data.frame(id=paste0("SERVICE", sample(1:size1, size2)), value=rnorm(size2), stringsAsFactors=FALSE)
dt1 <- data.table(df1)
dt2 <- data.table(df2)

# mtcars data
M <- data.

Bryan Whiting

father, innovator, data scientist

Data Scientist

Washington, D.C.