How to export .csv files from R more than 100x faster for big data

Image by author

Why do you have to export to .csv?

Create a fake dataset with fakir

Test setup:

## Import libraries
library(tidyverse)
# install.packages("fakir",
# repos = c("thinkropen" = "https://thinkr-open.r-universe.dev"))
# install.packages("charlatan")
library(fakir)
library(data.table)
library(openxlsx)
library(xlsx)
library(vroom)
## Construct Dataset
data <- fake_ticket_client(vol = 1000000)
Test Dataset, Image by author
## 1 Benchmark Openxlsx
start.time <- Sys.time()
openxlsx::write.xlsx(data, file = "openxslx_export.xlsx")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 6.246952 mins
## 2 Benchmark XLSX
start.time <- Sys.time()
xlsx::write.xlsx(data, "xslx_export.xlsx", sheetName = "Sheet1",
col.names = TRUE, row.names = TRUE, append = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 7.059051 mins
## 3 Benchmark XLSX2
start.time <- Sys.time()
xlsx::write.xlsx2(data, "xslx2_export.xlsx", sheetName = "Sheet1",
col.names = TRUE, row.names = TRUE, append = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError"
## 4 Benchmark CSV Export utils
start.time <- Sys.time()
utils::write.csv(data, file = "utils_write.csv_export.csv")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 1.901615 mins
## 5 Benchmark CSV 2 Export utils
start.time <- Sys.time()
utils::write.csv2(data, file = "utils_write.csv2_export.csv")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 3.071534 mins
## 6 Benchmark CSV Datatable fwrite
start.time <- Sys.time()
data.table::fwrite(data, file = "fwrite_export.csv")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 1.205034 secs
## 7 Benchmark CSV vroom
start.time <- Sys.time()
vroom::vroom_write(data, file = "vroom_write_export.csv", delim = ",")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 14.00942 secs
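
Every benchmark above repeats the same start/stop pattern around a single export call. As a minimal sketch, that pattern can be wrapped in a small helper built on base R's system.time(). The helper name time_export and the small demo data frame are my own assumptions for illustration, not part of the original test setup, and the demo writes only 1,000 rows so it finishes instantly.

```r
# Hypothetical helper (not from the original benchmarks): returns the
# wall-clock seconds an expression takes to run.
# R evaluates arguments lazily, so `expr` only runs inside system.time().
time_export <- function(expr) {
  system.time(expr)[["elapsed"]]
}

# Small stand-in dataset so the sketch runs quickly.
small <- data.frame(id = 1:1000, value = rnorm(1000))
out <- tempfile(fileext = ".csv")

# Time a base R CSV export of the stand-in data.
secs <- time_export(utils::write.csv(small, out, row.names = FALSE))
print(secs)
```

Any of the export calls from the benchmarks could be passed in the same way, for example time_export(data.table::fwrite(data, file = "fwrite_export.csv")).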

The Results

Image by author
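
To compare the runs on one scale, the timings printed in the benchmark comments can be converted to seconds and divided by the fwrite time. The sketch below does only that arithmetic. The numbers are copied from the comments above, the variable names are my own, and xlsx::write.xlsx2 is left out because it ended with an OutOfMemoryError instead of a timing. These are single runs on one machine, so treat the ratios as rough.

```r
# Timings copied from the benchmark comments, converted to seconds.
# xlsx::write.xlsx2 is omitted: it ran out of memory and reported no time.
timings <- c(
  "openxlsx::write.xlsx" = 6.246952 * 60,
  "xlsx::write.xlsx"     = 7.059051 * 60,
  "utils::write.csv"     = 1.901615 * 60,
  "utils::write.csv2"    = 3.071534 * 60,
  "data.table::fwrite"   = 1.205034,
  "vroom::vroom_write"   = 14.00942
)

# How many times slower each method was than data.table::fwrite.
speedup <- timings / timings[["data.table::fwrite"]]
print(round(sort(speedup, decreasing = TRUE), 1))
```

On these numbers, fwrite comes out more than 300x faster than both Excel writers, roughly 95x to 150x faster than the base R CSV writers, and roughly 12x faster than vroom_write.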
I am a data analyst discovering the unlimited world of coding and data.

Antonio Blago