How to export .csv files from R >100x faster for big data
TLDR; You should use data.table::fwrite() to export .csv from R, because it is more than 100 times faster than other methods. If you want to export your data as Excel .xlsx, you should use openxlsx, but it is not recommended for big data. If you want to learn more about Data Science in R, I recommend this book*.

Why do you have to export to .csv?
There is a common case where you want to exchange your data with customers or other people, like your colleagues. For some of them, a database format like SQLite is not usable, so you have to move to other options like Excel. I do not like Excel either, but it is still widely used and a common format, so as a data analyst you have to work with it. That is why I recently compared the export options in R, to save time and frustration.
Create a fake dataset with fakir
{fakir} is a good library from ThinkR for creating fake data. You should check it out in this link. For the test, I created a fairly big table with 1,000,000 rows and 25 columns. That is of course not big data, but it is already hard to handle in Excel, and for the purpose of this test it was enough. I compared 4 common libraries and 7 methods, see below. For additional comparison, I also added the .xlsx exports.
Test Setup:
I have an 8 GB RAM PC with an Intel Core i5 @ 2.4 GHz and 4 cores, so nothing special. If you have a better PC, the exports should be faster.
## Import libraries
library(tidyverse)
# install.packages("fakir",
# repos = c("thinkropen" = "https://thinkr-open.r-universe.dev"))
# install.packages("charlatan")
library(fakir)
library(data.table)
library(openxlsx)
library(xlsx)
library(vroom)

## Construct Dataset
data <- fake_ticket_client(vol = 1000000)

Next, for each method I recorded Sys.time() at the start and the end and took the difference, see the code below. I also saved all outputs to compare the file sizes.
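By the way, base R's system.time() wraps the same start/end pattern in a single call. Here is a minimal sketch of that alternative (my addition, not part of the original benchmarks; the measurements below keep the explicit Sys.time() version so every step is visible):

## Same measurement with base R's system.time()
elapsed <- system.time(
  data.table::fwrite(data, file = "fwrite_export.csv")
)
print(elapsed["elapsed"]) # elapsed wall-clock time in seconds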
## 1 Benchmark Openxlsx
start.time <- Sys.time()
openxlsx::write.xlsx(data, file = "openxslx_export.xlsx")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 6.246952 mins

## 2 Benchmark XLSX
start.time <- Sys.time()
xlsx::write.xlsx(data, "xslx_export.xlsx", sheetName = "Sheet1",
col.names = TRUE, row.names = TRUE, append = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 7.059051 mins

## 3 Benchmark XLSX2
start.time <- Sys.time()
xlsx::write.xlsx2(data, "xslx2_export.xlsx", sheetName = "Sheet1",
col.names = TRUE, row.names = TRUE, append = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError"

## 4 Benchmark CSV Export utils
start.time <- Sys.time()
utils::write.csv(data, file = "utils_write.csv_export.csv")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 1.901615 mins

## 5 Benchmark CSV 2 Export utils
start.time <- Sys.time()
utils::write.csv2(data, file = "utils_write.csv2_export.csv")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 3.071534 mins

## 6 Benchmark CSV Datatable fwrite
start.time <- Sys.time()
data.table::fwrite(data, file = "fwrite_export.csv")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 1.205034 secs

## 7 Benchmark CSV vroom
start.time <- Sys.time()
vroom::vroom_write(data, file = "vroom_write_export.csv", delim = ",")
end.time <- Sys.time()
time.taken <- end.time - start.time
print(time.taken)
# Time difference of 14.00942 secs
The Results
As a result, you can see that of the 7 methods, only 6 worked for me. xlsx::write.xlsx2 did not work and threw a Java OutOfMemoryError. Maybe my machine has too little memory, but that is a useful result in itself. Below you see the timings: data.table::fwrite won by a wide margin, more than 100 s ahead of {utils}. The xlsx exports take a lot longer, more than 6 minutes!

In terms of output size, the openxlsx::write.xlsx file is about half the size of the .csv files.
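If you want to check the file sizes yourself, here is a minimal sketch (my addition, assuming the exports above were written to your working directory with the file names used in the benchmarks):

## Compare output file sizes in MB
files <- c("openxslx_export.xlsx", "xslx_export.xlsx",
           "utils_write.csv_export.csv", "utils_write.csv2_export.csv",
           "fwrite_export.csv", "vroom_write_export.csv")
data.frame(file = files, size_mb = round(file.size(files) / 1024^2, 1))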

If you compare both dimensions, you can decide which one you want to optimize for. I would go for fwrite or vroom and prefer to save time, since the export is about 100x faster while the file is only about 2 times bigger. Below you will find the code 👇🏻
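And if the roughly 2x larger .csv bothers you, data.table::fwrite can also write gzip-compressed files directly. A minimal sketch (my addition, assuming data.table >= 1.12.4, which introduced the compress argument):

## fwrite infers gzip compression from the .gz file extension
data.table::fwrite(data, file = "fwrite_export.csv.gz")
## or request it explicitly
data.table::fwrite(data, file = "fwrite_export.csv.gz", compress = "gzip")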
What would you prefer? Let me know in the comments!

Thanks for reading my article. I hope you liked it. Please feel free to like, share, and comment on it. Follow me for more content about cryptos, stocks, data analysis, and web development.
- Read my other article about stocks and Reddit:
- Check out my webpage and get in touch with me on LinkedIn: antonioblago.com
- Make a free account on my stock and crypto tracking tool: www.tenxassets.com
- If you like to develop web apps in Python, I recommend pythonanywhere.com*
Here you can find the code:
If you want to learn more about Data Science in R, I recommend this book*.