R is a great environment for interactive analysis on your desktop, but when your data needs outgrow your personal computer, it's not clear what to do next.
This is the material for a short overview of scalable data analysis in R. The slides can be viewed at https://ljdursi.github.io/beyond-single-core-R.
It covers:
- How to think about parallelism and scalability in data analysis
- The standard parallel package, including the facilities that were formerly the separate snow and multicore packages, using airline data as an example
- The foreach package, using airline data and simple stock data
- A summary of best practices
Included in the materials, though not in the talk, are some more advanced methods: