When you need an R server and have lots of data to process, AWS is a great way to go. Sign up for the free tier and poke around!
Creating an AWS RStudio server:
https://aws.amazon.com/blogs/big-data/running-r-on-aws/ (use both the install snippet and the JSON snippet provided; the install snippet runs, but R itself is NOT installed by it here, so RStudio will not work yet)
https://www.rstudio.com/products/rstudio/download-server/ (the suite being installed)
Follow most of the AWS blog's instance setup, with the following choices:
- AMI: Amazon Linux 2 (more packages and extras than the standard Amazon Linux AMI)
- Instance type: t2.micro (free tier)
- IAM policy: follows the AWS blog's JSON snippet
- Security policy: open inbound ports 22 (SSH), 8787 (RStudio Server), and 3838 (Shiny Server)
- User: append a user as username:password in the blog post’s initial RStudio install text (pasted into the “advanced” text box when completing the instance setup)
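If you would rather script the inbound rules than click through the console, here is a minimal sketch using the AWS CLI. The security group ID and CIDR range are placeholders I made up, not values from the blog post, and the script only prints the commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: print the AWS CLI calls that would open the three inbound ports.
# SG_ID and CIDR are placeholders (assumptions); substitute your own values.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"
CIDR="${CIDR:-203.0.113.10/32}"   # ideally your own IP, not 0.0.0.0/0

ingress_commands() {
    # $1 = security group id, $2 = CIDR block allowed to connect
    for port in 22 8787 3838; do
        echo "aws ec2 authorize-security-group-ingress --group-id $1 --protocol tcp --port $port --cidr $2"
    done
}

# Dry run: review the output, then paste the lines you want into the shell.
ingress_commands "$SG_ID" "$CIDR"
```

Printing instead of executing keeps the sketch harmless; it assumes the AWS CLI is installed and configured once you do run the printed lines.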
SSH into the EC2 instance
sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum-config-manager --enable epel
sudo yum repolist
wget https://download2.rstudio.org/rstudio-server-rhel-1.1.423-x86_64.rpm
sudo yum update -y
sudo yum install -y R
sudo rstudio-server verify-installation
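Since the whole point of the extra steps is that R was missing after the initial install, a quick sanity check can help. This is a rough sketch, not part of the blog's instructions: the helper names are my own, and it only reports whether the expected commands can be found on PATH.

```shell
#!/bin/sh
# Sketch: report which expected commands are on PATH after the install.
have() {
    # Succeeds if the named command can be found on PATH.
    command -v "$1" >/dev/null 2>&1
}

report() {
    # Prints one "ok:" or "missing:" line per command name given.
    for cmd in "$@"; do
        if have "$cmd"; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
        fi
    done
}

report R Rscript rstudio-server
```

If rstudio-server shows as missing for a normal user, it may simply live in a directory that is only on root's PATH, so try the check under sudo before assuming the install failed.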
Access the graphical R server:
In a web browser, tack on “:8787” to the end of the instance’s public “connect” link. If it doesn’t load a login window (but seems to be trying to connect to something), the security policy is probably being overzealous: check that inbound port 8787 is actually open.
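To tell a blocked port apart from a browser problem, you can probe the address from your own machine. A small sketch, assuming curl is available; the host name below is a placeholder, and the probe line is left commented out so nothing touches the network until you want it to.

```shell
#!/bin/sh
# Sketch: build the RStudio Server URL and (optionally) probe it with curl.
rstudio_url() {
    # $1 = the instance's public DNS name or IP; 8787 is the RStudio Server port
    echo "http://$1:8787"
}

probe() {
    # Exit status 0 if something answers within 5 seconds, non-zero otherwise.
    curl --silent --output /dev/null --max-time 5 "$1"
}

HOST="${HOST:-ec2-203-0-113-10.compute-1.amazonaws.com}"   # placeholder host
echo "Try: $(rstudio_url "$HOST")"
# probe "$(rstudio_url "$HOST")" || echo "no answer; check the inbound rule for 8787"
```

A timeout here usually points at the inbound rules rather than at RStudio itself.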
Notes on S3-hosted data:
- S3 data is easiest to use if it is set to be public.
- There are S3-specific tools for R, available as packages from CRAN and installable directly from the R interface.
- Note that data hosted in S3 (delimited text at least) will behave differently than it does locally, e.g. stray spaces, “na”, and “null” need to be “cleaned” in R before use.
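The cleaning can also be done before the file ever reaches R. Below is a rough sketch of that alternative using awk on the instance rather than R itself. It assumes a simple comma-delimited file with no quoted commas, and the function name and sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch: normalize missing-value tokens in a comma-delimited file before R reads it.
clean_csv() {
    # Reads CSV on stdin, writes cleaned CSV on stdout.
    # Trims spaces and tabs around each field and rewrites na/null (any case) as NA.
    # Assumption: no quoted fields containing commas.
    awk 'BEGIN { FS = ","; OFS = "," }
    {
        for (i = 1; i <= NF; i++) {
            gsub(/^[ \t]+|[ \t]+$/, "", $i)
            if (tolower($i) == "na" || tolower($i) == "null") $i = "NA"
        }
        print
    }'
}

# Made-up sample data showing the kinds of values mentioned above.
printf 'id, value\n1, null\n2 ,na\n3,4.5\n' | clean_csv
```

For the sample input this prints id,value then 1,NA then 2,NA then 3,4.5, each on its own line. For a public S3 object you could pipe a curl download of the file through clean_csv and save the result, then read it in R as usual.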
There we have it!
-Jess