1 AWS EC2 and S3 Services

Amazon Web Services (AWS) is a public cloud computing provider. Unlike enterprise cloud providers, AWS gives anyone with an e-mail address and a credit card quick and easy access to a large variety of services. Its software infrastructure is based on virtualization, is operated to offer computing capability as a service, and is designed for high flexibility and reliability. This cheat sheet describes the family of AWS services and provides guidelines for using them. Because AWS continues to evolve rapidly, check out this link for additional online content and the latest information about AWS.

The most popular services are EC2 and S3. EC2 is the AWS computing service, which offers computing capacity on demand with immediate availability and no set commitment to length of use. S3, AWS’s first service, offers object storage over the web. We explain these two services in some detail in the following sections.

1.1 EC2 Services

In earlier days, when you needed a server, you had to buy one, have it delivered, installed, and connected to the network, and only then did you gain access to it. It wasn’t uncommon for this process to take three to six months. EC2 provides virtual servers in a matter of minutes, all via self-service. It starts with a virtualization layer that uses virtual machines to provide virtual servers. Amazon then overlaid this layer with a sophisticated software layer designed to obviate the need for human intervention in provisioning a virtual machine. With this innovation, a fundamental part of the IT industry, the provisioning of servers, has been transformed.

EC2 is based on virtualization, the process of using software to create virtual machines that carry out all the tasks you’d associate with a “real” computer using a “real” operating system. EC2 has its own terminology: a virtual machine running in EC2 is referred to as an instance, and the stored template from which an instance is created is referred to as an image. Likewise, where a virtual machine is started in virtualization, an instance is launched in EC2.

An Amazon Machine Image (AMI) is the collection of bits needed to create a running instance. This collection includes the three essential elements: (1) at minimum, the operating system that will run on the instance, (2) any software packages you’ve chosen to install, and (3) any configuration information needed for the instance to operate properly. You choose which AMIs to use for your application based on these elements.
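The relationship between an image and an instance can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it assumes the third-party boto3 SDK is installed and AWS credentials are configured, and the helper names launch_instance and make_ec2_client, the AMI ID, and the instance type in the usage comment are hypothetical placeholders.

```python
def launch_instance(ec2_client, image_id: str, instance_type: str) -> str:
    """Launch exactly one instance from the given AMI and return its instance ID."""
    response = ec2_client.run_instances(
        ImageId=image_id,            # the AMI to create the instance from
        InstanceType=instance_type,  # the virtual machine type and size
        MinCount=1,
        MaxCount=1,
    )
    return response["Instances"][0]["InstanceId"]


def make_ec2_client():
    """Create a real EC2 client (assumes boto3 is installed and credentials are set up)."""
    import boto3  # third-party AWS SDK, imported lazily so the sketch loads without it

    return boto3.client("ec2")


# Hypothetical usage; the AMI ID below is a placeholder, not a real image:
# instance_id = launch_instance(make_ec2_client(), "ami-0123456789abcdef0", "t2.micro")
```

Passing the client in as an argument keeps the launch logic separate from credential handling, so the same function can be exercised with a stand-in client before it is pointed at a real account.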

You also need to consider instance types, the types of virtual machines you can run in AWS. Instance types differ in three kinds of compute resources: (1) processing power, expressed as a number of EC2 compute units (ECU), (2) amount of memory, measured in gigabytes, and (3) amount of disk storage. Every instance also comes with one virtual network interface card (NIC), which it uses to communicate with other devices or services. An instance is typically given two IP addresses: one private address that’s used solely within AWS and one public address that’s used for internet access to the instance. Depending on your application’s operating characteristics, you can choose which instance types to use (compute optimized, memory optimized, or storage optimized, for example).
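The difference between the private and the public address can be checked with Python's standard ipaddress module. The sketch below is illustrative only: the helper classify_address and the two sample addresses are made up for this example and do not come from a real instance.

```python
import ipaddress


def classify_address(address: str) -> str:
    """Return 'private' for addresses in reserved non-global ranges, else 'public'.

    Note: is_private is also True for loopback and link-local addresses,
    not only for the ranges typically used inside a cloud network.
    """
    ip = ipaddress.ip_address(address)
    return "private" if ip.is_private else "public"


# Hypothetical addresses for a single instance.
instance_addresses = {"private_ip": "172.31.5.20", "public_ip": "54.210.10.15"}

for label, addr in instance_addresses.items():
    print(label, addr, classify_address(addr))
```

Only the public address is reachable from the internet; other instances inside AWS would normally talk to this one through its private address.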

After you choose an instance type, you also have to choose an instance size that is suitable for your application. The range of sizes within an instance type provides different amounts of computing resources. If you find all of this mind-boggling, there is an excellent third-party website that lists and compares all the different instance types and sizes.
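In the EC2 API, the type and the size are usually written together as one name of the form family.size, for example c5.xlarge. The following sketch splits such a name into its two parts. The helper parse_instance_type and the InstanceType tuple are hypothetical names introduced for illustration, and the sample names are used only as examples of the naming pattern.

```python
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str  # e.g. "c5", indicating the kind of workload the type targets
    size: str    # e.g. "xlarge", indicating how much of each resource you get


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name such as 'c5.xlarge' into family and size."""
    family, sep, size = name.partition(".")
    if not sep or not family or not size:
        raise ValueError(f"expected a name like 'family.size', got {name!r}")
    return InstanceType(family, size)


for name in ("c5.xlarge", "r5.large", "m5.2xlarge"):
    parsed = parse_instance_type(name)
    print(name, "-> family:", parsed.family, "size:", parsed.size)
```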

1.2 S3 Services

The enormous growth of storage has made traditional approaches (local storage, network-attached storage, storage-area networks, and the like) inappropriate for storing extremely large amounts of data, for transferring those data quickly, and for doing both affordably at such scale. S3 provides highly scalable object storage, in which objects are unstructured collections of bits. Individual users tend to use S3 as secure, location-independent storage for digital media. Another common personal use for S3 is backing up local files.

S3 objects are treated as web objects; that is, they’re accessed via Internet protocols using a URL identifier in this format: http://s3.amazonaws.com/bucket/key. A bucket is a group of objects. A key is the name of an object, and it acts as an identifier for locating the object’s data. Because a key may contain slashes, this arrangement provides a familiar directory-like or URL-like format for object names; however, it doesn’t represent the actual structure of the S3 storage system.
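The bucket/key format can be made concrete with a small sketch that builds and splits such URLs using only the Python standard library. The function names object_url and split_object_url, the bucket name, and the keys are hypothetical; the endpoint is the one given in the format above.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit

S3_ENDPOINT = "http://s3.amazonaws.com"  # endpoint as written in the text


def object_url(bucket: str, key: str) -> str:
    """Build a URL of the form endpoint/bucket/key."""
    # Keep "/" unescaped so directory-like keys stay readable.
    return f"{S3_ENDPOINT}/{bucket}/{quote(key, safe='/')}"


def split_object_url(url: str) -> Tuple[str, str]:
    """Return (bucket, key); everything after the bucket is one flat key."""
    path = urlsplit(url).path.lstrip("/")
    bucket, _, key = path.partition("/")
    return bucket, unquote(key)


url = object_url("my-bucket", "data/2020/train.csv")
print(url)
print(split_object_url(url))
```

Note that "data/2020/train.csv" is a single key: the slashes are part of the object's name, which is why the directory-like appearance says nothing about how the data is physically stored.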

In the following sections, we provide an example of how to use the EC2 and S3 services for model fitting. In this example, we launch an Amazon Deep Learning Instance on EC2 and run a Multi-Layer Perceptron model that predicts wireless network intrusion activity from a customized dataset. We upload and store this dataset in an S3 bucket, and we show the procedures that allow an EC2 instance to read the data from the bucket and write computational results back to it.
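As a rough preview of the read-compute-write pattern, the sketch below downloads one object from a bucket, runs a caller-supplied processing step, and uploads the result. It is an assumption-laden outline rather than the procedure used in the example itself: it assumes the third-party boto3 SDK and configured credentials, the names run_job, make_s3_client, and process are hypothetical, and the model fitting is represented only by the process placeholder.

```python
import os
import tempfile


def run_job(s3_client, bucket: str, input_key: str, output_key: str, process) -> None:
    """Download one object, process it locally, and upload the result."""
    with tempfile.TemporaryDirectory() as workdir:
        local_in = os.path.join(workdir, "input.dat")
        local_out = os.path.join(workdir, "output.dat")
        # Read: copy the object from the bucket to the instance's local disk.
        s3_client.download_file(bucket, input_key, local_in)
        # Compute: placeholder for the real work (for example, fitting a model).
        process(local_in, local_out)
        # Write: copy the result file back to the bucket under a new key.
        s3_client.upload_file(local_out, bucket, output_key)


def make_s3_client():
    """Create a real S3 client (assumes boto3 is installed and credentials are set up)."""
    import boto3  # third-party AWS SDK, imported lazily so the sketch loads without it

    return boto3.client("s3")


# Hypothetical usage; bucket and key names are placeholders:
# run_job(make_s3_client(), "my-bucket", "data/train.csv", "results/metrics.txt", fit_model)
```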