Amazon SageMaker is a cloud-based machine learning service that lets data scientists and developers build, train, test, and deploy machine learning models for analyzing datasets. Amazon launched this fully managed service in 2017. At launch it was available in only a few regions, but it has since rolled out across all AWS regions.
Amazon SageMaker has several key features:
- It provides several built-in ML algorithms for training on your datasets.
- It also provides pre-trained models that can be deployed as-is.
- It automatically scales model inference across multiple server instances.
- It offers managed instances of TensorFlow and Apache MXNet, so developers can build their own ML algorithms from scratch.
Machine learning is an emerging technology, and knowing how to build, train, and deploy models with SageMaker is icing on the cake.
So here is a step-by-step, hands-on exercise on how to build machine learning models using Amazon SageMaker.
Select Amazon SageMaker from the Services menu and navigate to the Dashboard. Here we will create a notebook instance.
So, let's create the notebook first. Click on Notebook instances.
You will be redirected to the Notebook instances page. Click on Create notebook instance.
Here, we will provide a name for our notebook instance. For this hands-on exercise, we keep the default notebook instance type, ml.t2.medium.
Under Permissions and encryption, we will provide the IAM role. You can create a new IAM role here, or attach an existing one.
To create a new role, select Create a new role from the dropdown. Notebook instances need this IAM role to access other services such as SageMaker and S3.
Here, you can select Any S3 bucket, or give the name of a specific S3 bucket. This allows SageMaker to access your buckets and their contents.
We are using the existing role SagemakerRole for this example. Now click Create notebook instance.
Our notebook instance will start in the Pending state. It takes around 2-3 minutes for the notebook instance to reach the InService state.
Once it is InService, go to Actions and select Open Jupyter from the dropdown.
Once you are inside your notebook instance, go to Files, then New, and select conda_python3 from the dropdown.
A Jupyter notebook will open in a new window. Here we will write the ML code to build our model from the dataset.
Here, we will import all the Python libraries that we will need for this exercise.
Next, we will provide the name of the bucket where the model and training data will be stored. The prefix will be created inside the bucket.
We will also define the IAM role for SageMaker.
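A minimal sketch of this cell, with placeholder bucket and prefix names (substitute your own):

```python
# Placeholder names -- replace with your own bucket and prefix.
bucket = "my-sagemaker-demo-bucket"   # S3 bucket for the data and model artifacts
prefix = "sagemaker/demo"             # key prefix created inside the bucket

# Inside a SageMaker notebook instance, the attached IAM role is normally
# retrieved with the SageMaker Python SDK:
#   from sagemaker import get_execution_role
#   role = get_execution_role()

print(f"s3://{bucket}/{prefix}")
```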
In this step, we will fetch the dataset.csv file from the bucket. SageMaker reads the file from its S3 location.
The output prints the contents of dataset.csv and also shows the number of rows and columns in the file.
In this file, there are 13932 rows and 7 columns.
Now the file is loaded into our DataFrame, and SageMaker can read it from there.
The output is a preview table of the data.
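A sketch of the loading step. The S3 path is an assumption, and reading s3:// URLs with pandas needs the s3fs package; a tiny stand-in frame keeps the sketch runnable on its own:

```python
import pandas as pd

# In the notebook, dataset.csv would be read straight from S3, e.g.:
#   df = pd.read_csv(f"s3://{bucket}/{prefix}/dataset.csv")
# Stand-in data (the real column names are not shown, so these are assumptions):
df = pd.DataFrame({"year": [2019, 2020, 2021],
                   "value": [10.0, 12.5, 9.8]})

print(df.shape)   # (rows, columns); the article's file is 13932 rows x 7 columns
print(df.head())  # first five rows, rendered as a table in Jupyter
```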
Now, we will do some exploratory analysis to refine this data. We will use Python's matplotlib to create a histogram from the data.
The output shows each individual column and the distribution of its observations.
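One common way to do this is pandas' built-in histogram wrapper around matplotlib; the stand-in data and column names below are assumptions:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; in Jupyter, plots render inline instead
import matplotlib.pyplot as plt

# Tiny stand-in for dataset.csv ("year" and "value" are placeholder names).
df = pd.DataFrame({"year": [2018, 2019, 2020, 2021],
                   "value": [10.0, 12.5, 9.8, 14.2]})

# One histogram per numeric column.
df.hist(figsize=(8, 4))
plt.tight_layout()
plt.show()
```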
Here, we will drop the value column from dataset.csv.
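Dropping a column is a one-liner in pandas; the frame below is a stand-in and "value" is the assumed column name:

```python
import pandas as pd

# Stand-in frame; "value" is the assumed name of the column being removed.
df = pd.DataFrame({"year": [2019, 2020],
                   "value": [1.0, 2.0],
                   "feature_a": [3, 4]})

df = df.drop(columns=["value"])
print(list(df.columns))  # → ['year', 'feature_a']
```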
Here, we will compare each individual column against the year column.
This produces the required output from the step above.
Now, we will plot the data on a graph.
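One way to sketch both of these steps is a groupby on the year column followed by a line plot (the data and column names are placeholders, and the original notebook's exact aggregation is not shown):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; Jupyter renders plots inline
import matplotlib.pyplot as plt

# Stand-in data; "year" and "feature_a" are placeholder column names.
df = pd.DataFrame({"year": [2019, 2019, 2020, 2020],
                   "feature_a": [1.0, 3.0, 2.0, 4.0]})

# Mean of each remaining column per year -- one way to compare
# the other columns against the year column.
by_year = df.groupby("year").mean()
print(by_year)

# Plot the yearly averages as a line chart.
by_year.plot(kind="line", marker="o")
plt.xlabel("year")
plt.show()
```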
We will drop the columns that are not required for the prediction from the table.
Now, we will split the data into two parts and store them in train.csv and validate.csv.
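These two steps, dropping unneeded columns and splitting, can be sketched as follows. The column names are placeholders, and the 70/30 ratio is an assumption since the original split is not shown:

```python
import pandas as pd

# Stand-in data; "record_id" stands for a column not needed for prediction.
df = pd.DataFrame({"year": list(range(2000, 2010)),
                   "feature_a": [float(i) for i in range(10)],
                   "record_id": list(range(10))})

# Drop the column that is not useful for prediction.
df = df.drop(columns=["record_id"])

# Shuffle, then split roughly 70/30 into training and validation sets.
train_df = df.sample(frac=0.7, random_state=42)
validate_df = df.drop(train_df.index)

train_df.to_csv("train.csv", index=False)
validate_df.to_csv("validate.csv", index=False)
print(len(train_df), len(validate_df))  # → 7 3
```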
In this step, we will upload the data to the S3 bucket using boto3.
Two folders will be created in the bucket, and the CSV files will be saved as train.csv and validate.csv.
This is how we can analyze a dataset using machine learning tools in Amazon SageMaker.
With this, we can easily extract valuable insights from the data, and we can also make future predictions accordingly.
Hope this information is helpful. We will keep sharing more on how to use new AWS services. Stay tuned!
Meanwhile …
Keep Exploring -> Keep Learning -> Keep Mastering