Amazon SageMaker is an amazing one-stop shop for your machine learning needs. From plenty of open-source, real-life datasets to world-class CPU and GPU infrastructure and support for frameworks such as TensorFlow, everything can be found under one roof. I recently started using it and am already fascinated by how easy it is to build, train, and deploy ML models. In this article, I'll cover some of the basics of SageMaker and walk through a tiny tutorial to kick things off.
SageMaker is just one of Amazon's many web services, and its very purpose is to give data scientists everything they need in one place without worrying about heavy machine learning infrastructure. It is available as a Platform service, sitting between Frameworks & Hardware (CPU, GPU, TensorFlow, Keras, PyTorch, etc.) and Application Services (Polly, Transcribe, Translate, Comprehend, Rekognition Image, Rekognition Video, etc.). A few more examples of such Platform services include AWS DeepLens, Amazon Machine Learning, Spark & EMR, and Mechanical Turk. Some of SageMaker's key components include:
- Notebook instances for building models
- Training jobs
- Model hosting (endpoints) for inference
- Hyperparameter tuning
Any AI use case has three lifecycle stages:

- Build
- Train
- Deploy
SageMaker very cleverly compartmentalizes these stages without giving the developer even the slightest hint that their code and data are being crunched on three or more entirely separate machines, each ideal for its job.
The build stage normally involves writing code for real-life use cases, downloading big datasets (mostly from Amazon S3 buckets), unzipping the downloaded files, extracting the required fields, and cleaning the data up for later training. These steps can be performed on a smaller and cheaper compute instance (in my case, ml.t2.medium @ $0.0464 per hour in the N. Virginia region).
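As a rough illustration, the unzip-and-clean part of a build-stage notebook might look like the standard-library sketch below (the file and member names are hypothetical, and your real cleanup logic will depend on the dataset):

```python
import zipfile


def extract_and_clean(zip_path, member):
    """Read a text member out of a downloaded zip archive and do a
    minimal cleanup pass: lowercase everything and drop blank lines."""
    with zipfile.ZipFile(zip_path) as archive:
        raw = archive.read(member).decode("utf-8")
    return [line.strip().lower() for line in raw.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in for a dataset you would normally download from an S3 bucket.
    with zipfile.ZipFile("dataset.zip", "w") as archive:
        archive.writestr("corpus.txt", "Hello World\n\nSecond LINE\n")
    print(extract_and_clean("dataset.zip", "corpus.txt"))  # ['hello world', 'second line']
```

Because this stage is mostly I/O and string wrangling rather than number crunching, the cheap ml.t2.medium instance is more than enough for it.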
Training is the most resource-hungry stage of an AI use case. It is therefore done on large and expensive instances (in my case, 2 x ml.c4.2xlarge @ $0.56 per hour in the N. Virginia region). The best part is that you don't explicitly need to create the instances as you would with EC2 or other services. From the comfort of your Jupyter notebook, you write only one line of code to initialize, use, and destroy the large instance(s). Behind the scenes, the instances are brought up, used for the length of the training run, and terminated afterwards. Furthermore, once training is finished, you can save your trained model to an S3 bucket, again with just one line of code. However, that's not mandatory, as trained models can be saved anywhere you choose.
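Under the hood, that one line of code boils down to a request against SageMaker's CreateTrainingJob API. The sketch below assembles the core of such a request as a plain dictionary so you can see what SageMaker is managing for you; the image URI, role ARN, and bucket names are placeholders, not values from this article:

```python
def build_training_job_request(job_name, image_uri, role_arn,
                               input_s3, output_s3,
                               instance_type="ml.c4.2xlarge", instance_count=2):
    """Assemble the core fields of a SageMaker CreateTrainingJob request.
    A real notebook would hand this to boto3, e.g.
    sagemaker_client.create_training_job(**request)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,        # e.g. a built-in algorithm container
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,                   # IAM role SageMaker assumes on your behalf
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,   # I used 2 x ml.c4.2xlarge
            "VolumeSizeInGB": 30,
        },
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": input_s3,
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3},
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


request = build_training_job_request(
    "blazingtext-demo", "<algorithm-image-uri>",
    "arn:aws:iam::123456789012:role/SageMakerRole",
    "s3://my-bucket/text8/", "s3://my-bucket/output/")
print(request["ResourceConfig"])
```

The instances named in `ResourceConfig` are created, billed for the duration of the job, and destroyed entirely behind the scenes, which is exactly why the notebook itself can stay on a tiny instance.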
Model deployment is the last stage and is as easy as building and training with AWS SageMaker. Depending on the size of the model, you may use a slightly smaller and cheaper ML instance for deployment than you used for training. Once the model is deployed, it can be used for inference immediately.
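Deployment follows the same pattern: behind the SDK's one-liner sits a CreateEndpointConfig request that names the model and the (often cheaper) hosting instance. A hedged sketch, again as a plain dictionary with hypothetical names:

```python
def build_endpoint_config(config_name, model_name, instance_type="ml.m4.xlarge"):
    """Assemble the core fields of a SageMaker CreateEndpointConfig request.
    A real notebook would hand this to boto3, e.g.
    sagemaker_client.create_endpoint_config(**config), and then create an
    endpoint that points at the config."""
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",
            "ModelName": model_name,         # a model previously registered in SageMaker
            "InstanceType": instance_type,   # hosting can be smaller than training
            "InitialInstanceCount": 1,
        }],
    }


config = build_endpoint_config("blazingtext-demo-config", "blazingtext-demo-model")
print(config["ProductionVariants"][0]["InstanceType"])  # ml.m4.xlarge
```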
One of the really cool things about SageMaker's Dashboard is that all your recent activity is logged, down to the Python logs from Jupyter notebooks. This means that even after destroying the notebook instance, you still have access to your notebook logs.
How do I start with Amazon SageMaker?
- Go to AWS console
- Click “Amazon SageMaker”
- Click “Notebook Instances”
- Click “Create Notebook Instance”
- Give your Notebook a name such as NLP-HelloWorld
- Select an instance type from the dropdown menu. ml.t2.medium @ $0.0464 per hour in the N. Virginia region is the cheapest at the time of writing this article
- Create an IAM Role and give access to “Any S3 bucket”. You will need this to allow SageMaker to perform actions in AWS on your behalf (such as creating, updating, and deleting S3 buckets for your ML models)
- Leave everything as is
- Click “Create Notebook instance”
- Once notebook is up and running (watch for status to change from Pending to InService), click “Open Notebook”
- This is the area you are most familiar with. Feel free to bring in your code or use one of several SageMaker example notebooks
- It is highly recommended to save your trained models to AWS S3 buckets for later inference. Instantiating an S3 bucket and saving to it each take one line of code
- Inference from saved models depends heavily on the nature of the model and hence is out of the scope of this article. In most cases, you will use the same notebook to read your model back from the S3 bucket.
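On the saving step: SageMaker conventionally stores trained model artifacts as a `model.tar.gz` archive in S3. A minimal standard-library sketch of that packaging step, with hypothetical file names (the actual upload would then be a one-liner such as `boto3.client("s3").upload_file(...)` or the SageMaker SDK's upload helper):

```python
import os
import tarfile


def package_model(artifact_paths, archive_path="model.tar.gz"):
    """Bundle trained model files into the model.tar.gz layout that
    SageMaker conventionally uses for model artifacts in S3."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in artifact_paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


if __name__ == "__main__":
    with open("vectors.txt", "w") as f:  # stand-in for real model artifacts
        f.write("the 0.1 0.2 0.3\n")
    print(package_model(["vectors.txt"]))  # model.tar.gz
```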
Notice: the ml.t2.medium notebook instance runs from start to finish. It's only the training and deployment jobs that spin up the requested compute instance(s) and then kill them automatically once the job is finished.
Fun Fact: Training vectors for the 17-million-word text8 dataset, also known as the blazingtext-text8 example in SageMaker, using Word2Vec cost me only 10 cents! This is what I used:
- Model building: ml.t2.medium instance
- Model training: 2 x ml.c4.2xlarge instances (150 seconds)
- Model hosting: ml.m4.xlarge instance
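As a sanity check on that number, here is the back-of-the-envelope arithmetic for the training portion alone, using the hourly rate quoted earlier (notebook and hosting time account for the rest):

```python
# Training cost: 2 instances x $0.56/hour x 150 seconds of runtime
instances = 2
hourly_rate = 0.56  # ml.c4.2xlarge in N. Virginia, at the time of writing
seconds = 150

training_cost = instances * hourly_rate * seconds / 3600
print(f"${training_cost:.3f}")  # $0.047 -- about 5 cents for the training run
```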
Happy Machine Learning!