Amazon Machine Learning PDF

145 Pages·2017·2.21 MB·English

Preview Amazon Machine Learning

Amazon Machine Learning: Developer Guide
Version Latest

Copyright © 2022 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What is Amazon Machine Learning?
  Amazon Machine Learning Key Concepts
    Datasources
    ML Models
    Evaluations
    Batch Predictions
    Real-time Predictions
  Accessing Amazon Machine Learning
  Regions and Endpoints
  Pricing for Amazon ML
    Estimating Batch Prediction Cost
    Estimating Real-Time Prediction Cost
Machine Learning Concepts
  Solving Business Problems with Amazon Machine Learning
  When to Use Machine Learning
  Building a Machine Learning Application
    Formulating the Problem
    Collecting Labeled Data
    Analyzing Your Data
    Feature Processing
    Splitting the Data into Training and Evaluation Data
    Training the Model
    Evaluating Model Accuracy
    Improving Model Accuracy
    Using the Model to Make Predictions
    Retraining Models on New Data
  The Amazon Machine Learning Process
Setting Up Amazon Machine Learning
  Sign Up for AWS
Tutorial: Using Amazon ML to Predict Responses to a Marketing Offer
  Prerequisite
  Steps
  Step 1: Prepare Your Data
  Step 2: Create a Training Datasource
  Step 3: Create an ML Model
  Step 4: Review the ML Model's Predictive Performance and Set a Score Threshold
  Step 5: Use the ML Model to Generate Predictions
  Step 6: Clean Up
Creating and Using Datasources
  Understanding the Data Format for Amazon ML
    Attributes
    Input File Format Requirements
    Using Multiple Files As Data Input to Amazon ML
    End-of-Line Characters in CSV Format
  Creating a Data Schema for Amazon ML
    Example Schema
    Using the targetAttributeName Field
    Using the rowID Field
    Using the AttributeType Field
    Providing a Schema to Amazon ML
  Splitting Your Data
    Pre-splitting Your Data
    Sequentially Splitting Your Data
    Randomly Splitting Your Data
  Data Insights
    Descriptive Statistics
    Accessing Data Insights on the Amazon ML console
  Using Amazon S3 with Amazon ML
    Uploading Your Data to Amazon S3
    Permissions
  Creating an Amazon ML Datasource from Data in Amazon Redshift
    Required Parameters for the Create Datasource Wizard
    Creating a Datasource with Amazon Redshift Data (Console)
    Troubleshooting Amazon Redshift Issues
  Using Data from an Amazon RDS Database to Create an Amazon ML Datasource
    RDS Database Instance Identifier
    MySQL Database Name
    Database User Credentials
    AWS Data Pipeline Security Information
    Amazon RDS Security Information
    MySQL SQL Query
    S3 Output Location
Training ML Models
  Types of ML Models
    Binary Classification Model
    Multiclass Classification Model
    Regression Model
  Training Process
  Training Parameters
    Maximum Model Size
    Maximum Number of Passes over the Data
    Shuffle Type for Training Data
    Regularization Type and Amount
    Training Parameters: Types and Default Values
  Creating an ML Model
    Prerequisites
    Creating an ML Model with Default Options
    Creating an ML Model with Custom Options
Data Transformations for Machine Learning
  Importance of Feature Transformation
  Feature Transformations with Data Recipes
  Recipe Format Reference
    Groups
    Assignments
    Outputs
    Complete Recipe Example
  Suggested Recipes
  Data Transformations Reference
    N-gram Transformation
    Orthogonal Sparse Bigram (OSB) Transformation
    Lowercase Transformation
    Remove Punctuation Transformation
    Quantile Binning Transformation
    Normalization Transformation
    Cartesian Product Transformation
  Data Rearrangement
    DataRearrangement Parameters
Evaluating ML Models
  ML Model Insights
  Binary Model Insights
    Interpreting the Predictions
  Multiclass Model Insights
    Interpreting the Predictions
  Regression Model Insights
    Interpreting the Predictions
  Preventing Overfitting
  Cross-Validation
    Adjusting Your Models
  Evaluation Alerts
Generating and Interpreting Predictions
  Creating a Batch Prediction
    Creating a Batch Prediction (Console)
    Creating a Batch Prediction (API)
  Reviewing Batch Prediction Metrics
    Reviewing Batch Prediction Metrics (Console)
    Reviewing Batch Prediction Metrics and Details (API)
  Reading the Batch Prediction Output Files
    Locating the Batch Prediction Manifest File
    Reading the Manifest File
    Retrieving the Batch Prediction Output Files
    Interpreting the Contents of Batch Prediction Files for a Binary Classification ML Model
    Interpreting the Contents of Batch Prediction Files for a Multiclass Classification ML Model
    Interpreting the Contents of Batch Prediction Files for a Regression ML Model
  Requesting Real-time Predictions
    Trying Real-Time Predictions
    Creating a Real-Time Endpoint
    Locating the Real-time Prediction Endpoint (Console)
    Locating the Real-time Prediction Endpoint (API)
    Creating a Real-time Prediction Request
    Deleting a Real-Time Endpoint
Managing Amazon ML Objects
  Listing Objects
    Listing Objects (Console)
    Listing Objects (API)
  Retrieving Object Descriptions
    Detailed Descriptions in the Console
    Detailed Descriptions from the API
  Updating Objects
  Deleting Objects
    Deleting Objects (Console)
    Deleting Objects (API)
Monitoring Amazon ML with Amazon CloudWatch Metrics
Logging Amazon ML API Calls with AWS CloudTrail
  Amazon ML Information in CloudTrail
  Example: Amazon ML Log File Entries
Tagging Your Objects
  Tag Basics
  Tag Restrictions
  Tagging Amazon ML Objects (Console)
  Tagging Amazon ML Objects (API)
Amazon Machine Learning Reference
  Granting Amazon ML Permissions to Read Your Data from Amazon S3
  Granting Amazon ML Permissions to Output Predictions to Amazon S3
  Controlling Access to Amazon ML Resources with IAM
    IAM Policy Syntax
    Specifying IAM Policy Actions for Amazon ML
    Specifying ARNs for Amazon ML Resources in IAM Policies
    Example Policies for Amazon ML
  Cross-service confused deputy prevention
  Dependency Management of Asynchronous Operations
  Checking Request Status
  System Limits
  Names and IDs for all Objects
  Object Lifetimes
Resources
Document History

We are no longer updating the Amazon Machine Learning service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it. For more information, see What is Amazon Machine Learning.

What is Amazon Machine Learning?

We are no longer updating the Amazon Machine Learning (Amazon ML) service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it.

AWS now provides a robust, cloud-based service, Amazon SageMaker, so that developers of all skill levels can use machine learning technology. SageMaker is a fully managed machine learning service that helps you create powerful machine learning models. With SageMaker, data scientists and developers can build and train machine learning models, and then directly deploy them into a production-ready hosted environment. For more information, see the SageMaker documentation.

Topics
• Amazon Machine Learning Key Concepts
• Accessing Amazon Machine Learning
• Regions and Endpoints
• Pricing for Amazon ML

Amazon Machine Learning Key Concepts

This section summarizes the following key concepts and describes in greater detail how they are used within Amazon ML:
• Datasources contain metadata associated with data inputs to Amazon ML.
• ML Models generate predictions using the patterns extracted from the input data.
• Evaluations measure the quality of ML models.
• Batch Predictions asynchronously generate predictions for multiple input data observations.
• Real-time Predictions synchronously generate predictions for individual data observations.

Datasources

A datasource is an object that contains metadata about your input data. Amazon ML reads your input data, computes descriptive statistics on its attributes, and stores the statistics, along with a schema and other information, as part of the datasource object. Next, Amazon ML uses the datasource to train and evaluate an ML model and generate batch predictions.

Important: A datasource does not store a copy of your input data. Instead, it stores a reference to the Amazon S3 location where your input data resides. If you move or change the Amazon S3 file, Amazon ML cannot access or use it to create an ML model, generate evaluations, or generate predictions.

The following table defines terms that are related to datasources.

Attribute: A unique, named property within an observation. In tabular-formatted data such as spreadsheets or comma-separated values (CSV) files, the column headings represent the attributes, and the rows contain values for each attribute.
Synonyms: variable, variable name, field, column
Datasource Name: (Optional) Allows you to define a human-readable name for a datasource. These names enable you to find and manage your datasources in the Amazon ML console.
Input Data: Collective name for all the observations that are referred to by a datasource.
Location: Location of input data. Currently, Amazon ML can use data that is stored within Amazon S3 buckets, Amazon Redshift databases, or MySQL databases in Amazon Relational Database Service (RDS).
Observation: A single input data unit. For example, if you are creating an ML model to detect fraudulent transactions, your input data will consist of many observations, each representing an individual transaction.
Synonyms: record, example, instance, row
Row ID: (Optional) A flag that, if specified, identifies an attribute in the input data to be included in the prediction output. This attribute makes it easier to match each prediction with the observation it belongs to.
Synonyms: row identifier
Schema: The information needed to interpret the input data, including attribute names, their assigned data types, and the names of special attributes.
Statistics: Summary statistics for each attribute in the input data. These statistics serve two purposes: the Amazon ML console displays them in graphs to help you understand your data at a glance and identify irregularities or errors, and Amazon ML uses them during the training process to improve the quality of the resulting ML model.
Status: Indicates the current state of the datasource, such as In Progress, Completed, or Failed.
Target Attribute: In the context of training an ML model, the target attribute identifies the name of the attribute in the input data that contains the "correct" answers. Amazon ML uses this attribute to discover patterns in the input data and generate an ML model. In the context of evaluating and generating predictions, the target attribute is the attribute whose value a trained ML model predicts.
Synonyms: target

ML Models

An ML model is a mathematical model that generates predictions by finding patterns in your data. Amazon ML supports three types of ML models: binary classification, multiclass classification, and regression.

The following table defines terms that are related to ML models.

Regression: The goal of training a regression ML model is to predict a numeric value.
Multiclass: The goal of training a multiclass ML model is to predict values that belong to a limited, pre-defined set of permissible values.
Binary: The goal of training a binary ML model is to predict values that can only have one of two states, such as true or false.
Model Size: ML models capture and store patterns. The more patterns an ML model stores, the bigger it will be. ML model size is described in megabytes (MB).
Number of Passes: When you train an ML model, you use data from a datasource. It is sometimes beneficial to use each data record in the learning process more than once. The number of times that you let Amazon ML use the same data records is called the number of passes.
Regularization: Regularization is a machine learning technique that helps you obtain higher-quality models by penalizing overly complex models, which reduces overfitting to the training data. Amazon ML offers a default setting that works well for most cases.

Evaluations

An evaluation measures the quality of your ML model and determines if it is performing well.

The following table defines terms that are related to evaluations.

Model Insights: Amazon ML provides you with a metric and a number of insights that you can use to evaluate the predictive performance of your model.
AUC: Area Under the ROC Curve (AUC) measures the ability of a binary ML model to predict a higher score for positive examples as compared to negative examples.
Macro-averaged F1-score: The macro-averaged F1-score, the unweighted average of the F1-scores computed for each class, is used to evaluate the predictive performance of multiclass ML models.
RMSE: The Root Mean Square Error (RMSE) is a metric used to evaluate the predictive performance of regression ML models.
Cut-off: ML models work by generating numeric prediction scores. By applying a cut-off value, the system converts these scores into 0 and 1 labels.
Accuracy: Accuracy measures the percentage of correct predictions.
Precision: Precision shows the percentage of actual positive instances (as opposed to false positives) among those instances that have been retrieved (those predicted to be positive). In other words, how many selected items are positive?
Recall: Recall shows the percentage of actual positive instances that the model predicted as positive. In other words, how many positive items are selected?
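To make the evaluation terms above concrete, here is a minimal Python sketch that computes each metric by hand on small, made-up inputs. It illustrates the standard definitions under stated assumptions and is not how Amazon ML computes its model insights: all function names are hypothetical, scores at or above the cut-off are assumed to map to 1, AUC is computed as the fraction of (positive, negative) example pairs in which the positive example receives the higher score (ties count as one half), which equals the area under the ROC curve, and the macro-averaged F1-score is the unweighted mean of the per-class F1-scores.

```python
import math
from typing import Dict, List, Sequence


def binary_metrics(labels: Sequence[int], scores: Sequence[float], cutoff: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, and recall after converting scores to 0/1 labels with a cut-off."""
    # Assumption: scores at or above the cut-off become 1, all others become 0.
    preds = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)
    return {
        "accuracy": correct / len(labels),
        # Precision: of the examples predicted positive, how many are actually positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of the actually positive examples, how many were predicted positive?
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs in which the positive example scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


def macro_f1(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted mean of the F1-scores computed separately for each class."""
    classes = sorted(set(actual) | set(predicted))
    f1_scores: List[float] = []
    for c in classes:
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
        fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the mean squared difference between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical binary-classification data: true labels and model scores.
    labels = [1, 0, 1, 1, 0, 0]
    scores = [0.9, 0.4, 0.35, 0.8, 0.6, 0.1]
    print(binary_metrics(labels, scores, cutoff=0.5))
    print("AUC:", auc(labels, scores))
    # Hypothetical multiclass and regression data.
    print("Macro-averaged F1:", macro_f1(["a", "b", "c", "a"], ["a", "b", "b", "a"]))
    print("RMSE:", rmse([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))
```

Note that accuracy, precision, and recall change as you move the cut-off, while AUC is computed from the raw scores and does not depend on it. The pairwise AUC loop is quadratic in the number of examples, which is fine for illustration; for large evaluation sets, a sort-based (rank) computation gives the same value more efficiently.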
