On Ensuring that Intelligent Machines Are Well-Behaved

Philip S. Thomas,1 Bruno Castro da Silva,2 Andrew G. Barto,1 and Emma Brunskill3

1 University of Massachusetts Amherst
2 Universidade Federal do Rio Grande do Sul
3 Stanford University

arXiv:1708.05448v1 [cs.AI] 17 Aug 2017

Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are well-behaved (that they do not, for example, cause harm to humans or act in a racist or sexist way) is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we address here. We propose a new framework for designing machine learning algorithms that simplifies the problem of specifying and regulating undesirable behaviors. To show the viability of this new framework, we use it to create new machine learning algorithms that preclude the sexist and harmful behaviors exhibited by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.

1 Introduction

Machine learning algorithms are impacting our lives: they are used across the sciences, for example, by geologists to predict landslides [Jibson, 2007] and by biologists working to create a vaccine for HIV [Bhasin and Raghava, 2004]; they influence criminal sentencing [Angwin et al., 2016]; they control autonomous vehicles [Pomerleau, 1989]; and they are transforming healthcare delivery [Saria, 2014]. The potential for machine learning algorithms to cause harm, including catastrophic harm, is therefore a pressing concern [Bostrom, 2014]. Despite the importance of this problem, current machine learning algorithms do not provide their users with an effective means for precluding undesirable behavior, which makes the safe and responsible use of machine learning algorithms a difficult and error-prone process. We introduce a new framework for designing machine learning algorithms that allow their users to easily define and regulate undesirable behaviors. This framework does not address the problem of imbuing intelligent machines with a notion of morality or human-like values [Russell, 2016], nor the problem of avoiding undesirable behaviors that the user never considered [Amodei et al., 2016]. Rather, it provides a remedy for the problem of machine learning algorithms that misbehave because their users did not have an effective way to specify and constrain undesirable behaviors.

The first step of the current standard approach for designing machine learning algorithms, which we refer to as the standard approach hereafter, is to define mathematically what the algorithm should do. At an abstract level, this definition is the same across all branches of machine learning: find a solution, θ*, within a feasible set, Θ, that maximizes an objective function, f : Θ → ℝ. That is, the goal of the algorithm is to find a solution in

    argmax_{θ∈Θ} f(θ).    (1)

Importantly, the algorithm does not know f(θ) for any θ ∈ Θ (e.g., the true mean squared error); it can only reason about it from data (e.g., by using the sample mean squared error).
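To make the gap between f and what an algorithm can actually compute concrete, here is a minimal Python sketch (not from the paper; the data-generating process and the candidate solution θ are hypothetical) comparing the true negative mean squared error of one candidate with the sample-based estimate that an algorithm would have to work with.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical data-generating process: Y = 2X + noise, with X standard normal.
    m = 50
    x = rng.normal(size=m)
    y = 2.0 * x + rng.normal(scale=0.5, size=m)

    theta = 1.8  # one candidate solution: predict theta * x

    # True objective f(theta) = -E[(theta*X - Y)^2]; under this model it equals
    # -((theta - 2)^2 * Var[X] + Var[noise]), which the algorithm never observes.
    f_true = -((theta - 2.0) ** 2 * 1.0 + 0.25)

    # What the algorithm can compute: the negative *sample* mean squared error.
    f_hat = -np.mean((theta * x - y) ** 2)

    print(f_true, f_hat)  # close, but not equal: f must be inferred from data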
One problem with the standard approach is that the user of a machine learning algorithm must encode constraints on the algorithm's behavior in the feasible set or objective function. Encoding constraints in the objective function (e.g., using soft constraints [Boyd and Vandenberghe, 2004] and robust and risk-sensitive approaches [Bertsimas et al., 2004]) requires extensive domain knowledge or additional data analysis to properly balance the relative importance of the primary objective function and the constraints. Furthermore, maximizing an objective function that appears to reasonably represent desired behavior can sometimes result in behavior that is unacceptable [von Goethe, 1797]. Similarly, encoding constraints in the feasible set (e.g., using hard constraints [Boyd and Vandenberghe, 2004], chance constraints [Charnes and Cooper, 1959], and robust optimization approaches [Ben-Tal et al., 2009]) requires knowledge of the probability distribution from which the available data is sampled, which is often not available.

Our framework for designing machine learning algorithms allows the user to constrain the behavior of the algorithm more easily, without requiring extensive domain knowledge or additional data analysis. This is achieved by shifting the burden of ensuring that the algorithm is well-behaved from the user of the algorithm to the designer of the algorithm. This is important because machine learning algorithms are used in critical applications by people who are experts in their fields, but who may not be experts in machine learning and statistics. In general, it is impossible to guarantee that undesirable behavior will never occur when using machine learning algorithms. Algorithms designed using our new framework therefore preclude undesirable behaviors with high probability.
We now define our new framework. Let D ∈ 𝒟 (a random variable) be the data provided as input to the machine learning algorithm, a, where a : 𝒟 → Θ, so that a(D) is a random variable that represents the solution produced by the algorithm when given data D as input. Our framework is based on a different way of mathematically defining what an algorithm should do, which allows the user to directly place probabilistic constraints on the solution, a(D), returned by the algorithm. This differs from the standard approach, wherein the user can only indirectly constrain a(D) by restricting or modifying the feasible set, Θ, or objective function, f. Concretely, algorithms constructed using our framework are designed to satisfy constraints of the form Pr(g(a(D)) ≤ 0) ≥ 1 − δ, where g : Θ → ℝ defines a measure of undesirable behavior (as illustrated later by example), and δ ∈ [0, 1] limits the admissible probability of undesirable behavior. Since these constraints define which algorithms, a, are acceptable (rather than which solutions, θ, are acceptable), they must be satisfied during the design of the algorithm rather than when the algorithm is applied. This shifts the burden of ensuring that the algorithm is well-behaved from the user to the designer.

Using our framework involves three steps:

1. The designer of the algorithm writes a mathematical expression that expresses his or her goal; in particular, the properties that he or she wants the resulting algorithm, a, to have. This expression has the following form, which we call a Seldonian optimization problem after the eponymous character in Asimov's Foundation [Asimov, 1951]:

       argmax_{a∈𝒜} f(a)    (2)
       s.t. ∀i ∈ {1, ..., n}, Pr(g_i(a(D)) ≤ 0) ≥ 1 − δ_i,

   where 𝒜 is the set of all algorithms that will be considered, f : 𝒜 → ℝ is now an objective function that quantifies the utility of an algorithm, and we allow for n ≥ 0 constraints, each defined by a tuple, (g_i, δ_i), where i ∈ {1, ..., n}. Notice that this is in contrast to the standard approach: in the standard approach the mathematical expression, (1), defines the goal of the algorithm, which is to produce a solution with a given set of properties, while in our framework the mathematical expression, (2), defines the goal of the designer, which is to produce an algorithm with a given set of properties.

2. The user should have the freedom to define one or more g_i that capture his or her own desired definition of undesirable behavior, and should be able to do so without knowledge of the distribution of D or even the value of g_i(θ) for any θ ∈ Θ. This requires the algorithm, a, to be compatible with many different definitions of g_i. The designer should therefore specify the class of possible definitions of g_i with which the algorithm will be compatible, and should provide a means for the user to tell the algorithm which definition of g_i should be used. Later we provide examples of how this can be achieved.

3. The designer creates an algorithm, a, which is a (possibly approximate) solution to the mathematical statement in the first step, and which allows for the class of g_i chosen in the second step. In practice, designers rarely produce algorithms that cannot be improved upon, which implies that they may only find approximate solutions to (2). Our framework allows for this by only requiring a to satisfy the probabilistic constraints while attempting to optimize f; we call such algorithms Seldonian.

Once a Seldonian algorithm has been designed, it has built-in guarantees about the probability that it will produce undesirable behavior, and a user can apply it by specifying one or more g_i to capture his or her desired definition of undesirable behavior, and δ_i to be the maximum admissible probability of the undesirable behavior characterized by g_i.
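As a rough illustration of the interface implied by these three steps, the following Python sketch shows one possible shape of such an algorithm a: the user supplies the constraints as pairs of a per-data-point sample statistic and a confidence level, and the algorithm only returns a solution it can certify. The split into candidate-selection and safety-test data, the candidate_solver and upper_bound placeholders, and the behavior of returning None when nothing can be certified are illustrative assumptions, not necessarily the authors' exact construction.

    import numpy as np

    def seldonian_fit(D, f_hat, g_hats, deltas, candidate_solver, upper_bound, seed=0):
        """Sketch of a Seldonian algorithm a(D).

        g_hats[i](theta, d) is a per-data-point sample statistic whose expected value
        plays the role of g_i(theta); deltas[i] is the admissible probability of
        violating the constraint g_i(a(D)) <= 0.
        """
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(D))
        split = len(D) // 2
        D_cand = [D[i] for i in idx[:split]]      # data used to pick a candidate solution
        D_safety = [D[i] for i in idx[split:]]    # held-out data used to certify it

        # Approximate step 3: optimize f_hat while anticipating the safety test.
        theta = candidate_solver(D_cand, f_hat, g_hats, deltas)

        # Safety test: a (1 - delta_i)-confidence upper bound on each constraint.
        for g_hat, delta in zip(g_hats, deltas):
            z = np.array([g_hat(theta, d) for d in D_safety])
            if upper_bound(z, delta) > 0.0:
                return None   # cannot certify the constraint, so return no solution
        return theta

Concrete choices for the upper_bound placeholder, based on Student's t-test and Hoeffding's inequality as mentioned below, are sketched after the discussion of the regression algorithm.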
To show the viability of our framework, we used it to design regression and reinforcement learning algorithms. Constraining the behavior of regression algorithms is important because, for example, they have been used for medical applications where undesirable behavior could delay cancer diagnoses [Zacharaki et al., 2009; Mangasarian et al., 1995], and because they have been shown to sometimes cause racist, sexist, and other discriminatory behaviors [Weber, 2012; Angwin et al., 2016]. Similarly, reinforcement learning algorithms have been proposed for applications where undesirable behavior can cause financial losses [Li et al., 2010], environmental damage [Houtman et al., 2013], and even death [Moore et al., 2010].

The regression algorithm that we designed attempts to minimize the mean squared error of its predictions while ensuring that, with high probability, a statistic of interest, g(θ), of the returned solution, θ = a(D), is bounded. The definition of this statistic can be chosen by the user to capture his or her desired definition of undesirable behavior (e.g., the expected financial loss that results from using a given solution, θ). Importantly, the user of the algorithm may not know the value of this statistic for any solution. We must therefore provide the user with a way to tell our algorithm the statistic that he or she would like to bound, without requiring the user to provide the value of the statistic for any solution (see the second step of our framework). To achieve this, we allow the user to specify a sample statistic, and we define g so as to ensure that the expected value of the sample statistic is bounded with high probability.

The creation of a regression algorithm (the third step of the framework) with the properties specified during the first two steps of the framework is challenging. This is to be expected given the shifted burden discussed previously. In the supplemental document we provide a detailed description of how we performed the third step of our framework for both our regression and reinforcement learning algorithms. At a high level, these algorithms were created by using statistical tests like Student's t-test and Hoeffding's inequality to transform sample statistics into bounds on the probability that g(a(D)) > 0, i.e., into bounds on the probability of undesirable behavior. Importantly, these two algorithms are not the contribution of this work; they are merely examples to prove that the design of algorithms using our framework is possible and tractable.
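For concreteness, here are two standard (1 − δ)-confidence upper bounds on the mean of a sample statistic, one based on Student's t-test and one on Hoeffding's inequality. They are textbook bounds offered as plausible instantiations of the upper_bound placeholder in the earlier sketch, not the exact bounds used in the paper's algorithms; the Hoeffding version additionally assumes known bounds [z_min, z_max] on the statistic.

    import numpy as np
    from scipy import stats

    def ttest_upper_bound(z, delta):
        """One-sided (1 - delta)-confidence upper bound on E[Z] via Student's t-test.
        Assumes the sample mean of z is approximately normally distributed."""
        n = len(z)
        return np.mean(z) + np.std(z, ddof=1) / np.sqrt(n) * stats.t.ppf(1.0 - delta, df=n - 1)

    def hoeffding_upper_bound(z, delta, z_min, z_max):
        """Distribution-free (1 - delta)-confidence upper bound on E[Z] via Hoeffding's
        inequality, assuming each Z_i lies in [z_min, z_max]."""
        n = len(z)
        return np.mean(z) + (z_max - z_min) * np.sqrt(np.log(1.0 / delta) / (2.0 * n))

A Seldonian regression algorithm can then return a candidate solution only if such a bound on the user's sample statistic, evaluated on held-out data, is at most zero.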
We applied a variant of our Seldonian regression algorithm to the problem of predicting how well an applicant to a university would perform if admitted, using a user-selected sample statistic that captures one form of discrimination.

Figure 1: We used five different regression algorithms to predict students' grade point averages (GPAs) during their first three semesters at university based on their scores on nine entrance exams. We used real data from 43,303 students. Here, the user-selected definition of undesirable behavior corresponds to large differences in mean prediction errors (predicted GPA minus observed GPA) for applicants of different genders. This plot shows the mean prediction errors for male and female students when using each regression algorithm, and includes standard error bars. We used three standard algorithms: least squares linear regression (LR), an artificial neural network (ANN), and a random forest (RF), and two variants of our Seldonian algorithm: QNDLR and QNDLR(λ). All of the standard methods tend to drastically over-predict the performance of male students and under-predict the performance of female students, while the two variants of our Seldonian regression algorithm do not. In particular, our algorithms ensure that, with approximately 95% probability, the expected prediction errors for men and women will be within ε = 0.05, and both effectively preclude the sexist behavior that was exhibited by the standard algorithms.

Figure 1 presents the results of this experiment, which show that commonly used regression algorithms designed using the standard approach discriminate against female students, and that the user can easily limit this sexist behavior using our Seldonian regression algorithm.

Next we used our framework to design a Seldonian reinforcement learning algorithm [Sutton and Barto, 1998]: one that, unlike regression and classification algorithms, makes a sequence of dependent decisions. In this context, a solution, θ, is called a policy; a history, H (a random variable), denotes the outcome of using a policy to make a sequence of decisions; and the available data, D, is a set of outcomes produced by some initial policy, θ_0. Since it is Seldonian, our algorithm searches for an optimal policy while ensuring that Pr(g(a(D)) ≤ 0) ≥ 1 − δ. The algorithm we designed is compatible with g of the form g(θ) := E[r′(H) | θ_0] − E[r′(H) | θ], where the user selects r′(H) to measure his or her own definition of how undesirable the outcome H is. That is, with probability at least 1 − δ, the algorithm will not output a policy, θ, that increases the user-specified measure of undesirable behavior. Notice that the user need only be able to recognize undesirable behavior to define r′; the user does not need to know the distributions over trajectories, H, that result from applying different policies. For example, the user might define r′(H) = −1 if undesirable behavior occurred in H, and r′(H) = 0 otherwise.
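Estimating the second term of g(θ) = E[r′(H) | θ_0] − E[r′(H) | θ] is an off-policy problem, because the available histories were generated by θ_0 rather than θ. The sketch below uses ordinary importance sampling for that term; this estimator, the dictionary-based history format, and the stochastic-policy functions pi_theta and pi_theta0 are illustrative assumptions, not necessarily the construction used in the paper.

    import numpy as np

    def r_prime(history):
        """User-chosen statistic: -1 if undesirable behavior occurred in H, 0 otherwise.
        Each history is assumed to carry a precomputed boolean flag."""
        return -1.0 if history["undesirable"] else 0.0

    def g_hat_samples(histories, pi_theta, pi_theta0):
        """Per-history samples whose mean estimates g(theta) = E[r'|theta_0] - E[r'|theta].

        pi_theta(s, a) and pi_theta0(s, a) return the probability of taking action a in
        state s under the candidate policy and the data-collecting policy, respectively.
        """
        z = []
        for h in histories:
            w = 1.0
            for s, a in zip(h["states"], h["actions"]):
                w *= pi_theta(s, a) / pi_theta0(s, a)   # trajectory likelihood ratio
            # r' under theta_0 (observed directly) minus an importance-weighted
            # estimate of r' under theta, for this single history.
            z.append(r_prime(h) - w * r_prime(h))
        return np.array(z)

Feeding these samples to a high-confidence upper bound such as ttest_upper_bound above yields a safety test of the required form for a candidate policy.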
We applied our Seldonian reinforcement learning algorithm to the problem of automatically improving the treatment policy (treatment regimen) for a person with type 1 diabetes [Bastani, 2014]. In this application, a policy, θ, determines the amount of insulin that a person should inject prior to eating a meal, and each outcome, H, corresponds to one day. We defined r′(H) to be the negative of a measure of the prevalence of hypoglycemia (dangerously low blood sugar levels) in the outcome H. Figure 2 shows the result of applying both our Seldonian algorithm and a similar algorithm designed using the standard approach. For this experiment we used a detailed metabolic simulator [Dalla Man et al., 2014].

Given the recent surge of real-world machine learning applications and the corresponding surge of potential harm that they could cause, it is imperative that machine learning algorithms provide their users with an effective means for controlling behavior. To this end, we have proposed a new framework for designing machine learning algorithms and shown how it can be used to construct algorithms that provide their users with the ability to easily (that is, without requiring additional data analysis) place limits on the probability that the algorithm will produce any specified undesirable behaviors. Using this framework, we designed two algorithms and applied them to important high-risk real-world problems, where they precluded the sexist and harmful behavior exhibited when using algorithms designed using the standard approach. Algorithms designed using our framework are not just a replacement for machine learning algorithms in existing applications; it is our hope that they will pave the way for new applications for which the use of machine learning was previously deemed to be too risky.

In the remainder of this document we provide additional details and support for the claims that we have made. In §3 we discuss the standard approach for designing machine learning algorithms in more detail. In §4 we discuss the limitations of the standard approach that we address with our new framework. In §5 we provide a detailed description of Seldonian optimization problems, the problem formulation at the foundation of our framework. In §6 we give examples of how Seldonian regression algorithms can be designed and applied to limiting sexist behavior when applied to real-world data. In §7 we give an example of how a Seldonian reinforcement learning algorithm can be designed and applied to automatically personalizing treatment policies for type 1 diabetes.

Figure 2: Results from diabetes treatment simulations, averaged over 200 trials. (A) Probability that each method returns treatment policies that increase the prevalence of hypoglycemia. The algorithm designed using the standard approach often proposed policies that increased the prevalence of hypoglycemia, and thus could triple the five-year mortality rate [McCoy et al., 2012], even though it used an objective function (reward function) that penalized instances of hypoglycemia. By contrast, across all trials, our Seldonian algorithm was safe: it never changed the treatment policy in a way that increased the prevalence of hypoglycemia. (B) Probability that each method returns a treatment policy that differs from the initial treatment policy. Our Seldonian algorithm was able to safely improve upon the initial policy with just 3–6 months of data, a very reasonable amount [Bastani, 2014]. (C) The distribution of the expected returns (objective function values) of the treatment policies returned by the standard algorithm. The blue line depicts the sample mean, and the dashed red lines are the medians. All points below −0.115 correspond to cases where the standard algorithm decreased performance and produced undesirable behavior (an increase in the prevalence of hypoglycemia). (D) Similar to (C), but showing results for our new algorithm. The green line is the average of the performance when the algorithm produced a treatment policy that differs from the initial treatment policy. This line is not in (C) because the standard algorithm always produced policies that differed from the initial policy. Notice that all points are above −0.115, indicating that our algorithm never produced undesirable behavior.

2 Notation

In this document all random variables are denoted using uppercase letters, and expectations, E[·], are always with respect to all of the random variables within the brackets unless otherwise specified. We use uppercase calligraphic letters to denote sets (except that Θ will also denote a set). We use lowercase letters to denote indices, elements of sets, and functions. We use ℕ_{≥0} and ℕ_{>0} to denote the natural numbers including zero and not including zero, respectively. We use M⊤ to denote the transpose of a matrix M, and we use column vectors rather than row vectors. We also reserve the symbol n to always denote the number of constraints in an optimization problem (Seldonian or otherwise) and m to denote the number of available data points.

3 The Standard Approach for Designing Machine Learning Algorithms

When designing a machine learning algorithm using the current standard approach, the first step is to define, mathematically, what the algorithm should try to do: the goal of the algorithm. At a sufficiently abstract level, this goal can be expressed the same way for almost all machine learning problems: find a solution, θ*, within some feasible set, Θ, that maximizes an objective function, f : Θ → ℝ. That is, the algorithm should search for an optimal solution,

    θ* ∈ argmax_{θ∈Θ} f(θ).

To ground this abstract description of how the goal of an algorithm can be specified, consider the problem of designing a simple one-dimensional regression algorithm. We can begin by specifying the feasible set, Θ, to be a set of functions. Each function, θ ∈ Θ, takes a real number as input and produces a real number as output, that is, θ : ℝ → ℝ. Let X and Y be correlated random variables, and let our goal be to estimate Y given X. We might then define the objective function to be the negative mean squared error (MSE):

    f(θ) := −E[(θ(X) − Y)²].

This completes the formal specification of what our algorithm should do, and so we can begin working on how our algorithm should do it. For example, we might have our algorithm construct an estimate, f̂, of f using data: m realizations of (X, Y), that is, (x_i, y_i) for i = 1, ..., m. For example, f̂ could be the negative sample mean squared error:

    f̂(θ) := −(1/m) Σ_{i=1}^{m} (θ(x_i) − y_i)²,

and our algorithm could return an element of argmax_{θ∈Θ} f̂(θ).
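Under an assumed (hypothetical) data-generating process, the following sketch carries this example through end to end: it forms the negative sample mean squared error f̂ for a family of linear predictors and returns a maximizer, which for this feasible set is simply ordinary least squares.

    import numpy as np

    rng = np.random.default_rng(0)

    # m realizations of the correlated pair (X, Y); the generative model is hypothetical.
    m = 200
    x = rng.uniform(-3.0, 3.0, size=m)
    y = 1.5 * x + rng.normal(scale=1.0, size=m)

    # Feasible set: linear functions theta_w(x) = w[0] + w[1] * x.
    def f_hat(w):
        """Negative sample mean squared error of the predictor w[0] + w[1] * x."""
        return -np.mean((w[0] + w[1] * x - y) ** 2)

    # Maximizing f_hat over this feasible set is ordinary least squares,
    # which has a closed-form solution.
    features = np.column_stack([np.ones(m), x])
    w_star, *_ = np.linalg.lstsq(features, y, rcond=None)

    print(w_star, f_hat(w_star))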
Similarly, when designing a reinforcement learning algorithm, we might define Θ to be a set of policies for a Markov decision process and f(θ) to be an objective function like the expected discounted return [Sutton and Barto, 1998] or average reward [Mahadevan, 1996] of the policy θ. In this case, we might create an algorithm that maximizes an estimate, f̂, of f, constructed from data (sample state-transitions and rewards), or we might create an algorithm that, like Q-learning [Watkins, 1989], indirectly maximizes f by optimizing a related function. When designing a classification algorithm, Θ would be a set of classifiers and f a notion of how often the classifier selects the correct labels [Boser et al., 1992; Breiman, 2001; Krizhevsky et al., 2012]. When designing an unsupervised learning algorithm, we might define Θ to be a set of statistical models and f(θ) to be a notion of how similar θ is to a target statistical model [Dempster et al., 1977].
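As an illustration of indirectly maximizing f by optimizing a related function, here is a minimal tabular Q-learning sketch: it never estimates f(θ) for whole policies, but instead learns an action-value function from sampled transitions and derives a policy from it. The env_step interface, the fixed start state, and all hyperparameter values are assumptions made for illustration.

    import numpy as np

    def q_learning(env_step, n_states, n_actions, episodes=500,
                   alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        """Tabular Q-learning with an epsilon-greedy behavior policy.
        env_step(s, a) is assumed to return (next_state, reward, done)."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s, done = 0, False
            while not done:
                a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
                s_next, r, done = env_step(s, a)
                target = r + (0.0 if done else gamma * np.max(Q[s_next]))
                Q[s, a] += alpha * (target - Q[s, a])   # temporal-difference update
                s = s_next
        return Q   # the derived policy acts greedily: argmax over Q[s, :]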
4 Limitations of the Standard Approach

Once a machine learning researcher has designed a machine learning algorithm, it can be used during the creation of an agent: a machine that uses implementations of one or more machine learning algorithms to solve a particular problem. The user of the algorithm (the person creating the agent) can be nearly anyone, from a child using LEGO Mindstorms, to a business person using Microsoft Excel to fit a line to data points, to a (non-computer) scientist using data analysis tools to analyze research data, to a machine learning researcher using a reinforcement learning algorithm for part of the controller of an autonomous vehicle.

The primary limitation of the standard approach for designing machine learning algorithms is that it does not make it easy for the user of a machine learning algorithm (who may or may not be a machine learning expert) to specify and regulate the desirable and undesirable behaviors of the agent. The user of the machine learning algorithm usually has the freedom to make a few decisions when applying a machine learning algorithm, and these decisions can impact the solution that the algorithm returns. It is through these decisions that the user can constrain the behavior of an agent using a machine learning algorithm. Three of the most common decisions that the user can make are what the feasible set, Θ, and objective function, f, should be, and which machine learning algorithm to use. Although in principle there might be definitions of Θ and f that cause the agent to never produce undesirable behaviors, in practice there is no way for the user to know these definitions without performing additional data analysis that can be challenging even for a machine learning researcher.

As an example, consider a reinforcement learning application where an artificial neural network is used to control a robot. In this context, the feasible set, Θ, is a set of neural networks (typically with the same structure, but different weights), each of which would cause the robot to behave differently. Reinforcement learning algorithms can be used to search for an artificial neural network within Θ that performs well in terms of some user-defined performance measure. The user of a reinforcement learning algorithm might want to implement Asimov's first law of robotics, which essentially states that a robot may not harm a human [Asimov, 1951]. However, the user of the algorithm typically does not know whether any particular artificial neural network, θ, will cause the robot to harm a human or not. This means that the user of the algorithm cannot specify Θ to only include artificial neural networks that produce safe behavior. Additionally, since most reinforcement learning algorithms are not guaranteed to produce an optimal solution given finite data, simply adding a penalty into the objective function to punish harming a human does not preclude the algorithm from returning a suboptimal solution (artificial neural network) that causes harm to a human. This means that the user of the algorithm has no easy way to constrain the behavior of the agent: the user must have deep knowledge of the environment that the robot will be faced with, of what each artificial neural network, θ ∈ Θ, does, and of how the reinforcement learning algorithm works, in order to ensure that the reinforcement learning algorithm will not cause the robot (agent) to harm a human.

In §4.1 we present an illustrative example of a problem where it is difficult to ensure that an agent is well-behaved. In §4.2 through §4.8 we then discuss the limitations of current techniques for ensuring that an agent is well-behaved, using the aforementioned illustrative example to ground our discussion. The example that we present shows how undesirable behavior can occur even when using one of the most widely used and well-studied data analysis tools: linear regression.

4.1 An Illustrative Example

To ground our subsequent discussions, we propose a simple linear regression problem. For this problem, our goal is to predict the aptitudes of job applicants based on numerical values that describe the qualities of their résumés. Although later we consider a similar problem using advanced regression algorithms and real data, here we use easily reproduced synthetic data and only consider linear regression algorithms. We will show how linear regression algorithms designed using the standard approach can result in predictions of applicant performance that systematically discriminate against a group of people (such as people of one gender or race).

Let each applicant be in either a set, 𝒜, or another set, ℬ. For example, 𝒜 could be the set of all possible female applicants and ℬ could be the set of all possible male applicants. We refer to the group that the applicant belongs to as his or her type. We are given a training set that contains data describing the résumés of m = 1,000 previous applicants, the applicants' actual aptitudes, and their types. For each i ∈ {1, ..., m}, let x_i ∈ ℝ be a number describing the quality of the ith applicant's résumé, y_i ∈ ℝ be a measure of the applicant's actual aptitude (which we would like to predict given x_i), and let t_i ∈ {0, 1} be an indicator of what type the applicant is: t_i = 0 if the applicant is in 𝒜 and t_i = 1 if the applicant is in ℬ. For simplicity, we assume that all terms, x_i and y_i, have been normalized so that they each tend to be inside the [−3, 3] interval.
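Since this excerpt does not specify the paper's synthetic generative model, here is a hypothetical sketch of the failure mode described above: a single least squares line fit to both types together systematically under-predicts one type and over-predicts the other, which is exactly the kind of statistic a user might wish to bound. The generative model, coefficients, and noise level are all invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical generative model: resume quality x is informative about aptitude y,
    # but the two types have different offsets that a single shared line cannot capture.
    m = 1000
    t = rng.integers(0, 2, size=m)          # type indicator: 0 for applicants in A, 1 for B
    x = rng.normal(size=m)                  # resume quality (tends to lie in [-3, 3])
    y = 0.5 * x + np.where(t == 0, 1.0, -1.0) + rng.normal(scale=0.3, size=m)

    # Standard approach: least squares linear regression on (x, y), ignoring type.
    features = np.column_stack([np.ones(m), x])
    w, *_ = np.linalg.lstsq(features, y, rcond=None)
    pred = features @ w

    # Statistic a user might care about: mean prediction error (predicted minus observed) by type.
    err_A = np.mean(pred[t == 0] - y[t == 0])   # roughly -1: type A is under-predicted
    err_B = np.mean(pred[t == 1] - y[t == 1])   # roughly +1: type B is over-predicted
    print(err_A, err_B)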
