Adaptive Questionnaires for Direct Identification of Optimal Product Design

Max Yi Ren*¹ and Clayton Scott†²

¹Department of Mechanical Engineering, Arizona State University
²Division of Electrical and Computer Engineering, University of Michigan, Ann Arbor

January 6, 2017

arXiv:1701.01231v1 [stat.ML] 5 Jan 2017

Abstract

We consider the problem of identifying the most profitable product design from a finite set of candidates under unknown consumer preference. A standard approach to this problem follows a two-step strategy: First, estimate the preference of the consumer population, represented as a point in part-worth space, using an adaptive discrete-choice questionnaire. Second, integrate the estimated part-worth vector with engineering feasibility and cost models to determine the optimal design. In this work, we (1) demonstrate that accurate preference estimation is neither necessary nor sufficient for identifying the optimal design, (2) introduce a novel adaptive questionnaire that leverages knowledge about engineering feasibility and manufacturing costs to directly determine the optimal design, and (3) interpret product design in terms of a nonlinear segmentation of part-worth space, and use this interpretation to illuminate the intrinsic difficulty of optimal design in the presence of noisy questionnaire responses. We establish the superiority of the proposed approach using a well-documented optimal product design task. This study demonstrates how the identification of optimal product design can be accelerated by integrating marketing and manufacturing knowledge into the adaptive questionnaire.

*[email protected]
[email protected]

1 Introduction

Understanding consumer preferences is usually necessary for profit-driven product design. A common tool for this purpose is Discrete Choice Analysis (DCA).
In DCA, the designer collects consumer responses on product alternatives through a questionnaire and uses the data to build a preference model, which is subsequently used to identify the optimal (i.e., most profitable) design. While much research has followed this two-step procedure (see, e.g., [1–6]), there has been limited theoretical discussion of the relationship between the accuracy of a preference model and the market performance of the resulting product (some empirical studies can be found in [7,8]). Although a precise understanding of consumer preference does not hurt, it may not be necessary for product design: For example, when all but one of the candidate designs have higher costs than their potential profits, that single design is optimal regardless of our understanding of consumer preference. While this is an extreme case, it suggests that we could identify the optimal design with fewer resources spent on the questionnaire if the questionnaire targets the optimal design directly, rather than treating preference learning as an intermediate step.

This paper aims to turn this intuition into a rigorous questionnaire mechanism. To accomplish this goal, we argue that finding the optimal product is a group identification problem (defined below), where the group label (the ID of the optimal design) of an object (the consumer preference model) is to be identified through binary queries (the questionnaire). We further show that an existing algorithm for group identification can be extended to optimal product design. By drawing this connection, we propose a questionnaire mechanism that adaptively chooses queries to greedily minimize the expected number of queries. We compare this approach with a standard adaptive questionnaire by Abernethy et al. [9] (called Abernethy's alg. in the sequel) on a well-documented product design problem. The performance of the two methods reflects their different objectives: While Abernethy's alg.
yields better preference estimates, our method consistently performs better at identifying the optimal product, leading to a significant difference in the expected profit of the resulting product.

The key contributions of this paper are as follows. (1) We clearly explain the mathematical difference between preference modeling and optimal product identification, and thus show that accurate preference estimation is neither necessary nor sufficient for identifying the optimal product design. To summarize, consider the true preference as a point in a "preference space": The former task is equivalent to estimating the location of that point. In the latter, the entire space is segmented, with each segment representing preferences that share the same optimal design, and the objective is to identify the true segment, i.e., the segment where the true preference lies. (2) We develop a real-time adaptive questionnaire mechanism that identifies optimal designs more effectively than a standard method in noise-free scenarios, by leveraging engineering feasibility and cost models. (3) We point out, by construction, a critical issue in the use of preference models in product design: Even when the preference estimate is close to the truth, the model may still lead to an inferior design decision. This happens when the true segment is "thin" or close to the origin, the latter case representing low sensitivity of consumer preferences to design changes or, equivalently, high noise in responses to choice questions. The results presented in this paper call for a more systematic product design methodology when the goal is to help product designers identify profitable designs cost-effectively, rather than to help marketing researchers and psychologists precisely understand human preference.

The rest of the paper is structured as follows. Sec. 2 provides preliminary knowledge necessary for an in-depth discussion. Sec. 3 introduces the questionnaire mechanism and Sec.
4 its implementation. Sec. 5 presents results from a case study. Lastly, discussions of the limitations of the proposed method and relaxations of its assumptions are presented in Sec. 6.

2 Preliminaries

We introduce background material necessary for an elaboration of the problem statement.

2.1 Profit-driven product design

A profit-driven product design problem can be formulated as follows, with z being the product design and w the parameters of a preference model:

    max_{z ∈ Z} profit(z; w) := market share(z; w) · (price(z) − cost(z)).   (1)

More specifically, z := (z_(1), ..., z_(D)) ∈ Z is a binary attribute vector in which each element represents the existence of a certain attribute level in the product¹, and Z represents the set of all feasible combinations of attribute levels. The unit price is indicated by one of the attributes, while the unit cost is a function of z, often built upon engineering models of the problem. The parameters w ∈ R^D are called part-worths of a consumer utility model u(z; w). The preference influences the profit through the market share. Specifically, we use the logit model [10]

    market share(z; w) ∝ sigmoid(u(z; w) − u(z_0; w)),   (2)

where z_0 is some fixed product competing with z; the extension to multiple competing products is trivial. The RHS of the equation represents the probability that a consumer chooses design z, and is proportional to the market share under homogeneous consumer preference, i.e., when everyone shares the same deterministic w. As discussed in Sec. 6, other choice models, e.g., probit and non-compensatory ones [11–13], can replace Eq. (2) without significantly affecting our discussion.

¹For example, the attribute "MPG" of a car has levels "20", "30", and "40". The binary encoding of attributes we use in this paper, e.g., z_(1) = 1 ⇔ MPG = 20, z_(2) = 1 ⇔ MPG = 30, etc., is a common way to treat nonlinearity in preference with respect to attribute levels.

2.2 Preference learning

Clearly, w plays a critical role in formulating and solving Eq. (1). In reality, however, the true part-worths of a target consumer group, denoted as w*, are often unknown.
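To make Eqs. (1) and (2) concrete, here is a minimal sketch (with hypothetical attributes, margins, and competitor; none taken from the paper's case study) that scores each candidate design by logit market share times unit margin and picks the maximizer over the finite set Z:

```python
import numpy as np

def market_share(z, z0, w):
    """Logit choice probability of design z over competitor z0 (Eq. (2)),
    with linear utility u(z; w) = w^T z."""
    return 1.0 / (1.0 + np.exp(-(w @ z - w @ z0)))

def profit(z, z0, w, price, cost):
    """Per-consumer profit of Eq. (1): market share times unit margin."""
    return market_share(z, z0, w) * (price - cost)

# Hypothetical example: three binary attribute levels, two candidate designs.
w = np.array([1.0, 0.5, -0.2])                  # part-worths (assumed known here)
z0 = np.array([0.0, 1.0, 0.0])                  # competing product
candidates = {"A": np.array([1.0, 0.0, 0.0]),
              "B": np.array([0.0, 0.0, 1.0])}
profits = {name: profit(z, z0, w, price=10.0, cost=6.0)
           for name, z in candidates.items()}
best = max(profits, key=profits.get)            # solves Eq. (1) over Z
```

Here the margin price − cost is fixed per design for simplicity; in the paper, the cost comes from engineering models.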
DCA derives a part-worth estimate, denoted as ŵ, using a series of queries. Each query consists of a set of design alternatives, from which the participant is asked to pick the relatively more preferred design. The accumulated responses form a data set S^(Q) := {(z_q1, z_q2, ...)}_{q=1}^Q, where z_q1 is the preferred design in the qth query and {z_q2, ...} are the unchosen alternatives. In the sequel, we will assume pairwise choices and discuss the extension to multiple choices in Sec. 6. When necessary, superscripts will be omitted to avoid clutter. Given this data set and a prior belief of consumer preference p(w), the posterior density function p(w; S^(Q)) can be derived following a regularized logistic or hierarchical Bayes [14,15] model. Under either model, the maximum a posteriori (MAP) estimate can be found as ŵ := argmax_w p(w; S), and the uncertainty of this point estimate, i.e., its variance-covariance matrix, can be calculated as the inverse of the Hessian H(ŵ, S) of the negative log-likelihood −log(p(w; S)). In this paper, we use a regularized logistic model due to its convexity and the real-time computation of MAP estimates; a comparison between these two approaches is discussed in [16]. Assuming a regularized logistic model, we have

    −log(p(w; S)) = Σ_{q'=1}^q log(1 + exp(w^T Δz^(q'))) + (C/2) w^T w,   (3)

where Δz^(q') := z_{q'2} − z_{q'1}, and the parameter C will be selected by cross-validation (see Sec. 4). The goal of preference learning is to find a part-worth estimate close to the ground truth and with low uncertainty (e.g., as measured by the determinant of H(ŵ, S)), while limiting Q, the number of queries posed.

2.3 Optimal product identification

Since uncertainty in the part-worth estimate always exists under a finite number of queries, one needs to modify Eq. (1) to incorporate p(w; S).
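Returning briefly to Eq. (3): since it is convex in w, its MAP estimate can be computed in real time, e.g., by plain gradient descent. The sketch below uses simulated logit responses (the true part-worths, sample size, and step size are all hypothetical; the paper does not prescribe a particular optimizer):

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])                 # hypothetical ground-truth w*
dz = rng.choice([-1.0, 0.0, 1.0], size=(200, 2))

# Simulate logit responses: with probability sigmoid(w*^T dz) the second
# alternative is chosen; re-orient each row to store unchosen - chosen,
# matching the definition Δz^(q') = z_{q'2} - z_{q'1}.
second_chosen = rng.random(200) < 1.0 / (1.0 + np.exp(-(dz @ w_true)))
dz[second_chosen] *= -1.0

# Gradient descent on Eq. (3); its gradient is Σ sigmoid(w^T Δz) Δz + C w.
C, lr, w_hat = 1.0, 0.01, np.zeros(2)
for _ in range(2000):
    sig = 1.0 / (1.0 + np.exp(-(dz @ w_hat)))
    w_hat -= lr * (dz.T @ sig + C * w_hat)
```

The inverse Hessian of Eq. (3) evaluated at w_hat then provides the covariance of the estimate.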
Two modified problem definitions for optimal product identification exist. One is to optimize the expected profit:

    max_{z ∈ Z} E_w[profit(z; w)] = ∫_{R^D} profit(z; w) p(w; S) dw,   (4)

and the second is to optimize the probability of being the most profitable design (denoted as π(z)):

    max_{z ∈ Z} π(z) := E_w[1(z; w)] = ∫_{R^D} 1(z; w) p(w; S) dw,   (5)

where 1(z; w) := 1(profit(z; w) > profit(z'; w), ∀z' ≠ z), and 1(condition) = 1 when the condition is true and 0 otherwise. The design decisions generated from these two objectives may not be consistent for an arbitrary S. The goal of optimal product identification is to acquire queries and responses such that the design decision matches the true solution to Eq. (1).

2.4 Assumptions

The majority of the discussion will be based on the following conditions and assumptions: (1) we consider a consumer group who share similar preferences; (2) queries are binary, i.e., each consists of two design alternatives and one must be chosen; (3) Z is a finite set; (4) the preference model is linear, i.e., u = w^T z; (5) ||w*|| is a large number, reflecting low-noise responses and substantial information for measurement; (6) engineering models, i.e., product costs and feasibility, are deterministic; and (7) the predicted optimal design is derived from Eq. (5). We discuss the issue of response noise in Subsec. 5.5, and the relaxation of assumptions (1) to (4), as well as the connection between Eqs. (4) and (5), in Sec. 6.

2.5 Query strategies for preference learning and optimal product identification

A query strategy defines how the qth query shall be created based on the current knowledge p(w; S^(q−1)). For preference learning purposes, existing strategies [9,16–18] have been developed to greedily minimize the estimation uncertainty of ŵ, which is equivalent to maximizing the determinant of H(ŵ, S) (see Subsec. 3.3 for a discussion of [9]).
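Both objectives in Eqs. (4) and (5) are integrals over p(w; S), so once posterior samples are available (e.g., from MCMC or a Laplace approximation) they can be approximated by Monte Carlo. A minimal sketch with a hypothetical Gaussian stand-in for the posterior and hypothetical margins:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical problem: four candidate designs over two binary attributes,
# fixed unit margins price - cost, and a fixed competitor z0.
Z = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
margin = np.array([3.0, 2.5, 2.0, 1.5])
z0 = np.array([1.0, 1.0])
W = rng.normal(loc=[1.0, -0.5], scale=0.7, size=(5000, 2))   # draws from p(w; S)

share = 1.0 / (1.0 + np.exp(-(W @ Z.T - (W @ z0)[:, None])))  # Eq. (2) per draw
profits = share * margin                                      # profit(z; w)

z_expected = int(np.argmax(profits.mean(axis=0)))   # Eq. (4): max expected profit
pi = np.bincount(np.argmax(profits, axis=1), minlength=len(Z)) / len(W)
z_most_probable = int(np.argmax(pi))                # Eq. (5): max pi(z)
```

The two argmaxes need not coincide for an arbitrary posterior, which is exactly the inconsistency noted for Eqs. (4) and (5).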
Below we show that query strategies for optimal product identification should differ from those for preference learning. Without loss of generality, we use a simple 2D case for visualization.

Consider K candidate products with fixed costs. By setting the true part-worth to some arbitrary w, we can solve Eq. (1) to derive the optimal product for that particular w. Doing so for all w ∈ R² leads to a segmentation of the part-worth space, as illustrated in Fig. 1. Each segment, denoted as W_k, corresponds to the set of w that share the same most profitable design z_k. Due to the nonlinearity of the mapping from preference to profit (see Eqs. (1) and (2)), the segmentation is also nonlinear. To elaborate, any two candidate products z_1 and z_2 have the same profit when w satisfies

    exp(w^T z_1)(price(z_1) − cost(z_1)) / (exp(w^T z_1) + exp(w^T z_0)) = exp(w^T z_2)(price(z_2) − cost(z_2)) / (exp(w^T z_2) + exp(w^T z_0)).   (6)

Let the true optimal design, induced by w*, be z_{k*}, and the solution to Eq. (5) be ẑ. Ideally, one would like to obtain a perfect preference model (ŵ = w*) and thus the correct optimal product (ẑ = z_{k*}). However, we show in Figs. 1a-b that a good estimate ŵ ≈ w* does not guarantee ẑ = z_{k*}, nor does ẑ = z_{k*} require ŵ ≈ w*. In addition, readers may notice that the segments cluster around the origin, a region in which consumers are less sensitive to design changes². As we will discuss in Subsec. 5.4, this clustering causes difficulty in correctly identifying the optimal design under indifferent preferences or noisy responses. It is worth noting, however, that due to the indifference or high noise in consumer preference, all segments near the origin lead to similar expected profits, as market shares are more evenly distributed among competing products.

Knowing this, we can further discuss the difference between query strategies for preference learning and for optimal product identification. We start
We start 2Low part-worth values can also be interpreted as high response noise. Salisbury and Feinberg [19] and Louviere et al. [20] pointed out the downsides of mixing these two interpretations. But we do not discuss this issue here. 6 Figure1: (a)Agoodestimationwˆ ≈ w∗ doesnotguaranteezˆ = z . π and k∗ 3 π are the probabilities for product 3 and 4 to be the optimal, respectively. 4 (b) Achieving zˆ = z does not necessarily require wˆ ≈ w∗. (c) Each query k∗ can be considered as a cut in the part-worth space. The posterior p(w;S) is shifted to the convex cone defined by these cuts. The uncertainty in wˆ is illustrated by the covariance ellipse H−1. Generation of the figure: In the wˆ space of [−10,10]2, we sample eight points uniformly as candidate products, i.e., each one is represented by two continuous product attributes, one of which is the price. For each part-worth on a dense grid of the space, we calculate z by Eq. (5). Grid points with the same z share the same k∗ k∗ color. 7 from a prior probability density p(w) (e.g., a multivariate normal distri- bution or a distribution that leads to maximum entropy, see the entropy definition in Subsec. 3.1). When a user response (z ,z ) is collected from q1 q2 iteration q, we update p(w;S(q−1)) so that the probability mass is shifted towards the half space {w|wT (z −z ) > 0}. After some Q responses are q1 q2 collected, p(w|S(Q)) will concentrate in a cone that contains w∗ (Fig. 1c). A good query strategy for learning w∗ is thus to quickly “narrow” this cone. The algorithms proposed by Toubia et al. [17,18], Abernethy et al. [9], Tong andKoller[21],andJamiesonandNowak[22]allfollowtheideaofnarrowing the feasible part-worth space by halving it. For optimal product identification, on the other hand, z is correctly k∗ identified when p(w;S(Q)) concentrates in the segment W , i.e., when k∗ p(w;S) > 0 only in W , zˆ = z according to both Eq. (4) and Eq. (5). 
Therefore, a good query strategy in this case should quickly identify the correct "group" W_{k*} by shifting the mass of p(w; S^(Q)) towards it, rather than towards the point w*. Jamieson and Nowak [22] address a problem similar to ours in that a particular segment, rather than a point, is to be identified from a segmentation of a space. However, in [22] the segments are created by hyperplanes directly defined by pairwise comparisons, while in this paper the nonlinear segments are induced by the candidate products, independent of the hyperplanes defined by the collection of queries. See Fig. 2 for a comparison.

3 Query Strategy

This section addresses the central question of how queries should be adaptively made to directly minimize the expected number of queries needed to identify the optimal design. To do so, we show the connection between this problem and the problem of "group identification", and introduce an extension of the Group Identification Splitting Algorithm (GISA) [23] that solves our problem. We also review an existing strategy for preference learning from Abernethy et al. [9] (called Abernethy's alg. in the sequel) that will be used as a benchmark in the case study. Implementation details of GISA and Abernethy's alg. will be discussed in Sec. 4.

3.1 Group identification and optimal product identification

Group identification is an extension of object identification, also known as the twenty-questions game. Consider a game between players A and B. Both players are familiar with a list of objects and the group labels they belong to; see Table 1.

Figure 2: Comparison of problem settings between (a) [22] and (b) this paper: (a) Each segment (shaded polygon) represents a preference ranking, and each hyperplane (dotted line) a pairwise comparison query. (b) The nonlinear segments (shaded) are induced by the candidate products and their costs, while queries (dotted lines) are pairwise comparisons.

Player A first picks an object from the list randomly.
In object identification, Player B needs to identify the object selected by Player A, whereas in group identification, Player B only needs to determine the group to which the object belongs. To learn which group Player A picked from, Player B can ask any of the queries listed in Table 1. Based on Player A's responses, which are binary, Player B either picks new queries or guesses the group label. Player B's query strategy affects how quickly he can identify the correct group. In this example, an intuitively good query to start with is "Q1", since its answer directly determines which group the object is from, i.e., it allows one of the group labels to have a probability of one conditioned on the response. It is worth noting that we cannot treat groups as "meta-objects", since objects in the same group can lead to different query responses.

Table 1: An example of the group identification problem

    Group        |       Heroes        |       Villains
    Object       | Superman |  Batman  | Catwoman |  Joker
    Prior Prob.  |   0.25   |   0.25   |   0.25   |   0.25
    Q1: Mask?    |   Yes    |   Yes    |    No    |    No
    Q2: Can fly? |   Yes    |    No    |    No    |    No
    Q3: Female?  |    No    |    No    |   Yes    |    No

We now show that optimal product identification in our context shares the same elements as Table 1. Consider the set of objects to be all part-worth vectors in R^D. Each induces a list of binary responses to pairwise queries, which are defined by Z: specifically, queries are determined by sign{w^T(z_i − z_j)} for all pairs z_i, z_j. A group is a set W_k of part-worth vectors that share the same most profitable design. Similar to Player B, who has (non-informative) prior knowledge of which object Player A picks, here objects are associated with a prior probability density p(w), and each group has a prior probability mass π_k^(0) = ∫_{W_k} p(w) dw. Upon acquiring the qth user response, we can update S^(q), the posterior p(w; S^(q)), and

    π_k^(q) = ∫_{W_k} p(w; S^(q)) dw.   (7)

Since p(w; S^(q)) is a probability density function, we have Σ_{k=1}^K π_k^(q) = 1.
One can see that both group identification and optimal product identification share the same process of driving π_{k*}^(Q) to 1 by collecting responses to binary queries. The differences in the latter are that (1) we have an infinite number of objects (candidate part-worth vectors), and (2) the conditional probabilities are not proportional to the prior. For example, in Table 1, the conditional probabilities for "Superman" and "Batman" are still equal after receiving "Yes" for Q1, while in the case of optimal product identification, each response nonlinearly affects p(w; S^(q)), which in turn determines Π^(q) := (π_1^(q), ..., π_K^(q)).

3.2 The Group Identification Splitting Algorithm (GISA)

GISA greedily minimizes the expected number of queries needed to reach H(Π^(Q)) = 0, which occurs iff all probability mass is on one design. Here we explain the algorithm in the context of optimal product identification; readers are referred to [23] for the derivation of the algorithm for group identification. Let a "path" be a sequence of queries and responses. Upon receiving the response to each query, we move from one node on the path to the next, and update Π^(q). With enough queries, we will reach π_{k*}^(Q) = 1. Given a query strategy, one can generate multiple paths for objects (w's) drawn from the prior p(w). These paths form a binary decision tree (Fig. 3.2). Note that multiple leaf nodes of the tree can share the same group label. The expected number of queries needed to reach a leaf, denoted as