Brain Sci. 2014, 4, 49-72; doi:10.3390/brainsci4010049

OPEN ACCESS
brain sciences
ISSN 2076-3425
www.mdpi.com/journal/brainsci/

Article

Toward a New Application of Real-Time Electrophysiology: Online Optimization of Cognitive Neurosciences Hypothesis Testing

Gaëtan Sanchez 1,2,*, Jean Daunizeau 3, Emmanuel Maby 1,2, Olivier Bertrand 1,2, Aline Bompas 1,2 and Jérémie Mattout 1,2

1 Brain Dynamics and Cognition Team, Lyon Neuroscience Research Center, INSERM U1028-CNRS UMR5292, Lyon F-69000, France; E-Mails: [email protected] (E.M.); [email protected] (O.B.); [email protected] (A.B.); [email protected] (J.M.)
2 University Lyon 1, Lyon F-69000, France
3 Brain and Spine Institute, Paris F-75000, France; E-Mail: [email protected]
* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +33-4-7213-8936; Fax: +33-4-7213-8901.

Received: 1 November 2013; in revised form: 16 December 2013 / Accepted: 10 January 2014 / Published: 23 January 2014

Abstract: Brain-computer interfaces (BCIs) mostly rely on electrophysiological brain signals. Methodological and technical progress has largely solved the challenge of processing these signals online. The main issue that remains, however, is the identification of a reliable mapping between electrophysiological measures and relevant states of mind. This is why BCIs are highly dependent upon advances in cognitive neuroscience and neuroimaging research. Recently, psychological theories became more biologically plausible, leading to more realistic generative models of psychophysiological observations. Such complex interpretations of empirical data call for efficient and robust computational approaches that can deal with statistical model comparison, such as approximate Bayesian inference schemes. Importantly, the latter enable the optimization of a model selection error rate with respect to experimental control variables, yielding maximally powerful designs. In this paper, we use a Bayesian decision theoretic approach to cast model comparison in an online adaptive design optimization procedure. We show how to maximize design efficiency for individual healthy subjects or patients. Using simulated data, we demonstrate the face- and construct-validity of this approach and illustrate its extension to electrophysiology and multiple hypothesis testing based on recent psychophysiological models of perception. Finally, we discuss its implications for basic neuroscience and BCI itself.

Keywords: brain-computer interfaces; real-time electrophysiology; adaptive design optimization; hypothesis testing; Bayesian model comparison; Bayesian Decision Theory; generative models of brain functions; cognitive neuroscience

1. Introduction

1.1. On Common Challenges in BCI (Brain-Computer Interfaces) and Cognitive Neurosciences

Brain-computer interfaces (BCIs) enable direct interactions between the brain and its bodily environment, as well as the outside world, while bypassing the usual sensory and motor pathways. In BCI, electroencephalography (EEG) is by far the most widely used technique, either with patients or healthy volunteers, simply because it offers a non-invasive, direct and temporally precise measure of neuronal activity at a reasonable cost [1]. BCI research is still mostly driven by clinical applications, and in this context, EEG has been used for a variety of purposes.
These range from replacing or restoring lost communication or motion abilities in patients suffering from severe neuromuscular disorders [2–4] and devising new therapies based upon neurofeedback training [5], to active paradigms in disorders of consciousness, to better diagnose non-responsive patients [6] and possibly communicate with those in a minimally conscious state [7]. Interestingly, common to most of these BCI objectives, but also to those of basic and clinical neuroscience, is the refinement of our understanding of the functional role of electrophysiological markers and their within- and between-subject variations.

In this paper, we would like to further promote the idea that BCI and cognitive neuroscience researchers can help each other in pursuing this common goal. In short, the BCI paradigm puts the subject in a dynamic interaction with a controlled environment. From the perspective of cognitive neuroscience, this is a new opportunity to study normal and pathological brain functioning and to test mechanistic neurocognitive hypotheses [8]. In turn, BCI can benefit from progress in neurocognitive models for decoding mental states from online and single-trial electrophysiological measures [9].

Taking BCI outside the laboratory for daily life applications with patients or healthy people raises tremendous challenges, one of which is the need to decode brain signals in real time. This means one has to be capable of making efficient and robust inference online, based on very limited, complex and noisy observations. Large efforts have recently been put into developing and improving signal processing, feature selection and classification methods [10–12], as well as acquisition hardware [13] and dedicated software environments [14,15]. However, the main BCI bottleneck remains the identification of a reliable mapping from neurophysiological markers to relevant mental states. This unresolved issue calls for tight collaborations between BCI developers, electrophysiologists and cognitive neuroscientists.

Thankfully, a recent and growing trend has been to increase the permeability of the border between the BCI and cognitive neuroscience communities. New applications have emerged that rely on both disciplines and, thus, bring short-term benefit to both. One example is the so-called brain-state-dependent stimulation (BSDS) approach [16], the principle of which is to use BCI as a research tool for cognitive neuroscience, namely to study causal relationships between brain state fluctuations and cognition. In BSDS, the functional role of a brain state is studied by delivering stimuli to subjects in real time, depending on their brain's actual physiological state. Other examples illustrate the reverse direction of this putative multidisciplinary cross-fertilization, showing how advances in cognitive neuroscience may improve BCI performance. An example is connectivity model-based approaches to neurofeedback, as demonstrated recently using fMRI (functional Magnetic Resonance Imaging) [17]. It is worth noting that such emerging applications tend to extend the usefulness of BCI and real-time data processing to non-invasive techniques other than EEG, such as fMRI and MEG (Magnetoencephalography), which have similar overall principles, but might be even more effective for answering some cognitive neuroscience questions.
In this paper, we extend and formalize the BSDS approach by showing that our ability to process neuroimaging data online can be used to optimize the experimental design at the subject level, with the aim of discriminating between neurocognitive hypotheses. In experimental psychology and neuroimaging, this is a central issue, and examples range from staircase methods for estimating an individual sensory detection or discrimination threshold [18], to design efficiency measures for optimizing acquisition parameters or the stimulus onset asynchrony (SOA) in fMRI studies [19]. The former operates in real time, in the sense that the next stimulation depends on the previous behavioral response and is computed in order to optimize model fitting. The latter operates offline, prior to the experiment, and its aim is to optimize model comparison.

1.2. Adaptive Design Optimization

We introduce a generic approach in which real-time data acquisition and processing is aimed at discriminating between candidate mappings between physiological markers and mental states. This approach is essentially an adaptive design optimization (ADO) procedure [20]. The origins of ADO stem from sequential hypothesis testing methods [21], whose modern forms have proven useful in human, social and educational sciences, where typical experiments involve a series of questions to assess the level of expertise of a particular subject [22].

The general principle is fairly straightforward. Figure 1 illustrates its application in the context of human electrophysiology and neuroimaging. In contrast with standard (non-adaptive) experiments, in ADO, the total number of trials is not set in advance, nor is the nature of the stimulation at each trial or stage of the experiment. Moreover, one does not wait until the end of the data acquisition process to proceed with data analysis and statistical inference. Instead, at each trial, the appropriate data features are extracted in order to update our (the experimenter's) information about the model parameters and to assess the model plausibility itself. Based on these estimates, a decision is made regarding some relevant design parameters for the next trials. The decision criterion should reflect the scientific objective of the experiment, e.g., a target statistical power for parameter estimation. This implies that some threshold can be met that would terminate the current experiment. In other words, ADO behaves like classical approaches, except that it operates online, at each trial. In turn, incoming trials are considered as future experiments, whose design can be informed by past observations or may simply become unnecessary. At the level of a single subject, ADO can thus be used to address three problems: (i) model parameter estimation; (ii) hypothesis testing per se; and (iii) the duration of the experiment. A bare-bones sketch of this trial-wise loop is given below.
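To make the trial-wise logic concrete, here is a minimal, self-contained sketch of the ADO loop in Python. Everything in it is a toy assumption for illustration: two hypothetical response models, a squared prediction-distance stand-in for the design-efficiency score, and known Gaussian noise. The criterion and models actually used in this paper are introduced in Section 2.

```python
# Hypothetical toy example of the trial-wise ADO loop (not the paper's models):
# two competing models predict a noisy scalar response as a function of a design
# variable u; each trial, we pick the u that best separates their predictions,
# update the posterior over models, and stop early once one model wins.
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 1.0  # known observation noise (illustrative assumption)

# Two hypothetical candidate models: linear vs. saturating response to u.
models = {
    "linear":     lambda u: 0.5 * u,
    "saturating": lambda u: 4.0 * (1.0 - np.exp(-u / 3.0)),
}
designs = np.linspace(0.0, 10.0, 21)   # candidate values of the design variable u
posterior = {m: 0.5 for m in models}   # flat prior over models
truth = models["saturating"]           # ground truth used to simulate the subject

def efficiency(u):
    # Crude stand-in for a design-efficiency score (the paper uses the
    # Laplace-Chernoff bound): squared distance between model predictions.
    # With fixed parameters this is deterministic; with parameter uncertainty,
    # the optimal u would change from trial to trial.
    p1, p2 = (f(u) for f in models.values())
    return (p1 - p2) ** 2

for trial in range(100):
    u = max(designs, key=efficiency)           # choose the most discriminative design
    y = truth(u) + rng.normal(0.0, SIGMA)      # acquire one (simulated) observation
    # Bayesian update of the posterior over models given the new datum.
    lik = {m: np.exp(-0.5 * ((y - f(u)) / SIGMA) ** 2) for m, f in models.items()}
    z = sum(posterior[m] * lik[m] for m in models)
    posterior = {m: posterior[m] * lik[m] / z for m in models}
    if max(posterior.values()) > 0.95:         # stopping rule: a single best model
        break

print(f"stopped after {trial + 1} trial(s); posterior over models: {posterior}")
```

Note how the stopping rule makes the number of trials an outcome of the experiment rather than a fixed design choice, which is precisely the third problem ADO addresses.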
In the fields of experimental psychology and electrophysiology, recent forms of ADO have been applied to estimating psychometric functions [23], optimizing the comparison of computational models of memory retrieval [24] and optimizing the duration of the experiment when comparing alternative neuronal models [25]. However, optimizing parameter estimation and hypothesis testing do not call for the same criteria and might not be possible simultaneously. In this paper, we focus on ADO for optimizing model comparison, which appears to be of primary interest in cognitive neuroscience. This is because, over the past decade, dynamic and non-linear computational models of neuroimaging and behavioral data have been flourishing [26]. In particular, established control theoretic approaches now rely upon biologically and psychologically plausible models of fMRI, electrophysiological or behavioral data (see, e.g., dynamic causal models (DCMs); [27–29]). Such generative models aim to explain the causal relationship between experimental (e.g., cognitive) manipulations and the observed neurophysiological or behavioral responses [30]. In particular, such tools have now been used to compare alternative models of learning and decision making in humans [28]. Importantly, these models are embedded in a Bayesian statistical framework, which allows one to deal with complex (e.g., probabilistic) models by introducing prior knowledge about unknown model parameters. Note that statistical inference can be made quick and efficient through the use of generic approximation schemes (cf. variational Bayes approaches; [31]). To extend ADO to dynamical neurocognitive models of electrophysiological data, we bring together such variational Bayesian approaches (which can be used in real time) and recent advances in design optimization for Bayesian model comparison (which can deal with complex models; [32]).

This paper is organized as follows. In the Theory and Methods section, we first describe the class of dynamical models that we compare. To make this paper self-contained, but still easy to read, we provide an appendix with a comprehensive summary of the variational Bayesian inference approach (see Appendix A1) and the design efficiency measure (see Appendix A2) that we rely on in this new instantiation of ADO. We also emphasize how this compares with the recent pioneering approach to ADO in experimental psychology [20,24]. In the second part of the Methods section, we introduce our validation strategy, which consists first of a demonstration of the face and construct validity of our approach, by considering the same behavioral example as in [20]. Continuing to use synthetic data, we then demonstrate the extension of our approach to comparing variants of recent dynamical models of perceptual learning. In particular, by simulating several subject datasets, we illustrate how ADO compares with classical designs and how it optimizes hypothesis testing at the individual level. The next section presents the results of this validation. In the last section, we discuss these results, the perspectives they offer, as well as the challenges we now face to put ADO into practice.

Figure 1. A schematic illustration of the adaptive versus classical experimental design approaches. The classical approach (left) is characterized by a sequential ordering of the main experimental steps: experimental design specification occurs prior to data acquisition, which is followed by data analysis and hypothesis testing. In contrast, the adaptive approach (right) operates in real time and proceeds with design optimization, data acquisition and analysis at each experimental stage or trial. The online approach enables hypothesis testing to be optimized at the individual level by adapting the experimental design on the basis of past observations.
This is the general principle of adaptive design optimization (ADO), which can be extended to advanced computational models of electrophysiological responses thanks to brain-computer interface (BCI) technology, with the aim of optimizing experimental conclusions and the time-to-conclusion in cognitive and clinical neuroscience.

2. Theory and Methods

2.1. Dynamic Causal Models (DCMs)

In this section, we briefly introduce the very general type of complex generative models for which the proposed ADO procedure is most appropriate. In their general form, such models are defined by a pair of assumptions {f, g}. The first component, f, is the evolution function, which prescribes the evolution or motion of hidden (unobservable) neuronal or psychological states x, such that:

$\dot{x} = f(x, \theta, u)$   (1)

The second component, g, is the observation function, which prescribes the mapping from hidden states to observed neurophysiological, metabolic or behavioral responses, such that:

$y = g(x, \varphi, u) + \epsilon$   (2)

θ and φ are the model parameters. They represent fixed, but unknown, values that parameterize the evolution and observation functions, respectively. These values might differ from one subject to another or, for the same subject, from one experimental condition to the next. ε denotes random fluctuations or noise that corrupt the observed data. Finally, u corresponds to experimental control variables, that is, exogenous inputs to the system that might encode changes in experimental condition (e.g., visual stimulation type, like face vs. house) or the context under which the responses are observed (e.g., asleep vs. awake). Instantiations of such models have been proposed to explain the generation and the effect of experimental modulations in fMRI data [27] and various electrophysiological features in EEG, MEG or intracranial (i.e., local field potential (LFP)) data, such as evoked [33], induced [34] or steady-state responses [35].

More recently, a related dynamical-system-based approach has been derived to model psychological states, their evolution over time and their mapping onto observable behavioral measures (e.g., choices, reaction times) [28] or physiological observations [36]. Referred to as "observing the observer", this approach differs from the above classical DCMs, because it embeds a subject's (the observer's) dynamic causal model of the environment ($M_s = \{f_s, g_s\}$) into an experimenter's (another observer, observing the subject) dynamic causal model of the subject ($M_e = \{f_e, g_e\}$). Further, assuming that the subject implements an optimal online Bayesian inference (see Appendix A1) to invert the duplet $\{f_s, g_s\}$ and infer the hidden states of the environment, the evolution (perception) function, $f_e$, incorporates this inference and learning process, while the observation (response) function, $g_e$, defines the mapping from the subject's hidden internal states (the inferred or posterior estimates of the environment's hidden states) onto behavioral or physiological responses. Bayesian inference applies to the experimenter's model in order to compare pairs of models $\{M_s, M_e\}$ and infer model parameters (see Appendix A1). This is why this approach is also referred to as a meta-Bayesian approach [28]. Importantly, in this context, we explicitly model the link between the precise sequence of presented sensory inputs and the evolving subject's beliefs about the state of the world.
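To fix ideas, the following is a minimal sketch of how a generative model of the form of Equations (1) and (2) can be simulated in discrete time (Euler steps). The particular evolution and observation functions below, a leaky accumulator observed through a sigmoid, are illustrative assumptions, not any published DCM.

```python
# Minimal sketch of the generic {f, g} state-space form of Equations (1) and (2),
# simulated with Euler integration. The specific f and g are illustrative stand-ins.
import numpy as np

def f(x, theta, u):
    # Evolution function: rate of change of the hidden state (Equation 1).
    leak, gain = theta
    return -leak * x + gain * u

def g(x, phi, u):
    # Observation function: maps the hidden state to the measured response (Equation 2).
    amplitude = phi
    return amplitude / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
dt, T = 0.1, 100                             # Euler step and number of time points
theta, phi, noise_sd = (0.5, 2.0), 3.0, 0.1  # illustrative parameter values
u = (np.arange(T) % 20 < 5).astype(float)    # periodic on/off stimulation

x = 0.0
y = np.empty(T)
for t in range(T):
    x = x + dt * f(x, theta, u[t])                     # Equation (1), discretized
    y[t] = g(x, phi, u[t]) + rng.normal(0, noise_sd)   # Equation (2)

print(y[:5])
```

In a real application, θ and φ would be unknown and estimated by model inversion, and u would be the handle that ADO manipulates online.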
2.2. Online Optimization of Model Comparison

Most of the generative models that are used in cognitive neuroscience fall into the class of nonlinear Gaussian models. Our approach combines two recent methodological advances and brings them online for ADO. First, we use a Bayesian framework to invert and compare such generative models [28] (see Appendix A1). Second, we use a previously proposed proxy to the model selection error rate [32] as a metric to be optimized online through the appropriate selection of experimental control variables (see Appendix A2). Under the Laplace approximation [37], this metric (the Chernoff bound) takes a computationally efficient analytic form, which is referred to as the Laplace-Chernoff bound. In [32], the authors disclosed the relationship between the Laplace-Chernoff bound and classical design efficiency criteria. They also empirically validated its usefulness offline, in a network identification fMRI study, showing that deciding whether there is a feedback connection between two brain regions requires shorter epoch durations, relative to asking whether there is an experimentally-induced change in a connection that is known to be present.

To use the same criterion online, in order to optimize the experimental design for model comparison at the individual level, we simply proceed as illustrated by the adaptive scenario in Figure 1. Each trial or experimental stage consists of:

(i) Running the variational Bayes (VB) inference for each model, M, given past observations and experimental design variables;
(ii) Updating the prior over models with the obtained posteriors;
(iii) Computing the design efficiency or Laplace-Chernoff bound for each possible value of the experimental design variable, u;
(iv) Selecting the optimal design for the next trial or stage.

Finally, the online experiment is interrupted as soon as some stopping criterion has been met. Typically, the experiment is conclusive as soon as one model is identified as the best model, for instance, when its posterior probability exceeds 0.95. If this has not happened by the time an a priori fixed number of trials is reached, the experiment is considered inconclusive in selecting a single best model for the given subject. Step (iii) is illustrated by the sketch after this paragraph.
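As an illustration of step (iii), the sketch below scores candidate designs by an upper bound on the two-model selection error, using the Bhattacharyya special case of the Chernoff bound for Gaussian predictive densities. This is a stand-in for the idea only; the Laplace-Chernoff bound actually used in this paper is derived in Appendix A2 and in [32], and the predictive densities below are hypothetical.

```python
# Hedged sketch of step (iii): score each candidate design u by an upper bound on
# the probability of a model-selection error, using the Bhattacharyya special case
# of the Chernoff bound for two Gaussian predictive densities.
import numpy as np

def predictive_density(model, u):
    # Hypothetical stand-in: each model yields a Gaussian prediction (mean, variance)
    # for the next observation under design u. In the full scheme, this would come
    # from the VB posterior pushed through the model under the Laplace approximation.
    if model == "M1":
        return 1.0 * u, 1.0 + 0.1 * u
    else:
        return 0.5 * u ** 2, 1.0 + 0.1 * u

def error_bound(u, prior=(0.5, 0.5)):
    (m1, v1), (m2, v2) = predictive_density("M1", u), predictive_density("M2", u)
    # Bhattacharyya distance between the two Gaussian predictive densities.
    db = (0.25 * (m1 - m2) ** 2 / (v1 + v2)
          + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))
    # Chernoff-type bound on P(selection error) for a single observation.
    return np.sqrt(prior[0] * prior[1]) * np.exp(-db)

designs = np.linspace(0.0, 3.0, 31)
best_u = min(designs, key=error_bound)   # step (iv): pick the most discriminative u
print(best_u, error_bound(best_u))
```

The design that minimizes this bound is the one for which the two models make the most distinguishable predictions relative to their predictive uncertainty, which is exactly the intuition behind maximizing design efficiency for model comparison.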
2.3. Validation

We now turn to the validation of the proposed approach. We describe two studies based on synthetic data. The first one demonstrates the face and construct validity of the approach by reproducing the simulation example in [20]. The second study illustrates how our approach extends to a realistic online scenario, whose aim is to compare more than two nonlinear models of perceptual learning based on electrophysiological responses only.

2.3.1. First Study: Synthetic Behavioral Data

In order to illustrate our approach to ADO and to provide a first demonstration of its face and construct validity, we reproduce results from Cavagnaro and colleagues [20,38]. These authors showed how an optimal design might look in practice, considering the example of a typical behavioral experiment designed to discriminate psychological models of retention (i.e., forgetting). The experiment consists of a "study phase", in which participants are given a list of words to memorize, followed by a time interval (lag time), followed by a "test phase", in which retention is assessed by testing how many words the participant can correctly recall from the study list. The percentage of words recalled correctly typically decreases with the time interval. A model of retention is the function that fits this relationship between retention and lag time. These authors considered two retention models: power and exponential forgetting [38].

Power model (POW):

$p = a(t+1)^{-b}$   (3)

Exponential model (EXP):

$p = a e^{-bt}$   (4)

In each equation, the symbol p denotes the predicted probability of correct recall as a function of the lag time, t, between the study and test phases, with model parameters a and b.

As in [38], we simulated data under the (true) model POW, considering plausible values for the model parameters. Note that the retention interval or lag time is the design variable whose value is being experimentally manipulated. For a given lag time, t, each model predicts the number of correctly recalled items; under POW, for instance:

$y = n \cdot a(t+1)^{-b}$   (5)

where n = 30 is the number of presented items at each trial. The observable data, y, in this memory retention model formally follow a binomial distribution, and (conjugate) Beta priors on the parameters (a,b) are usually used. In our case, we used a normal approximation to the priors on the parameters (a,b). As n increases, according to the central limit theorem, the binomial distribution tends to a normal density with matched moments, and a normal approximation to the likelihood function is appropriate.

We simulated the responses of 30 participants by drawing 30 pairs of parameter values a and b, considering a ~ Ɲ(0.8,0.5) and b ~ Ɲ(0.4,0.5). For each simulated participant, ADO was initialized with the same priors over model parameters: a ~ Ɲ(0.75,2), b ~ Ɲ(0.85,2) for POW and a ~ Ɲ(0.9,2), b ~ Ɲ(0.15,2) for EXP; and the same prior for each model: p(POW) = p(EXP) = 1/2.

Similar to what Cavagnaro and colleagues did, we compared ADO against two classical (non-adaptive) experimental designs. The first one, called "Random Design", draws the lag time at each trial randomly between 0 and 100 s. The second one, called "Fixed 10 pt Design", presents, in a random order, each lag time from a fixed set of lag times concentrated near zero and spaced roughly geometrically: 0, 1, 2, 4, 7, 12, 21, 35, 59 and 99 s. The latter design is closer to the set of lag times used in real retention experiments [39]. We considered 10-trial-long experiments and computed the true (POW) model posterior after each trial, for each design. Only ADO is adaptive, in the sense that, at each trial, the most efficient lag time is selected based on the updated posteriors over parameters and models, and the ensuing Laplace-Chernoff bound for each possible lag time. The results are presented in Section 3.1. A simplified sketch of this simulation setup follows.
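For illustration, the following simplified sketch simulates one participant under the true POW model and scores the two retention models over the "Fixed 10 pt" lag times, using the normal approximation to the binomial likelihood described above. For brevity, model predictions are evaluated at the prior means of (a,b) rather than integrated over the full parameter posteriors, as the complete ADO procedure does (see Appendix A2).

```python
# Simplified sketch of the retention study: data generated under POW (Equation 5)
# with binomial noise; POW vs. EXP scored with a moment-matched normal likelihood.
import numpy as np

rng = np.random.default_rng(2)
N_ITEMS = 30

def predict(model, t, a, b):
    # Expected number of recalled items out of N_ITEMS (Equations 3-5).
    p = a * (t + 1.0) ** (-b) if model == "POW" else a * np.exp(-b * t)
    return N_ITEMS * p

# One simulated participant under the true POW model (a = 0.8, b = 0.4, as in the text).
a_true, b_true = 0.8, 0.4
def simulate_trial(t):
    p = a_true * (t + 1.0) ** (-b_true)
    return rng.binomial(N_ITEMS, min(max(p, 0.0), 1.0))

def log_lik(y, mean, p):
    # Normal approximation to the binomial likelihood, with matched moments.
    var = max(N_ITEMS * p * (1.0 - p), 1e-6)
    return -0.5 * ((y - mean) ** 2 / var + np.log(2.0 * np.pi * var))

lags = np.array([0, 1, 2, 4, 7, 12, 21, 35, 59, 99], dtype=float)  # "Fixed 10 pt" set
log_posterior = {"POW": np.log(0.5), "EXP": np.log(0.5)}           # p(POW) = p(EXP) = 1/2
params = {"POW": (0.75, 0.85), "EXP": (0.9, 0.15)}                 # prior means (see text)

for t in lags:
    y = simulate_trial(t)
    for m, (a, b) in params.items():
        mean = predict(m, t, a, b)
        log_posterior[m] += log_lik(y, mean, mean / N_ITEMS)

z = np.logaddexp(log_posterior["POW"], log_posterior["EXP"])
print({m: float(np.exp(lp - z)) for m, lp in log_posterior.items()})
```

In the adaptive variant, the fixed `lags` schedule would be replaced by a per-trial search over candidate lag times using the design-efficiency bound, exactly as in the loop of Section 2.2.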
2.3.2. Second Study: Synthetic Electrophysiological Data

To demonstrate how our new instantiation of ADO extends to nonlinear dynamic causal models, which are of increasing interest in cognitive neuroscience, we now turn to a second series of original simulations. We consider recent models of human perceptual learning in a changing environment [40–43] and combine them with recent work on how these models might predict single-trial EEG evoked responses [36,44]. These models can be thought of as a specific instantiation of the Bayesian brain and predictive coding hypotheses [45]. The former hypothesis postulates that the brain uses Bayesian inference for perception and perceptual learning. In other words, these processes rely upon an internal generative model, i.e., probabilistic assumptions about how external states cause changes in sensory data (the sensory signal likelihood), together with prior beliefs about these causes [46]. In addition, the predictive coding hypothesis [47] suggests that the electrophysiological activity that propagates through neural networks encodes prediction (top-down) and prediction error (bottom-up) messages, whose role is to explain away sensory surprise by updating beliefs about hierarchically deployed hidden causes. Evoked electrophysiological responses that are reminiscent of such mechanisms were first established using so-called "oddball" experimental paradigms, where one category of rare stimuli (deviants) is intermixed with a second category of frequent stimuli (standards). The ensuing "mismatch negativity" (MMN) EEG evoked potential is then interpreted in terms of the response of the system to a violation of its prior expectations [48]. These responses have been observed in various sensory modalities, but are mostly documented in the auditory [49] and somatosensory [44] domains. Below, we describe the perceptual (evolution) and response (observation) models we considered for simulating MMN-like responses.

2.3.2.1. Perceptual Learning Model

We considered a simplified version of the perceptual learning model proposed in [43] to model perception in a volatile environment (see also [50]). This perceptual model (Figure 2) comprises a hierarchy of three hidden states (denoted by x), with States 2 and 3 evolving in time as Gaussian random walks. The probability of a stimulation category appearing at a given trial (t) (represented by State $x_1^{(t)}$, with $x_1 = 1$ for deviant and $x_1 = 0$ for standard stimuli) is governed by a state, $x_2$, at the next level of the hierarchy. The brain's perceptual model assumes that the probability distribution of $x_1$ is conditional on $x_2$, as follows:

$p(x_1 \mid x_2) = s(x_2)^{x_1}\left(1 - s(x_2)\right)^{1-x_1} = \mathrm{Bernoulli}\left(x_1;\ s(x_2)\right)$   (6)

where $s(\cdot)$ is a sigmoid (softmax) function:

$s(x) \triangleq \frac{1}{1 + \exp(-x)}$   (7)

Equations (6) and (7) imply that the states $x_1 = 0$ and $x_1 = 1$ are equally probable when $x_2 = 0$. The probability of $x_2$ itself changes over time (trials) as a Gaussian random walk, so that the value $x_2^{(t)}$ is normally distributed with mean $x_2^{(t-1)}$ and variance $e^{\kappa x_3^{(t)} + \omega}$:

$p\left(x_2^{(t)} \mid x_2^{(t-1)}, x_3^{(t)}\right) = N\left(x_2^{(t)};\ x_2^{(t-1)},\ e^{\kappa x_3^{(t)} + \omega}\right)$   (8)

Setting the parameter κ to 0 effectively amounts to assuming that the volatility of $x_2$ is fixed over time. In all other cases, the magnitude of changes in $x_2$ over time (trials) is controlled by $x_3$ (the third level of the hierarchy) and by ω, which can be regarded as a base (log-)volatility. The state $x_3^{(t)}$ on a given trial is normally distributed around $x_3^{(t-1)}$, with a variance determined by the constant parameter ϑ. The latter effectively controls the variability of the log-volatility over time:

$p\left(x_3^{(t)} \mid x_3^{(t-1)}, \vartheta\right) = N\left(x_3^{(t)};\ x_3^{(t-1)},\ \vartheta\right)$   (9)

A generative sketch of this hierarchy is given below.
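As a sanity check on Equations (6)–(9), the sketch below samples a stimulus sequence from the three-level hierarchy, treating it as a generative process. The parameter values are illustrative assumptions, not those used in our simulations.

```python
# Minimal sketch of the generative cascade in Equations (6)-(9): sampling a sequence
# of standard (0) / deviant (1) stimuli from the three-level hierarchy.
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # Equation (7)

# Illustrative parameters; kappa = 0 would truncate the hierarchy to two levels.
kappa, omega, theta = 1.0, -4.0, 0.5
n_trials = 200
x2, x3 = 0.0, 0.0
stimuli = np.empty(n_trials, dtype=int)

for t in range(n_trials):
    x3 = rng.normal(x3, np.sqrt(theta))                        # Equation (9)
    x2 = rng.normal(x2, np.sqrt(np.exp(kappa * x3 + omega)))   # Equation (8)
    stimuli[t] = rng.binomial(1, sigmoid(x2))                  # Equation (6)

print(stimuli[:30], stimuli.mean())
```

Note that Equations (8) and (9) specify variances, hence the square roots when drawing from the Gaussian random walks. In the paper, this hierarchy plays the role of the subject's perceptual model, inverted online from the stimulus sequence; here it is only simulated forward.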
Figure 2. Graphical illustration of the hierarchical perceptual (generative) model with States $x_1$, $x_2$ and $x_3$. The probability at each level is determined by the variables and parameters at the level above. Each level relates to the level below by controlling the variance of its transition probability. The highest level in this hierarchy is a constant parameter, ϑ. At the first level, $x_1$ determines the probability of the input stimulus: standard (0) or deviant (1). The model parameters, ω and ϑ, control the agent's belief update about State $x_2$. Note that setting κ = 0 effectively truncates the hierarchy to the first two levels. In the diagram, squares represent fixed parameters, while circles represent state variables that evolve in time.

2.3.2.2. Electrophysiological Response Model

One can quantify the novelty of sensory input using Bayesian surprise. In what follows, we assume that EEG response magnitudes encode the Bayesian surprise induced by the observation of sensory stimuli at each trial. This is in line with recent empirical studies of the MMN in oddball paradigms [36,44]. Recall that, at any given trial, the Bayesian surprise is simply the Kullback-Leibler divergence between the prior and posterior distributions [51]. It indexes the amount of information provided by sensory signals at each level of the hierarchy. We simulated trial-by-trial EEG response magnitudes by adding random noise to the (weighted) Bayesian surprise (BS) at the second level of the perceptual learning model:

$y_t = h \cdot BS\left(p\left(x_2^{(t)} \mid u_t\right),\ p\left(x_2^{(t)}\right)\right) + \epsilon$, with ε ~ Ɲ(0, σ)   (10)

Note that under the Laplace approximation, BS has a straightforward analytic form (see [52]). In the current simulations, we fixed the weight parameter, h, to −10 and the noise precision (or inverse variance) to 100.

We considered the problem of comparing five different perceptual models given simulated EEG data (see Table 1). M1 is a "null" model with no learning capacities. The four other models form a 2 × 2 factorial model space. Contrary to M4 and M5, M2 and M3 have no third level (κ = 0); they are unable to track the volatility of the environment. Orthogonal to this dimension is the base learning rate