
Basic Methods for Theoretical Biology - Vrije Universiteit Amsterdam PDF

77 Pages·1985·0.75 MB·English


Basic methods in Theoretical Biology

S.A.L.M. Kooijman
Dept. Theoretical Biology, Faculty of Earth and Life Science, Vrije Universiteit, Amsterdam

This document is part of the DEB tele-course http://www.bio.vu.nl/thb/deb/course/
Last update: 2011/01/07

Basic methods in Theoretical Biology discusses a basic toolkit which graduate students in Quantitative Biology, and especially in Theoretical Biology, should be able to use. The mathematical material is presented like an extended glossary; applications in biology are given in examples and exercises.

Acknowledgements
I would like to thank J. Ferreira, B.W. Kooi and C. Zonneveld for their helpful comments.

Summary of contents:
1 METHODOLOGY
2 MATHEMATICAL TOOLKIT
3 MODELS FOR PROCESSES
4 MODEL-BASED STATISTICS
Accompanying:
EXAMPLES From biological problem via mathematics to solution
EXERCISES Motivations, given, questions, hints, answers

Contents

1 Methodology
  1.1 Empirical cycle
  1.2 Consistency
    1.2.1 Dimensions
    1.2.2 Conservation laws
  1.3 Coherence
    1.3.1 Scales in organization
  1.4 Efficiency
  1.5 Numerical behaviour
    1.5.1 Testability
  1.6 Experimental design
  1.7 Identification of variables to be measured
  1.8 Experimentation
  1.9 Realism
    1.9.1 Stochastic versus deterministic models
  1.10 Logic
    1.10.1 Propositional logic
    1.10.2 Predicate logic
2 Mathematical toolkit
  2.1 Sets
    2.1.1 Combinatorics
  2.2 Operators
  2.3 Numbers
  2.4 Functions
    2.4.1 Trigonometric functions
    2.4.2 Limits
    2.4.3 Sequences and series
    2.4.4 Differentiation
    2.4.5 Integration
    2.4.6 Roots
  2.5 Matrices
    2.5.1 Determinants
    2.5.2 Ranks
    2.5.3 Inverses
    2.5.4 Eigenvalues and eigenvectors
    2.5.5 Quadratic and bilinear forms
    2.5.6 Vector calculus
  2.6 Random variables and probabilities
    2.6.1 Expectations
    2.6.2 Examples of probability distributions
    2.6.3 Examples of probability density functions
    2.6.4 Conditional and marginal probabilities
    2.6.5 Calculations with random variables
  2.7 Numerical methods
    2.7.1 Numerical integration
    2.7.2 Numerical differentiation
    2.7.3 Root finding
    2.7.4 Extreme finding
3 Models for processes
  3.1 Types of processes
    3.1.1 Stochastic processes
  3.2 Systems
    3.2.1 Constraints on dynamics
    3.2.2 Feedback
    3.2.3 Asymptotic properties
4 Model-based statistics
  4.1 Scope of statistics
  4.2 Measurements: scales and units
  4.3 Precision and accuracy
  4.4 Smoothing and interpolation
  4.5 Testing hypotheses
  4.6 Likelihood functions
  4.7 Large sample properties of ML estimators
  4.8 Likelihood ratio principle
    4.8.1 Likelihood based confidence region
    4.8.2 Profile likelihood
  4.9 Regression
    4.9.1 Constant variation coefficient
  4.10 Composite likelihoods
  4.11 Parameter identifiability
  4.12 Monte Carlo techniques
5 Notation
Bibliography

Preface

The field of theoretical biology uses elements from methodology, mathematics, and computer science to develop new insights in biology. Such developments also require elements from biology, physics, chemistry and earth sciences. Practice teaches us that this cocktail is hard to teach in a single course; it is simply too overwhelming. An extra handicap is that a little knowledge of mathematics is less than adequate to deal with the complex non-linearities of life. Physics got its strength from simplification, both in theory and in experimental design. Biology, however, has little access to this powerful approach; the simplest living systems are still very complex. This is why biology still resembles a big bag of facts, semi-facts and artifacts. Yet, many theoretical biologists believe that it need not stay like this.

The purpose of this document is to present an adequate formal toolkit that should suffice for most applications in biology. Although many books exist on each of the topics we discuss, we found no book that quite meets our educational needs.
Mathematical books (and especially those on statistics) frequently contain much material that is of little interest for the applications we have in mind, while omitting some more advanced material that we consider essential for a basic toolkit. Mathematicians use mathematics differently from natural scientists, and have other purposes in mind.

We here only focus on basic material for graduate students. You will find little about linear models and techniques, which dominate standard texts. The reason is that linearity hardly occurs in biology. We also omitted some standard material about computations of quantities like ranks, determinants, inverses etc., because basic computer routines are available and we had to be selective. You will find more on multivariate models than in elementary texts. We do realize that research frequently requires more than we offer, but the presented material should allow a rapid consultation of specialized literature.

We focus on conceptual aspects, and did not attempt to write a “stand alone” document. The serious student will frequently feel the need to consult elementary textbooks that offer more background, derivations and context. We suggest titles we think to be appropriate. We assume practical knowledge of Octave and/or Matlab and the availability of the software package DEBtool, which can be downloaded from the electronic DEB laboratory at http://www.bio.vu.nl/thb/deb/deblab

Design of the document on methods

A rather technical document explains elements of methodology, mathematics and computer science. The three disciplines start to blur and cross-fertilize each other in chapters on modeling and statistics. The first part on methods has little material that is specific for any specialization in biology; it, therefore, remains somewhat abstract. The plan is to keep the document brief, not a collection of all-there-is, but a choice for the most basic methods, with an emphasis on concepts.
The selection criterion for material is its use in the applications and exercises.

Applications

The second part of the document consists of an eventually large number of examples of application in all fields of biology. The plan is to keep each example short, starting with a biological problem and its motivation, and coming back with an answer to that problem, using pieces of the toolkit that is offered in the first part. If new examples are included that use methods that are not discussed in the first part, the first part will be extended to include these methods.

Exercises

Besides the method document and applications database, an eventually substantial collection of exercises and answers is set up. Each exercise has the structure: motivation, given, question, hints, answer. The exercises can illustrate a particular method and/or an application. They can make use of public domain software. Octave and DEBtool, which is written in Octave and Matlab, can be used to make the exercises. It is also possible to use packages like Maxima (for symbolic calculations) and AUTO (for bifurcation analysis), for instance.

Use of the document

The general idea is that a graduate student, who is trained in a particular specialization in biology, can be offered a number of examples and exercises in his/her own field, together with the methods part of the document, to gain a working knowledge of Theoretical Biology.

Self improving document

We do have the idealistic view that universities have the task of optimizing the propagation of knowledge in a way that is as free of financial and cultural constraints as possible. We also believe that collaboration leads to improvement. This is why we have set up this education project in several phases.

Phase 1: design
The executing editor first writes a draft to structure the whole project. This is more efficient than listing the plans in detail.
Phase 2: polishing
The editorial board polishes the material and supplies additions (especially of exercises and examples).

Phase 3: self improvement
The project is now open for contributions of examples and exercises from all over the world. The editorial board will function like that of a journal and judge incoming material, seeking advice from referees. If the material requires new methods, the author should make a proposal for such a text. The first part of the document on methods will remain the responsibility of the editorial board (for the time being). If material from submissions is used in this first part, the author's name will be mentioned as contributor. The author's name will remain associated with examples and exercises. If the collection of examples and exercises grows large, it will be classified according to the biological subject. Any suggestions for improvement are very welcome; please mail to [email protected].

Where to get?

This document is freely downloadable from http://www.bio.vu.nl/thb/course/tb/ and will be updated and improved continuously in educational practice. We hope to stimulate interest in the field of Theoretical Biology in this way, and to help teach a generation of students that will bring the field into blossom.

Disclaimer

This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Questions? Send to [email protected].

Chapter 1 Methodology

1.1 Empirical cycle

Like it or not, humans think in terms of models, although not everybody realizes that. The most important task of Theoretical Biology is to make implicit assumptions explicit, so that they can be replaced by others if necessary. Models have a lot in common with stories about quantities, phrased in the language of mathematics; they can have (and frequently do have) language errors, they can tell nonsense and they can be boring.
They can also be exciting, however, depending on the way they are put together.

After identification of the scientific problem, the empirical cycle should start with the formulation of a set of assumptions, a derivation of a mathematical model from these assumptions, and a sequence of tests on consistency, coherence, parameter sensitivity, and relevance with respect to the problem. See Figure 1.1. Most models don’t need to be tested against experimental data; they simply do not pass the theoretical tests.

The second part of the empirical cycle then consists of the formulation of auxiliary theory for how variables in the model relate to things that can be measured, the setup of adequate experiments and/or sampling and measurement protocols to test model predictions, the collection of the measurements, and statistical tests of model predictions against measurements. These tests could reveal that the protocols were inadequate and should be redesigned and executed again; possible inadequacies in the auxiliary theory should also be detected. So inconsistencies between data and model predictions do not necessarily point to inadequacies in the model itself.

If the need to improve the model appears anywhere in this two-segment cycle, the model should not be changed directly; instead, the list of assumptions should be adapted, and the whole process should be repeated. It is a long and painstaking process, but sloppy procedures easily lead to useless results.

Advocates of putting the lead of the empirical cycle in the observations, rather than in the assumptions, are frequently unaware of the implicit assumptions that need to be made to give observations a meaning. The most important aspect of modeling is to make all assumptions explicit. If modeling procedures are followed in a sloppy way, by adapting models to fit data directly, it is likely that the result will be sloppy too; one easily falls into the trap of curve-fitting.
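The curve-fitting trap can be illustrated with a small numerical sketch. It is not taken from the course material: the function name `lagrange_fit`, the choice of eight data points and the use of Gaussian noise as "data" are all assumptions made for illustration. The sketch fits a polynomial with as many coefficients as data points to pure noise; the fit is exact, although no mechanism generated the data.

```python
import random

def lagrange_fit(xs, ys):
    """Return the polynomial of degree len(xs) - 1 that passes through
    every point (xs[i], ys[i]), in Lagrange form (hypothetical helper)."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

random.seed(1)
xs = [float(i) for i in range(8)]
ys = [random.gauss(0.0, 1.0) for _ in xs]  # pure noise: no underlying mechanism

model = lagrange_fit(xs, ys)

# The "model" reproduces every data point, so goodness of fit says nothing
# about whether its assumptions make sense.
max_residual = max(abs(model(x) - y) for x, y in zip(xs, ys))
print("largest residual at the data points:", max_residual)

# Away from the data the same curve is unconstrained; compare this value
# with the range of ys to see how little the fit predicts.
print("prediction at x = 9:", model(9.0))
```

The residuals vanish because the curve has enough free parameters, not because it captures anything about the process; this is why the text asks for tests on the assumptions before any test against data.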
When it comes to fitting curves to data, the use of a pencil, rather than a model, is so much easier.

[Figure 1.1 (flow diagram, not reproduced). Boxes include: (re)identify scientific problem & aim of research; (re)formulate mechanistically inspired assumptions (hypotheses); derive math model, identify variables and parameters; test model for consistency (dimensions, conservation laws); test model for coherence with related fields of interest; test model for efficiency with respect to aim of research; test model for numerical behaviour & qualitative realism (plasticity, simplicity); (re)design experiment to test model assumptions (factors to be manipulated); identify variables to be measured (type, accuracy, frequency); formulate relationship between measured & model variables; perform experiment & collect measurements; collect observations (from literature); test model for consistency with respect to experimental results (statistical analysis).]

Figure 1.1: The empirical cycle as conceived by a theoretician. In the knowledge that nonsense models can easily fit any given set of data well, given enough flexibility in the parameters, realism is not the first and not the most important criterion for useful models. Lack of fit (so lack of realism) just indicates that the modeling job is not completed yet. This discrepancy between prediction and observation can be used to guide further research, which is perhaps the most useful application of models. This application to improve understanding only works if the model meets the criteria indicated in the figure; few models meet these criteria, however.

It is common practice, unfortunately, to just pose and apply a model, with little attention for the underlying assumptions. If such a model fails one of the tests, nothing is left and one should start again from scratch. There cannot be a sequence of stepwise improvements in understanding and prediction.
The fact that such a model fits data is of little use, perhaps only for interpolation purposes.

Models are idealizations and, therefore, always ‘false’ in the strict sense of the word. This limits the applicability of the principle of falsification. A model can fit data for the wrong reasons, which means that the principle of verification is even more limited in applicability. This points to usefulness as the criterion for judging models, but usefulness is linked to a purpose. This is why a model should never be separated from its purpose. The purpose can contain elements such as increase in understanding, or in predictability. Increase in understanding can turn a useful model into a less useful one.

If a model passes all tests, including that against experimental data, there is no reason to change the assumptions, and one can work with them until new evidence forces reconsideration. It might seem counterintuitive, but models that fail the test against experimental data more directly serve their task in leading to greater insight, i.e. in guiding to the assumptions that require reconsideration. This obviously only works well if the steps in the formulation of assumptions have been adequate. Models are a means of getting more insight, never an aim in themselves.

Theoretical biology specializes in the interplay between methodology, mathematics and computer science as applied in biological research. It is by its nature an interdisciplinary specialization in generalism and the natural companion of experimental biology. Both have complementary roles to play in the empirical cycle. We hope that Figure 1.1 makes clear that both specializations should be considered as obligate symbionts in the art of science.

People with a distaste for models frequently state that ‘a model is not more than you put into it’. This is absolutely right, but instead of being a weakness, it is the single most important aspect of the use of models.
Assumptions can have far-reaching consequences that cannot be revealed without the use of mathematics. Put in other words: any mathematical statement is either wrong or follows from assumptions. Few people throw mathematics away for this reason.

Models play an important role in the mechanism of research, as will be discussed, but also in other contexts, such as in finding answers to “what if” questions, and in solving extrapolation problems (see chapter on statistics). The next sections highlight some steps in the empirical cycle. Table 1.1 gives some practical hints.

1.2 Consistency

That proposition X is inconsistent with proposition Y means that they cannot both be true. Models that are not internally consistent are meaningless, so they are useless. If different assumptions are directly contradictory, inconsistency is easy to detect. In many cases, however, this is much less easy. Inconsistencies come in many forms; lack of realism (meaning: a difference between measured data and model predictions for those data) is just one form (that comes in gradations).
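For the simplest case, where assumptions can be written as propositions about a few true/false statements, the definition of inconsistency can be checked by brute force. The following is a minimal sketch under that assumption; the function `consistent` and the example propositions `x`, `y` and `z` are hypothetical and only serve to illustrate "cannot both be true". Realistic model assumptions are rarely this easy to enumerate, which is the point made in the text.

```python
from itertools import product

def consistent(propositions, n_vars):
    """Return True if at least one truth assignment of n_vars Boolean
    variables makes every proposition true at the same time."""
    return any(
        all(p(*values) for p in propositions)
        for values in product([False, True], repeat=n_vars)
    )

# Hypothetical propositions about two statements a and b
x = lambda a, b: (not a) or b   # "a implies b"
y = lambda a, b: a and not b    # "a holds, but b does not"
z = lambda a, b: a or b         # "a or b holds"

print(consistent([x, y], 2))  # False: x and y cannot both be true
print(consistent([x, z], 2))  # True: e.g. a false, b true satisfies both
```

The enumeration visits 2^n assignments, so it only suits small sets of Boolean statements; it illustrates the definition rather than offering a general consistency test for models.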

