From nicolas.brodu at numerimoire.net Thu Apr 2 08:11:59 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Thu, 2 Apr 2009 10:11:59 +0200 Subject: [Causality-ML] Reminder: Today's Talk: Detecting the arrow of time with independent noise models Message-ID: <200904021011.59881.nicolas.brodu@numerimoire.net> Dear Causality and Machine Learning group, The next presentation is TODAY by Jonas Peters. He will present us how to detect the arrow of time with independent noise models When: Thursday 2nd April 2009, Paris 17h, ET 11h, PT 8h, Tokyo 0h(Friday) URL: http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090402JP Phone number: +1 (218) 936-7999 Participant code: 665140# See below for tips how to access the conference on the Internet using voice-over-IP. Abstract of today's presentation: -------- We propose a method that detects the true direction of time series by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (for the latter case, the direction is not identifiable). An implication of our result is that it confirms a fundamental causal reasoning - that noise is independent of signal when the true direction of the model is recovered - in the case of time series. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. -------- If you know of potentially interested speakers or if you wish to present a paper, please send us a message so we can add you in the planning. 
The planning for the next presentation is maintained at: http://www.afia-france.org/tiki-index.php?page=Groupe+de+lecture Tip for a cheap voice-over-IP connection to the conference: Open an account on nonoh.net (server in Germany) or voipcheap.com (server in US) or another SIP provider near your country. Pay 10 euros for 3 or 4 month unlimited access to nearly all land line phone numbers in the world, including the teleconference number. This means you will be able to spend as much time as you want for free on long-distance and international calls with the basic fee. Use a microphone + headset on your computer, and a program like Twinkle (Linux) or Ekiga (multi-OS) or X-Lite (windows). Set it up with your nonoh.net or voipcheap account (see the help page on these sites). Beware of filtering firewalls. Then call our teleconference number from your computer. Note: We are not affiliated to any of these groups. We provide this information in the hope it is useful. Best regards, Nicolas Brodu From janzing at ira.uka.de Fri Apr 3 07:30:44 2009 From: janzing at ira.uka.de (Dominik Janzing) Date: Fri, 03 Apr 2009 09:30:44 +0200 Subject: [Causality-ML] Reminder: Today's Talk: Detecting the arrow of time with independent noise models In-Reply-To: <200904021011.59881.nicolas.brodu@numerimoire.net> References: <200904021011.59881.nicolas.brodu@numerimoire.net> Message-ID: <49D5BB24.6020102@ira.uka.de> Hi Nicolas, regarding your interesting question yesterday: I have some unpublished calculations where I represent AR-models as the linear reversible dynamics of a physical toy system that interacts with its environment. The environment consists of an infinite collection of particles that start in a statistically independent state. Within this model, the following statements hold: 1) The Gaussian AR-process can be embedded such that the dynamics of the *joint* system is stationary. 
2) Every linear reversible embedding of a *non-Gaussian* process generates statistical dependences between the particles in the environment. Even though the joint dynamics is reversible by assumption, we have *phenomenological* entropy production because increasing dependences look like entropy increase from a coarse-grained perspective. The message: 1) The linearity of the total dynamics is inherited by the forward-time conditional of the subsystem, but the backward-time conditional is not a linear model. 2) Since the joint dynamics is reversible, the only asymmetry between past and future consists in the fact that I assume the particles to be *initially* in a product state instead of *finally*. More generally speaking, this suggests: - if the joint dynamics of a system and its environment is described by simple equations, the simplicity of the dynamics is usually inherited by the forward-time conditionals of the subsystems, but not by the backward conditionals. The reason for this asymmetry is that the joint dynamics generates dependences rather than decreasing them. Hence, the asymmetries between cause and effect we are searching for are closely related to the thermodynamical arrow of time. As soon as my draft is readable, I can send you the calculations. It is a bit unfortunate that the audience of our reading group is so small. By recruiting new speakers, I was also hoping to get more people aware of it... Best Regards, Dominik From david.dowe at infotech.monash.edu.au Fri Apr 3 09:48:06 2009 From: david.dowe at infotech.monash.edu.au (David Dowe) Date: Fri, 03 Apr 2009 20:48:06 +1100 Subject: [Causality-ML] Detecting the arrow of time with independent noise models In-Reply-To: <49D5BB24.6020102@ira.uka.de> References: <200904021011.59881.nicolas.brodu@numerimoire.net> <49D5BB24.6020102@ira.uka.de> Message-ID: <49D5DB56.1010606@infotech.monash.edu.au> Dear Dominik, Hi.
Am not sure whether this is a ``little'' point [whereupon I hope that all will forgive and excuse my indulgence of questionable relevance on one of my hobby horses] or a ``non-little'' point [in the event of which I hope that any disagreement you might have is expressed gently and politely :-) ]. Re your comment that ``the asymmetries between cause and effect we are searching for are closely related to the thermodynamical arrow of time'', Chris Wallace argues in Chapter 8 (``The Feathers on the Arrow of Time'') of his (2005) book (``Statistical and Inductive Inference by Minimum Message Length'') [www.csse.monash.edu.au/~dld/CSWallacePublications#MMLBook] that entropy is not the arrow of time. In (very) short, he appeals to the notion in physics of CPT-invariance. He also gives (statistical) theory and does simulations to support his case. Wallace was no slouch. (In the remainder of this paragraph, I give evidence as to why Wallace was no slouch, so please feel welcome to skip if this point is not of great interest.) He started out as a physicist, he was a Bayesian in the 1950s (well before it became more fashionable), he developed the Wallace multiplier in 1964, he developed Minimum Message Length (MML) in 1968, ... and ..., and his feats are summarised in the Computer Journal (Oxford University Press)'s Sept. 2008 (Vol. 51 No. 5) Christopher Stewart WALLACE (1933-2004) memorial special issue (http://comjnl.oxfordjournals.org/content/vol51/issue5), with a broad survey in my ``Foreword re C. S. Wallace'' as guest editor (http://dx.doi.org/10.1093/comjnl/bxm117). So, if you're open to the possibility of - as per Wallace's argument - entropy not being the arrow of time and there being no thermodynamical arrow of time (and I admit that, at the time of writing, Wallace's view does not [yet?] seem to be widely embraced), I invite you at your non-urgent leisure (and if you think the issue is sufficiently important) to possibly modify your wording around that point. 
I'm not saying that there isn't data that lends itself to having arrows of time in it [or, at least, I don't think I'm saying that], but I am (following Wallace in) quibbling about entropy being a thermodynamical arrow of time. Cheers and best, David Dowe. www.csse.monash.edu.au/~dld/CSWallacePublications#MMLBook (Chapter 8). _______________________________________________ Causality and Machine Learning Mailing-list subscription: http://mail.encours.org/listinfo/causality-ml Material and presentation schedule: http://www.afia-france.org/tiki-index.php?page=Groupe+de+lecture
From nicolas.brodu at numerimoire.net Sun Apr 5 11:58:46 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Sun, 5 Apr 2009 13:58:46 +0200 Subject: [Causality-ML] Video replays for the presentations of Mehreen Saeed and Jonas Peters Message-ID: <200904051358.46124.nicolas.brodu@numerimoire.net> Dear Causality and Machine Learning group, The following videos are now available online and for download: - Mehreen Saeed: "The Use of Bernoulli Mixture Models for Identifying Corners of a Hypercube and Extracting Boolean Rules From Data" http://www.encours.org/causality/BernoulliMixtures/replay.html - Jonas Peters: "Detecting the arrow of time with independent noise models" http://www.encours.org/causality/arrow_time_indep_noise/replay.html Due to technical difficulties it can sometimes be time consuming to make the replays. Please try to attend the talks directly if you can. I will continue making the videos whenever this is possible, possibly after some delay, as they make valuable resources that many of you download after the presentation.
Thanks for your interest, Nicolas Brodu From nicolas.brodu at numerimoire.net Sun Apr 5 12:15:16 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Sun, 5 Apr 2009 14:15:16 +0200 Subject: [Causality-ML] Detecting the arrow of time with independent noise models In-Reply-To: <49D5DB56.1010606@infotech.monash.edu.au> References: <200904021011.59881.nicolas.brodu@numerimoire.net> <49D5BB24.6020102@ira.uka.de> <49D5DB56.1010606@infotech.monash.edu.au> Message-ID: <200904051415.16397.nicolas.brodu@numerimoire.net> Dear Dominik and David, Thanks for continuing to debate the presentation on the list. More generally I wish people would discuss the talks before and after the presentations. For the mailing list recipients that could not participate, my question was: Slide 16: In the ARMA model, all information during the evolution of the system is brought by noise. If all noise is Gaussian then no information is brought (max entropy). In the case of non-Gaussian noise you add information in the system as time passes, and it is not time-reversible. What do you think of that interpretation?? The video of the presentation is also now available online. Best, Nicolas From nicolas.brodu at numerimoire.net Wed Apr 8 13:32:47 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Wed, 8 Apr 2009 15:32:47 +0200 Subject: [Causality-ML] Next talk announcement: Learning Real-World Causal Models from Time-Series Data Message-ID: <200904081532.47708.nicolas.brodu@numerimoire.net> Dear Causality and Machine Learning group, The next presentation will be on NEXT THURSDAY (16th April) by Denver Dash. He will present us with Difference-Based Causal Models (DBCMs), how they compare to other causal models and how they are motivated by real-world dynamical systems. 
When: Thursday 16th April 2009, Paris 17h, ET 11h, PT 8h, Tokyo 0h(Friday) URL: http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090416DD Phone number: +1 (218) 936-7999 Participant code: 665140# See below for tips how to access the conference on the Internet using voice-over-IP. Abstract of the presentation: -------- There are many formalisms for learning dynamic statistical models that admit a causal interpretation. ARMA models allow for structured causation across time and allow specific parameterization of hidden model averaging variables. Dynamic Bayesian networks are more general, allowing acyclic instantaneous causation and inter-temporal causation. Structural equation models are even more general, in principle allowing instantaneous causal cycles as well. While these formalisms are quite powerful, we argue that they all in fact are too general for a large class of real-world mechanical systems. In this talk, I present Difference-Based Causal Models (DBCMs), a representation that is closely related to one proposed by Iwasaki and Simon in 1994, and is motivated by a ubiquitous class of real mechanical systems. This representation is a discrete-time dynamic model that assumes all causation across time is due to instantaneous causation of variables' derivatives. I motivate why this is the right representation for real-world mechanical systems (and maybe all causal systems), and I show how the presence of multiple dynamic processes and equilibrium points in these systems make other representations inappropriate for reasoning about manipulations. Finally, I provide a provably correct constraint-based search algorithm for efficiently learning the structure of DBCMs from data. -------- If you know of potentially interested speakers or if you wish to present a paper, please send us a message so we can add you in the planning. 
The planning for the next presentation is maintained at: http://www.afia-france.org/tiki-index.php?page=Groupe+de+lecture Tip for a cheap voice-over-IP connection to the conference: Open an account on nonoh.net (server in Germany) or voipcheap.com (server in US) or another SIP provider near your country. Pay 10 euros for 3 or 4 month unlimited access to nearly all land line phone numbers in the world, including the teleconference number. This means you will be able to spend as much time as you want for free on long-distance and international calls with the basic fee. Use a microphone + headset on your computer, and a program like Twinkle (Linux) or Ekiga (multi-OS) or X-Lite (windows). Set it up with your nonoh.net or voipcheap account (see the help page on these sites). Beware of filtering firewalls. Then call our teleconference number from your computer. Note: We are not affiliated to any of these groups. We provide this information in the hope it is useful. Best regards, Nicolas Brodu From nicolas.brodu at numerimoire.net Tue Apr 14 11:45:44 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Tue, 14 Apr 2009 13:45:44 +0200 Subject: [Causality-ML] Reminder: Learning Real-World Causal Models from Time-Series Data Message-ID: <200904141345.44694.nicolas.brodu@numerimoire.net> Dear Causality and Machine Learning group, This is a friendly reminder for this Thursday's (16th April) presentation by Denver Dash. He will present us with Difference-Based Causal Models (DBCMs), how they compare to other causal models and how they are motivated by real-world dynamical systems. When: Thursday 16th April 2009, Paris 17h, ET 11h, PT 8h, Tokyo 0h(Friday) URL: http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090416DD Phone number: +1 (218) 936-7999 Participant code: 665140# See below for tips how to access the conference on the Internet using voice-over-IP. 
Abstract of the presentation: -------- There are many formalisms for learning dynamic statistical models that admit a causal interpretation. ARMA models allow for structured causation across time and allow specific parameterization of hidden model averaging variables. Dynamic Bayesian networks are more general, allowing acyclic instantaneous causation and inter-temporal causation. Structural equation models are even more general, in principle allowing instantaneous causal cycles as well. While these formalisms are quite powerful, we argue that they all in fact are too general for a large class of real-world mechanical systems. In this talk, I present Difference-Based Causal Models (DBCMs), a representation that is closely related to one proposed by Iwasaki and Simon in 1994, and is motivated by a ubiquitous class of real mechanical systems. This representation is a discrete-time dynamic model that assumes all causation across time is due to instantaneous causation of variables' derivatives. I motivate why this is the right representation for real-world mechanical systems (and maybe all causal systems), and I show how the presence of multiple dynamic processes and equilibrium points in these systems make other representations inappropriate for reasoning about manipulations. Finally, I provide a provably correct constraint-based search algorithm for efficiently learning the structure of DBCMs from data. -------- If you know of potentially interested speakers or if you wish to present a paper, please send us a message so we can add you in the planning. The planning for the next presentation is maintained at: http://www.afia-france.org/tiki-index.php?page=Groupe+de+lecture Tip for a cheap voice-over-IP connection to the conference: Open an account on nonoh.net (server in Germany) or voipcheap.com (server in US) or another SIP provider near your country. 
Pay 10 euros for 3 or 4 month unlimited access to nearly all land line phone numbers in the world, including the teleconference number. This means you will be able to spend as much time as you want for free on long-distance and international calls with the basic fee. Use a microphone + headset on your computer, and a program like Twinkle (Linux) or Ekiga (multi-OS) or X-Lite (windows). Set it up with your nonoh.net or voipcheap account (see the help page on these sites). Beware of filtering firewalls. Then call our teleconference number from your computer. Note: We are not affiliated to any of these groups. We provide this information in the hope it is useful. Best regards, Nicolas Brodu From nicolas.brodu at numerimoire.net Fri Apr 17 20:05:59 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Fri, 17 Apr 2009 22:05:59 +0200 Subject: [Causality-ML] Next talk announcement: Beware of the DAG! Message-ID: <200904172205.59596.nicolas.brodu@numerimoire.net> Dear Causality and Machine Learning group, The next presentation will be on Thursday 23 April, by Philip Dawid. He will present us the dangers of seeing too much causality in Directed Acyclic Graphs: Beware of the DAG! When: Thursday 23 April 2009, Paris 17h, ET 11h, PT 8h, Tokyo 0h(Friday) URL: http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090423PD Phone number: +1 (218) 936-7999 Participant code: 665140# See below for tips how to access the conference on the Internet using voice-over-IP. Abstract of the presentation: -------- Directed acyclic graph (DAG) models are popular tools for describing causal relationships and for guiding attempts to learn them from data. In particular, they appear to supply a means of extracting causal conclusions from probabilistic conditional independence properties inferred from purely observational data.
I take a critical look at this enterprise, and suggest that it is in need of more, and more explicit, methodological and philosophical justification than it typically receives. In particular, I argue for the value of a clean separation between formal causal language and intuitive causal assumptions. -------- If you know of potentially interested speakers or if you wish to present a paper, please send us a message so we can add you in the planning. The planning for the next presentation is maintained at: http://www.afia-france.org/tiki-index.php?page=Groupe+de+lecture Tip for a cheap voice-over-IP connection to the conference: Open an account on nonoh.net (server in Germany) or voipcheap.com (server in US) or another SIP provider near your country. Pay 10 euros for 3 or 4 month unlimited access to nearly all land line phone numbers in the world, including the teleconference number. This means you will be able to spend as much time as you want for free on long-distance and international calls with the basic fee. Use a microphone + headset on your computer, and a program like Twinkle (Linux) or Ekiga (multi-OS) or X-Lite (windows). Set it up with your nonoh.net or voipcheap account (see the help page on these sites). Beware of filtering firewalls. Then call our teleconference number from your computer. Note: We are not affiliated to any of these groups. We provide this information in the hope it is useful. Best regards, Nicolas Brodu From nicolas.brodu at numerimoire.net Tue Apr 21 12:05:54 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Tue, 21 Apr 2009 14:05:54 +0200 Subject: [Causality-ML] Reminder, and next talk preprint paper (Beware of the DAG!) 
Message-ID: <200904211405.54922.nicolas.brodu@numerimoire.net> Dear Causality and Machine Learning group, The preprint paper for the next talk is available online: http://clopinet.com/isabelle/Projects/reading/Dawid_NIPS08_causality_preprint.pdf The presentation starts on Thursday 23 April, given by Philip Dawid. He will present us the dangers of seeing too much causality in Directed Acyclic Graphs: Beware of the DAG! When: Thursday 23 April 2009, Paris 17h, ET 11h, PT 8h, Tokyo 0h(Friday) URL: http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090423PD Phone number: +1 (218) 936-7999 Participant code: 665140# See below for tips how to access the conference on the Internet using voice-over-IP. Abstract of the presentation: -------- Directed acyclic graph (DAG) models are popular tools for describing causal relationships and for guiding attempts to learn them from data. In particular, they appear to supply a means of extracting causal conclusions from probabilistic conditional independence properties inferred from purely observational data. I take a critical look at this enterprise, and suggest that it is in need of more, and more explicit, methodological and philosophical justification than it typically receives. In particular, I argue for the value of a clean separation between formal causal language and intuitive causal assumptions. -------- If you know of potentially interested speakers or if you wish to present a paper, please send us a message so we can add you in the planning. The planning for the next presentation is maintained at: http://www.afia-france.org/tiki-index.php?page=Groupe+de+lecture Tip for a cheap voice-over-IP connection to the conference: Open an account on nonoh.net (server in Germany) or voipcheap.com (server in US) or another SIP provider near your country. Pay 10 euros for 3 or 4 month unlimited access to nearly all land line phone numbers in the world, including the teleconference number.
This means you will be able to spend as much time as you want for free on long-distance and international calls with the basic fee. Use a microphone + headset on your computer, and a program like Twinkle (Linux) or Ekiga (multi-OS) or X-Lite (windows). Set it up with your nonoh.net or voipcheap account (see the help page on these sites). Beware of filtering firewalls. Then call our teleconference number from your computer. Note: We are not affiliated to any of these groups. We provide this information in the hope it is useful. Best regards, Nicolas Brodu From guyon at clopinet.com Wed Apr 22 14:33:06 2009 From: guyon at clopinet.com (Isabelle Guyon) Date: Wed, 22 Apr 2009 07:33:06 -0700 Subject: [Causality-ML] Pearl's comments on Dawid's paper Message-ID: <49EF2AA2.80506@clopinet.com> Dear colleagues, Judea Pearl has kindly agreed to let us publish on this mailing list his review of Dawid's paper, to be discussed tomorrow (http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090423PD). Please find it below as food for thought and send your comments back to this list. Best regards, Isabelle Comments on Dawid's "Beware of Everything" p. 7 The core premise of this paper is conveyed through the teletype font, which is used, according to Dawid: "to highlights non-mathematical causal concepts, that are here supposed to correspond to certain purely graphical aspect of the structure of Figure 4" Supposedly, causal concepts are less "mathematical" than CI concepts and representing them in a DAG constitutes a crime or a danger, like mixing the holy with the profane. I fail to see why one should refrain from representing any knowledge, be it objective or subjective, in a mathematical code, be it graphical or symbolic, that fits that knowledge.
The construction of a probabilistic DAG according to (2) is based on subjective judgment of certain CI's. So why would the construction of causal DAGs, based on subjective judgment of certain causal relations (as in Causality page 24 or in Hernan and Robins), be any less mathematical? The same goes for ANY set of judgments that one feels comfortable submitting to formal representation. Moreover, many would argue (I am among them) that judgments about CI are none other than derivatives of judgments about causal relations. Accordingly, if the construction of (2) originates with judgments about causal relations, is it not more sensible and reliable to capture the latter directly (in a graph) rather than pretending that using the language of CI in the construction protects us from dangers (never specified by Dawid) and makes us "more mathematical"? Dawid may argue that CI's are potentially testable, hence preferred to "causal relations" -- this ain't so. Causal relations are also potentially testable, under experimental conditions. See Causality page 24, where causal DAGs are defined precisely through interventions. On page 13, the paragraph beginning "But what is typically omitted in ..." is, again, disrespectful of causal analysts who, to the best of my knowledge, are very, very careful in distinguishing causal from probabilistic DAGs. If the purpose of Dawid's paper is to re-emphasize this distinction, or to propose a new notation to make the distinction more vivid, fine, but 2 pages should suffice for this purpose. I see no point spending 20 pages warning people that the assumptions of causal discovery (i.e., stability in my terminology, faithfulness in SGS terminology) are causal assumptions, and need to be kept in mind in any discovery exercise. BTW, justification for stability is given in chapter 2 of Causality (page 63). I was surprised to read (page 19) that "Such assumptions are rarely articulated explicitly, and even more rarely justified".
Is it possible that the reason Dawid does not find justification in the literature is that my justification has been accepted as self-evident? I would be delighted. If not, Dawid should tell us why my justification is not satisfactory, instead of complaining about "Such assumptions are rarely articulated explicitly, and even more rarely justified". One last remark. In Fig 6, the arrow between U and V is arbitrary. If Fig 6 were a Causal Bayesian Network (see definition in Causality page 24), that arrow would be determined by thinking about possible interventions like F_U or F_V. Which of the two representations should we prefer? I see a certain advantage to the latter: the graph is constructed once and then serves to predict the effects of any future actions, including those that were not anticipated during the construction. In Dawid's approach, one either constructs an ID with a specific intervention in mind, which leaves some arrows undetermined and requires a new construction effort when new interventions are considered. Or, one imagines an F variable for every variable, which would result in a graph identical to a CBN, augmented with a superfluous F variable for each node. I fail to see what is gained by hanging a dangling arrow into every node, especially if one believes that the knowledge used in the construction resides in our mind without those danglers. We say "Smoking causes cancer", not "intervening on smoking causes cancer". We say "being a female causes the secretion of certain hormones"; no intervention involved. If Dawid really craves a "mathematical" definition of Causal Bayesian Networks, let us construct an influence diagram for all possible interventions F_1, F_2, etc., remove the F's, and lo and behold, a well defined mathematical object is created, approved and certified even by Dawid. This would end all controversies in causality land, and a new era will begin, with new results, new tools, useful theorems and real progress. How about it?
Note that the argument about how experiential knowledge is organized in our brain is a psychological argument, to be determined by psychological experiments. Dawid does not advocate running such experiments. Instead, he prefers CI construction over causal construction because in statistics tradition CI's are considered "potentially testable" and causal relations have been treated with suspicion. This is a theological argument that is losing popularity in the 21st century. The reason why I abandoned the "augmented graph" representation (1993) is that I strongly felt that "this is not the way people think"; people don't think with dangling F's -- they think counterfactually. A variable "listens" to its causal parents, rather than "responds to interventions on its parents". Causal Bayesian Networks are popular today because this is how people think, not because I made them popular. I may be wrong in this hypothesis. But at least I mention the conjecture "this is how we think" in my writings. Dawid does not. For him, how we think is secondary (though he admits that the input for CI construction is judgmental). But secondary to what? Secondary to some nebulous yet revered considerations (e.g., traditional statistics education) that for me are not only secondary, but the primary culprits for a century of causal neglect under the tyranny of statistical thinking. To summarize, Dawid should tell us: 1. Are CI's more mathematical than arrows? 2. If I define arrows through Dawid's augmented DAGs (with an F into every node), would the DAGs w/o the F's be kosher? 3. Isn't this what the definition on page 24 says? 4. Are the CI's used in the construction of your DAGs judgmental? 5. If so, isn't it important to ask "how people produce such judgments, and, in general, how people organize experiential knowledge in their minds"? 6. Suppose psychological experiments reveal without a shadow of doubt that people store knowledge in Causal Bayes networks.
Would you then accept the constructioin of page 24 without the pretence of IC-DAG? 7. How did your discovery algorithm perform in the latest competition? --------Judea From cg09 at andrew.cmu.edu Thu Apr 23 01:12:24 2009 From: cg09 at andrew.cmu.edu (Clark Glymour) Date: Wed, 22 Apr 2009 21:12:24 -0400 (EDT) Subject: [Causality-ML] Professor Dawid's paper Message-ID: <50310.24.3.147.210.1240449144.squirrel@24.3.147.210> Professor Dawid ?s worry is announced in his abstract: ?My fundamental concern is the relationship between, on the one hand, properties or concepts relating to an external reality, such as probabilistic independence or causality,which we wish to elucidate or manipulate; and, on the other hand, formal representations of such properties by means of mathematical or logical structures, such as graphs. It is important to avoid confusing the picture with the reality.? Be at ease. Absolutely not to worry. I have never once, not once, seen someone draw a graph or write a formula when they actually thought they were manipulating what the symbols were supposed to denote. Not once. Word and object, we are ace at distinguishing those. So I thought, having solved Professor Dawid?s concern, I should stop, but I read on a little ways. To a really important announcement: ?it is worthy of continual repetition and emphasis, that there is absolutely no logical reason for there to be any connexion whatsoever between observations made under the different regimes of seeing and doing: a system may very well behave entirely differently when it is kicked than when it is left alone? Good point that. I checked my logic books, no proofs of that connection. Also no proofs that the past ever was, no proofs that the future will come to be, no proofs that Professor Dawid has mental states, no proofs that an external world exists, no proofs that the so-called laws of nature will hold next week. 
Not much use, those logic books, unless you assume or hypothesize stuff and then want to know the consequences.

On the other hand, there is this funny literature -- I wonder if Professor Dawid has read it -- where people investigate when, under what various assumptions about the world, other things follow. Like, for example, there is this subject called Euclidean geometry where assumptions are made about space, and then all kinds of interesting other things are proved about space, really amazing stuff -- you could use it to design buildings even. But I read that the assumptions are not always true. Pity. Also, there was this guy Newton who had these three assumptions, and then some "rules of reasoning" at the back. He got these amazing consequences, which mostly turned out to be correct, although I hear that his assumptions don't always hold, and I sure could not find his rules of reasoning in my logic book.

I guess it can't be the same with observing and doing. There just couldn't be any assumptions about the connections and proofs from assumptions that you can make the kind of inferences Professor Dawid is talking about -- inferences from observations to effects of actions. Or proofs that under other assumptions you can't. Couldn't be. So, nah.

In fact, Professor Dawid is really helpful about this. He tells us that if we make the wrong assumptions, or not enough of the right ones, we won't get that logical connection between seeing and doing. Not a chance of it. "We say that a DAG D with node-set V, a set of variables, represents a collection C of CI properties over V if the relation (bunch of symbols here, way over my head) is in C if and only if S and T are d-separated by U in D. This relationship between a D and a collection of CI properties will constitute our semantic interpretation of a DAG."

Well sure enough, one thing talks about causality, the other talks just about probability. Different terms.
My logic book tells me there have to be terms in common between the premises and the conclusion -- unless the conclusion is logically true. Kind of like "force" and "acceleration" or like "probability" and "unbiased," or "perpetual" and "motion" -- no logical connection. So Professor Dawid really nailed that one.

Well, I should go on reading this stuff, like how we should just talk about probability because graph theory isn't mathematics (so many silly people who thought they were doing mathematics) and how science is all about conditional independence not causation (I knew those physicists and chemists and epidemiologists had to be crazy talking about what does and doesn't cause what), and I am sure he has discovered a lot more stuff than those crazy causal guys who think they have methods that have discovered errors in a mass spectrometer aboard a satellite (imagine -- they weren't ever there), and how to tell what rocks are made of from the radiation bouncing off them, and how to reduce the rate of college dropouts in a college, and that acid rain caused plant die-offs in an estuary, and the processes that go on in the brain in an experiment (something about fMRI), and even global climate teleconnections -- those guys are so crazy.

But since I solved the problem Professor Dawid had at the beginning, I will just have a martini.

Bye,
Clark Glymour

From jlemeire at info.vub.ac.be Thu Apr 23 07:29:13 2009
From: jlemeire at info.vub.ac.be (Lemeire Jan)
Date: Thu, 23 Apr 2009 09:29:13 +0200
Subject: [Causality-ML] My opinion on Dawid's paper
Message-ID: <6B6248DF3DE4F94FACE3C88A8ED4299977FCB1@etropcs03.etro.vub.ac.be>

Dear colleagues-in-causality,

Below you can find my opinion on Dawid's paper.

Kind regards,
Jan Lemeire

Dawid's understanding of causation is about the (probabilistic) consequences of interventions. An intervention changes the system, hence the joint probability distribution, and should therefore be regarded as a different regime.
Dawid doesn't want to make explicit or implicit assumptions on how different regimes should be related. Every regime could generate a different model. My question is then: how will we model these differences in an augmented DAG? What differences are to be expected?

Dawid is right that the choice of a DAG as the representation of causation might let us forget that it is based on assumptions about the behavior of systems under interventions, namely CMC. It might indeed lead to overinterpretation of the arrows. I agree with him that we have to study the CMC assumption thoroughly. It is however not true that we are not aware of this assumption. See for example the introduction of causal Bayesian nets in Zhang and Spirtes, "Detection of Unfaithfulness and Robust Causal Inference", 2007.

But, as long as CMC holds, the DAG representation is the representation to choose, since it is a dense representation. An augmented DAG creates a lot more free parameters and unnecessary redundancies if CMC holds. So I infer that the necessity of an augmented DAG representation comes from a violation of CMC?

Another question: when is a causal DAG compliant with the augmented one? I guess when the DAGs without the decision variables are similar in the different regimes and when X _||_ nd(X) | FX != 0 (nd: non-descendants of X, FX: intervention variable of X), which is CMC. These last independencies are not explicit in the graph, but I guess they follow from the deterministic relation between FX and X when FX != 0. It would be nice to elaborate on that.

Another question: FX seems to replace Pearl's do-operator. Can we simply replace FX by do(X) in the CI statements (5)-(8)?

Chapter 7 seems obsolete; some things are repeated.

In Chapter 8 he wants to show with an example the differences between augmented DAGs and causal DAGs. The example is taken from Hernan and Robins (2008).
Two graphs are compared, one in which Z is a parent of X and one in which Z and X are related by a latent common cause U*. According to my understanding, it deals with the problem of whether Z is an instrumental variable or not. The graph with U* gives the same conditional independencies as the graph with Z -> X, hence Z should not have a causal effect on X. This might look like a problem for causal graphs. But it isn't. Based on the variables involved in the problem, it is not sure that Z -> X, since we have to take into account the possibility of latent variables. The right causal graph is then a partially-oriented graph (PAG) with Z 0-> X, meaning that X is not a cause of Z, but we don't know whether Z is a cause of X or not.

I think there is more to say about the relation between causal DAGs and augmented DAGs. This discussion also doesn't happen in section 9, which deals with questions about instrumental variables.

-------------------------------------------------------
Jan Lemeire
Senior Researcher
Electronics & Informatics Department (ETRO)
Vrije Universiteit Brussel (VUB)
Pleinlaan 2, B-1050 Brussels, Belgium
http://parallel.vub.ac.be
tel: +32 2 629 29 97;
DISCLAIMER(S): http://www.etro.vub.ac.be/disclaimer http://www.eqcologic.be/disclaimer

From A.P.Dawid at statslab.cam.ac.uk Thu Apr 23 09:07:35 2009
From: A.P.Dawid at statslab.cam.ac.uk (Philip Dawid)
Date: Thu, 23 Apr 2009 10:07:35 +0100 (BST)
Subject: [Causality-ML] Pearl's comments on Dawid's paper
In-Reply-To: <49EF2AA2.80506@clopinet.com>
References: <49EF2AA2.80506@clopinet.com>
Message-ID:

Fine, but please remember that my paper is currently only a draft, and it is clear from this and other early responses that I have expressed myself badly, so laying myself open to misinterpretation. I will try to be clearer in the presentation.

Philip

-------------------------------------------------------------------------
A. P.
Dawid, Professor of Statistics, Statistical Laboratory,
Centre for Mathematical Sciences, Wilberforce Road, Cambridge CB3 0WB, UK
Tel: +44 (0)1223 766535
E-mail: apd[at]statslab.cam.ac.uk
Personal web page: http://tinyurl.com/2maycn
Department web page: http://www.statslab.cam.ac.uk/
-------------------------------------------------------------------------

On Wed, 22 Apr 2009, Isabelle Guyon wrote:

> Dear colleagues,
>
> Judea Pearl has kindly accepted that we publish on this mailing list his
> review of Dawid's paper to be discussed tomorrow
> http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090423PD.
>
> Please find it below as food for thoughts and send back your comments on this
> list.
>
> Best regards,
>
> Isabelle

From tsamard at ics.forth.gr Mon Apr 27 15:09:01 2009
From: tsamard at ics.forth.gr (Ioannis Tsamardinos)
Date: Mon, 27 Apr 2009 18:09:01 +0300
Subject: [Causality-ML] Reference on violations of faithfulness
In-Reply-To: <49F0348B.10204@clopinet.com>
References: <49F0348B.10204@clopinet.com>
Message-ID: <01b301c9c74a$189e3aa0$49daafe0$@forth.gr>

Dear Isabelle,

Thank you for organizing such interesting discussions.

I restate and clarify my comment during the last presentation and I am sending you some more thoughts sparked by the discussion.

FAITHFULNESS

One of the problems with current causality algorithms is the faithfulness assumption, i.e., all algorithms (as far as I know) assume a faithful distribution (we have recently introduced the Feature Space Markov Blanket algorithm that performs feature selection even for non-faithful distributions but its properties are still under investigation). There are positive theoretical results regarding faithfulness (by Meek 1995), namely that if the parameters of a network are chosen randomly following a Dirichlet distribution, the network distribution is "almost surely" faithful, or in other words, the set of unfaithful distributions has Lebesgue measure 0. However, I have the following objections to the practicality of this result:

a) non-faithfulness in the precise mathematical sense is difficult to encounter with random selection of the parameters. However, the probability of encountering distributions *close* to non-faithfulness may be significant. While asymptotically this is not a problem, it is a problem with finite samples: it may require enormous sample sizes to reconstruct such networks accurately.
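The finite-sample concern in point (a) can be made concrete with a small exact calculation. The sketch below is only an illustration, not part of the original message: it assumes binary A and B that are independent, a target T = A XOR B, and a textbook normal-approximation sample-size formula for comparing two proportions. The function names (`xor_joint`, `risk_difference`, `samples_per_group`) are hypothetical. In this parameterization the pairwise dependence between A and T is governed by the prior of B (and vice versa), so the sketch perturbs both priors away from 50-50.

```python
from itertools import product


def xor_joint(p_a, p_b):
    """Exact joint P(A=a, B=b, T=t) for independent A, B and T = A XOR B."""
    joint = {}
    for a, b in product((0, 1), repeat=2):
        prob_a = p_a if a == 1 else 1.0 - p_a
        prob_b = p_b if b == 1 else 1.0 - p_b
        joint[(a, b, a ^ b)] = prob_a * prob_b
    return joint


def risk_difference(joint, index):
    """P(T=1 | X=1) - P(T=1 | X=0), where X is A (index 0) or B (index 1)."""
    def p_t1_given(x):
        num = sum(p for key, p in joint.items() if key[index] == x and key[2] == 1)
        den = sum(p for key, p in joint.items() if key[index] == x)
        return num / den
    return p_t1_given(1) - p_t1_given(0)


def samples_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Normal-approximation sample size per group for detecting p1 != p2
    (two-sided 5% level and roughly 80% power with the default z values)."""
    variance = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2


exact = xor_joint(0.5, 0.5)        # exactly unfaithful case
near = xor_joint(0.4999, 0.4999)   # slightly perturbed priors

print("A-T risk difference, 50-50 priors:    ", risk_difference(exact, 0))
print("A-T risk difference, perturbed priors:", risk_difference(near, 0))
print("approx. samples per group to detect it:", samples_per_group(0.5001, 0.4999))
```

With 50-50 priors the pairwise dependence of T on A vanishes exactly, while with the perturbed priors it is on the order of 0.0002, which under the assumed formula corresponds to a sample size on the order of 10^8 per group. A pairwise test is therefore a poor tool here, even though A and B jointly determine T.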
For example, if T = XOR(A, B) (the exclusive OR of A and B) and the values of A, B are independent and follow a prior of 50-50%, the network is not faithful. If the prior of A however is 49.99 - 50.01, the network is faithful but an enormous amount of data may be required to reconstruct it successfully with the existing algorithms: A is barely associated with T pairwise.

In Feature Space Markov Blanket we try to overcome this problem by using an SVM to look at all the variables simultaneously and identify a strong multivariate distribution between (A, B) and T, even when A or B individually have very small association with T.

b) Nature is not choosing randomly the weights of the causal processes, as the theorem by Meek assumes. Evolution in particular is probably *designed* to build non-faithful distributions, as pointed out by Drs. Richard Scheines & Peter Spirtes, slides #10-15 of their presentation at http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090121RS

c) Say the above theorem by Meek holds (i.e., causal processes in Nature lead to faithful distributions with probability 1), but normally we observe only a subset of all variables and their marginal distribution. Then, the marginal distribution is *non-faithful* with probability 1, when causal sufficiency fails. In fact, this violation of faithfulness in the observed distribution is what allows the FCI algorithm to detect hidden confounding variables. Here is an example:

A -> B <- H -> C <- D

Assume this network is faithful but H is not observed. In the marginal, we discover during FCI the structure

A - B - C - D

The edge B-C is oriented as A->B<-C by checking the (A,B,C) v-structure, and in the opposite direction B->C<-D when checking the (B,C,D) v-structure. FCI assumes faithfulness of the original process and to resolve the discrepancy assumes a hidden variable and reports a bidirectional edge

A -> B <-> C <- D

Thus, assuming the complete process (no latent variables) is faithful means that whenever there is a hidden confounder the observed distribution is non-faithful.

Such results lead to the following CONTRADICTION in our community:

i) causal discovery using Causal Bayesian Networks + Causal Sufficiency + Faithfulness assumes the observable distributions are faithful with probability 1 (e.g., PC, GES, MMHC, Greedy Search'n'Score algorithms)

ii) causal discovery using Causal Bayesian Networks + Faithfulness assumes the observable distributions are faithful with probability 0 (FCI, IC*, etc.)

My intuition on Faithfulness is that we are in need of further theory and algorithms; it will finally be possible to accurately reconstruct networks even under *some types* of non-faithfulness; however, not assuming faithfulness will create new types of Markov Equivalences and thus significantly reduce the causal information we can safely identify. For example, if A->B->C is faithful, the PDAG is A-B-C. When A->B->C contains all deterministic 1-to-1 relations (non-faithful), A->B->C cannot be distinguished without experiments from B->A->C, in addition to the typical Markov Equivalences we find even when faithfulness holds.

CAUSAL MARKOV CONDITION

The CMC is very intuitively appealing (to me at least). There has been plenty of discussion on it so I hope I won't repeat much. It states:

a variable is independent of its non-effects conditioned on its direct causes.

I don't argue against the CMC holding under a suitable definition of the variables. But I believe there are pitfalls in applying it that have to do with what is a "variable" and how it is measured. The inventors of causal discovery are aware of these pitfalls and the first to point them out, I think. A great example is at slide #16 of the Richard Scheines & Peter Spirtes presentation (details above). If we are measuring Z' = Z + e (a noisy version of Z) and given the structure

A <- Z -> B, Z->Z',

then Dependent(A;B|Z'). So, to the researcher the CMC seems to be violated. Only if we have noiseless variables can we guarantee the CMC. The remainder of the Scheines & Spirtes presentation shows more examples and cases.

My intuition is that further theory is required to allow us to "say what we can say" even when the CMC may be violated due to specific conditions. I am guessing, for example, that the CMC still holds when all variables are measured with independent and identically distributed noise.

CAUSAL DISCOVERY THEORY AND CLASSICAL STATISTICAL CAUSAL DISCOVERY

Another comment I made during the presentation is the following: the typical way of discovering causality by use of a randomized experiment also makes the Faithfulness assumption.

For example, if T=XOR(A, B) and A, B have 50-50% independent priors, this is a non-faithful network. A randomized trial controlling/manipulating A will discover A has no pair-wise association with T and thus wrongly infer that A is not causing T. I wonder how many statisticians are aware of such complexities.

I suspect that standard causal discovery via experimentation and statistical analysis also implicitly uses the CMC in experiments with matched samples. I have some examples in mind but I need to work the theory better first.

IT

-----Original Message-----
From: Isabelle Guyon [mailto:guyon at clopinet.com]
Sent: Thursday, April 23, 2009 12:28 PM
To: tsamard at ics.forth.gr
Subject: Reference on violations of faithfulness

Dear Ioannis,

Thank you very much for taking part in the discussion this morning. Could you please send me a reference to the paper you mentioned on violation of faithfulness with examples in biology?
Many thanks in advance, Isabelle PS: others may benefit from it, you may make a post to causality-ml at encours.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From nicolas.brodu at numerimoire.net Mon Apr 27 21:31:26 2009 From: nicolas.brodu at numerimoire.net (Nicolas Brodu) Date: Mon, 27 Apr 2009 23:31:26 +0200 Subject: [Causality-ML] Beware of the DAG: Replay available In-Reply-To: <01b301c9c74a$189e3aa0$49daafe0$@forth.gr> References: <49F0348B.10204@clopinet.com> <01b301c9c74a$189e3aa0$49daafe0$@forth.gr> Message-ID: <200904272331.26147.nicolas.brodu@numerimoire.net> Dear participants, The video replay of last Thursday's presentation (Philip Dawid: Beware of the DAG!) is now available online: http://www.encours.org/causality/Beware_of_the_DAG/replay.html You can now hear the comment Ioannis refers to, near the end of the presentation. Regards, Nicolas Ioannis Tsamardinos wrote: > Dear Isabelle, > > > > Thank you for organizing such interesting discussions. > > > > I restate and clarify my comment during the last presentation and I am > sending you some more thoughts sparked by the discussion. > > > > FAITHFULNESS > > > > One of the problems with current causality algorithms is the faithfulness > assumption, i.e., all algorithms (as far as I know) assume a faithful > distribution (we have recently introduced the Feature Space Markov Blanket > algorithm that performs feature selection even for non-faithful > distributions but its properties are still under investigation). There are > positive theoretical results regarding faithfulness (by Meek 1995), namely > that if the parameters of a network are chosen randomly following a > Dirichlet distribution, the network distribution is "almost surely" > faithful, or in other words, the set of unfaithful distributions has > Lebesgue measure 0. 
However, I have the following objections to the > practicality of this result > > > > a) non-faithfulness in the precise mathematical sense is difficult to > encounter with random selection of the parameters. However, the probability > of encountering distributions *close* to non-faithfulness maybe > significant. While asymptotically this is not a problem, it is a problem > with finite sample: it may require enormous sample sizes to reconstruct > such networks accurately. For example, if T = XOR(A, B) (the exclusive OR > of A and B) AND the values of A , B are independent and follow a prior of > 50-50% the network is not faithful. If the prior of A however is 49.99 - > 50.01 the network is faithful but an enormous amount of data may be > required to reconstruct it successfully with the existing algorithms: A is > barely associated with T pairwise. > > > > In Feature Space Markov Blanket we try to overcome this problem by using an > SVM to look at all the variables simultaneously and identify a strong > multivariate distribution between (A, B) and T, even when A or B > individually have very small association with T. > > > > b) Nature is not choosing randomly the weights of the causal processes, as > the theorem by Meek assumes. Evolution in particular is probably *designed* > to build non-faithful distributions as pointed out by Drs. Richard Scheines > & Peter Spirtes, slide #10-15 of their presentation at > http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090121RS > > > > c) Say the above theorem by Meek holds (i.e., causal processes in Nature > lead to faithful distributions with probability 1) but normally we observe > only a subset of all variables and their marginal distribution. Then, the > marginal distribution is *non-faithful* with probability 1, when causal > sufficiency fails. In fact, this violation of faithfulness in the observed > distribution is what allows the FCI algorithm to detect hidden confounding > variables. 
Here is an example: > > > > A -> B <- H -> C <- D > > > > Assume this network is faithful but H is not observed. In the marginal, we > discover during FCI the structure > > > > A - B - C - D > > > > The edge B-C is oriented as A->B<-C by checking the (A,B,C) v-structure, > and in the opposite direction B->C<-D when checking the (B,C,D) > v-structure. FCI assumes faithfulness of the original process and to > resolve the discrepancy assumes a hidden variable and reports a > bidirectional edge > > > > A -> B <-> C <- D > > > > Thus, by assuming the complete process (no latent variables) is faithful > means that whenever there is a hidden confounder the observed distribution > is non-faithful. > > > > Such results lead to the following CONTRADICTION in our community: > > > > i) causal discovery using Causal Bayesian Networks + Causal Sufficiency + > Faithfulness assumes the observable distributions are faithful with > probability 1 (e.g., PC, GES, MMHC, Greedy Search'n'Score algorithms) > > > > ii) causal discovery using Causal Bayesian Networks + Faithfulness assumes > the observable distributions are faithful with probability 0 (FCI, IC*, > etc.) > > > > > > My intuition on Faithfulness is that we are in need of further theory and > algorithms; it will finally be possible to accurate reconstruct networks > even under *some types* of non-faithfulness; however, not assuming > faithfulness will create new types of Markov Equivalences and thus > significantly reduce the causal information we can safely identify. For > example, if A->B->C is faithful, the PDAG is A-B-C. When A->B->C contains > all deterministic 1-to-1 relations (non-faithful), A->B->C cannot be > distinguished without experiments from B->A->C in addition to the typical > Markov Equivalences we find even when faithfulness holds. > > > > > > CAUSAL MARKOV CONDITION > > > > The CMC is very intuitively appealing (to me at least). There has been > plenty of discussion on it so I hope I won't repeat much. 
It states: > > > > a variable is independent of its non-effect conditioned on its direct > causes. > > > > I don't argue on the CMC holding under a suitable definition of the > variables. But I believe there are pitfalls to apply it that have to do > with what is a " variable" and how it is measured. The inventors of causal > discovery are aware of these pitfalls and the first to point them out I > think. A great example is at slide #16 of the Richard Scheines & Peter > Spirtes presentation (details above). If we are measuring Z' = Z + e (a > noisy version of Z) and given the structure > > > > A <- Z -> B, Z->Z', > > > > then Dependent(A;B|Z'). So, to the researcher the CMC seems to be violated. > Only if we have noiseless variables we can guarantee the CMC. The remaining > of the Scheines&Spirtes presentation shows more examples and cases. > > > > My intuition is that further theory is required to allow us to "say what we > can say" even when the CMC maybe violated due to specific conditions. I am > guessing for example, that the CMC still holds when all variables are > measured with independent and identically distributed noise. > > > > CAUSAL DISCOVERY THEORY AND CLASSICAL STATISTICAL CAUSAL DISCOVERY > > > > Another comment I made during the presentation is the following: the > typical way of discovering causality by use of a randomized experiment also > makes the Faithfulness assumption. > > > > For example, if T=XOR(A, B), A, B have 50-50% independent priors this is a > non-faithful network. A randomized trial controlling/manipulating A will > discover A has no pair-wise association with T and thus wrongly infer that > A is not causing T. I wonder how many statisticians are aware of such > complexities. > > > > I suspect that standard causal discovery via experimentation and > statistical analysis also implicitly uses the CMC in experiments with > matched samples. I have some examples in mind but I need to work the theory > better first. 
IT

-----Original Message-----
From: Isabelle Guyon [mailto:guyon at clopinet.com]
Sent: Thursday, April 23, 2009 12:28 PM
To: tsamard at ics.forth.gr
Subject: Reference on violations of faithfulness

Dear Ioannis,

Thank you very much for taking part in the discussion this morning.
Could you please send me a reference to the paper you mentioned on
violation of faithfulness with examples in biology?
Many thanks in advance,

Isabelle

PS: others may benefit from it, you may make a post to
causality-ml at encours.org

From denver.h.dash at intel.com Mon Apr 27 22:00:30 2009
From: denver.h.dash at intel.com (Dash, Denver H)
Date: Mon, 27 Apr 2009 15:00:30 -0700
Subject: [Causality-ML] Reference on violations of faithfulness
In-Reply-To: <01b301c9c74a$189e3aa0$49daafe0$@forth.gr>
References: <49F0348B.10204@clopinet.com> <01b301c9c74a$189e3aa0$49daafe0$@forth.gr>
Message-ID: <34D415B755EFAE4D8B827211EAE62CA2465B6F2F@orsmsx506.amr.corp.intel.com>

Hi all,

As a follow-up to Ioannis' comments regarding the practicality of the faithfulness assumption: the talk I presented a couple of weeks ago points out and tries to characterize the fact that when a dynamic system goes through an equilibrium point, faithfulness is almost always violated. For example, if you have a set of forces acting on a body causing an acceleration: F1 + F2 = ma, in equilibrium, a=0 independent of the values of F1 and F2. Thus, violations of faithfulness are not just possible, they are ubiquitous in nature. Many, if not all, systems for which causality is useful will involve equilibria. One could even say it is these violations of faithfulness that provide much of the structure of nature that as scientists we hope to observe. This motivates the importance of learning short-term dynamic representations of systems rather than longer-time-scale equilibrium models.

Denver.
pittsburgh.intel-research.net/~dhdash

From: causality-ml-bounces at encours.org [mailto:causality-ml-bounces at encours.org] On Behalf Of Ioannis Tsamardinos
Sent: Monday, April 27, 2009 11:09 AM
To: 'Isabelle Guyon'
Cc: causality-ml at encours.org
Subject: Re: [Causality-ML] Reference on violations of faithfulness

Dear Isabelle,

Thank you for organizing such interesting discussions. I restate and clarify my comment during the last presentation and I am sending you some more thoughts sparked by the discussion.

FAITHFULNESS

One of the problems with current causality algorithms is the faithfulness assumption, i.e., all algorithms (as far as I know) assume a faithful distribution (we have recently introduced the Feature Space Markov Blanket algorithm that performs feature selection even for non-faithful distributions, but its properties are still under investigation).

There are positive theoretical results regarding faithfulness (by Meek 1995), namely that if the parameters of a network are chosen randomly following a Dirichlet distribution, the network distribution is "almost surely" faithful, or in other words, the set of unfaithful distributions has Lebesgue measure 0. However, I have the following objections to the practicality of this result:

a) Non-faithfulness in the precise mathematical sense is difficult to encounter with random selection of the parameters. However, the probability of encountering distributions *close* to non-faithfulness may be significant. While asymptotically this is not a problem, it is a problem with finite samples: it may require enormous sample sizes to reconstruct such networks accurately. For example, if T = XOR(A, B) (the exclusive OR of A and B) AND the values of A, B are independent and follow a prior of 50-50%, the network is not faithful.
If the prior of A, however, is 49.99-50.01%, the network is faithful but an enormous amount of data may be required to reconstruct it successfully with the existing algorithms: A is barely associated with T pairwise. In Feature Space Markov Blanket we try to overcome this problem by using an SVM to look at all the variables simultaneously and identify a strong multivariate association between (A, B) and T, even when A or B individually have very small association with T.

b) Nature is not choosing the weights of the causal processes randomly, as the theorem by Meek assumes. Evolution in particular is probably *designed* to build non-faithful distributions, as pointed out by Drs. Richard Scheines & Peter Spirtes, slides #10-15 of their presentation at http://www.afia-france.org/tiki-index.php?page=GroupeDeLecture090121RS

c) Say the above theorem by Meek holds (i.e., causal processes in Nature lead to faithful distributions with probability 1) but normally we observe only a subset of all variables and their marginal distribution. Then, the marginal distribution is *non-faithful* with probability 1, when causal sufficiency fails. In fact, this violation of faithfulness in the observed distribution is what allows the FCI algorithm to detect hidden confounding variables. Here is an example:

A -> B <- H -> C <- D

Assume this network is faithful but H is not observed. In the marginal, we discover during FCI the structure

A - B - C - D

The edge B-C is oriented as A->B<-C by checking the (A,B,C) v-structure, and in the opposite direction B->C<-D when checking the (B,C,D) v-structure. FCI assumes faithfulness of the original process and, to resolve the discrepancy, assumes a hidden variable and reports a bidirectional edge

A -> B <-> C <- D

Thus, assuming that the complete process (no latent variables) is faithful means that whenever there is a hidden confounder the observed distribution is non-faithful.
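The independence pattern behind the two conflicting orientations can be reproduced numerically. The sketch below is an illustration only and is not the FCI algorithm: it assumes a linear Gaussian version of A -> B <- H -> C <- D with unit coefficients and unit noise variances (none of which is given in the message), uses partial correlation as a stand-in for an independence test, and the helper names are made up. H is generated but never used in any test, mimicking the unobserved confounder.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly controlling for z."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n=20000, seed=0):
    """Sample the assumed model and return (partial) correlations among A, B, C, D."""
    rng = random.Random(seed)

    def noise():
        return rng.gauss(0.0, 1.0)

    A = [noise() for _ in range(n)]
    H = [noise() for _ in range(n)]  # hidden confounder: used to generate B and C only
    D = [noise() for _ in range(n)]
    B = [a + h + noise() for a, h in zip(A, H)]
    C = [h + d + noise() for h, d in zip(H, D)]
    return {
        "A,C": corr(A, C),                # near 0: A and C look independent
        "A,C|B": partial_corr(A, C, B),   # clearly nonzero: suggests A -> B <- C
        "B,D": corr(B, D),                # near 0: B and D look independent
        "B,D|C": partial_corr(B, D, C),   # clearly nonzero: suggests B -> C <- D
        "B,C": corr(B, C),                # nonzero: B and C are adjacent in the marginal
    }

for name, value in simulate().items():
    print(name, round(value, 3))
```

Under these assumptions the population values are 0 for the two marginal correlations, -1/4 for the two partial correlations and 1/3 for corr(B, C), so the triple (A,B,C) points the B-C edge into B while the triple (B,C,D) points it into C. No DAG over A, B, C, D alone satisfies both, which is the discrepancy that gets resolved with B <-> C.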
Such results lead to the following CONTRADICTION in our community:

i) causal discovery using Causal Bayesian Networks + Causal Sufficiency + Faithfulness assumes the observable distributions are faithful with probability 1 (e.g., PC, GES, MMHC, Greedy Search'n'Score algorithms)

ii) causal discovery using Causal Bayesian Networks + Faithfulness assumes the observable distributions are faithful with probability 0 (FCI, IC*, etc.)

My intuition on Faithfulness is that we are in need of further theory and algorithms; it will eventually be possible to accurately reconstruct networks even under *some types* of non-faithfulness; however, not assuming faithfulness will create new types of Markov Equivalences and thus significantly reduce the causal information we can safely identify. For example, if A->B->C is faithful, the PDAG is A-B-C. When all relations in A->B->C are deterministic and 1-to-1 (non-faithful), A->B->C cannot be distinguished without experiments from B->A->C, in addition to the typical Markov Equivalences we find even when faithfulness holds.

CAUSAL MARKOV CONDITION

The CMC is very intuitively appealing (to me at least). There has been plenty of discussion on it so I hope I won't repeat much. It states:

a variable is independent of its non-effects conditioned on its direct causes.

I don't dispute that the CMC holds under a suitable definition of the variables. But I believe there are pitfalls in applying it that have to do with what a "variable" is and how it is measured. The inventors of causal discovery are aware of these pitfalls and were, I think, the first to point them out. A great example is at slide #16 of the Richard Scheines & Peter Spirtes presentation (details above). If we are measuring Z' = Z + e (a noisy version of Z) and given the structure

A <- Z -> B, Z->Z',

then Dependent(A;B|Z'). So, to the researcher the CMC seems to be violated. Only if we have noiseless variables can we guarantee the CMC.
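The measurement-noise example can be made concrete with a population-level calculation. This is a sketch under an assumed linear Gaussian model that is not in the message (Z with unit variance, A = Z + eA, B = Z + eB, unit-variance independent noises, and Z' = Z + e with Var(e) = noise_var); the function name is made up. It returns the partial correlation of A and B given the measured Z'.

```python
import math

def partial_corr_given_noisy_z(noise_var):
    """Population partial correlation of A and B given Z' = Z + e.

    Assumed model (illustrative only): Z ~ N(0, 1), A = Z + eA, B = Z + eB,
    with eA, eB, e independent, Var(eA) = Var(eB) = 1 and Var(e) = noise_var.
    """
    var_a = var_b = 2.0           # Var(Z) + Var(own noise)
    var_zp = 1.0 + noise_var      # Var(Z') = Var(Z) + Var(e)
    r_ab = 1.0 / math.sqrt(var_a * var_b)    # Cov(A, B) = Var(Z) = 1
    r_az = 1.0 / math.sqrt(var_a * var_zp)   # Cov(A, Z') = Var(Z) = 1
    r_bz = 1.0 / math.sqrt(var_b * var_zp)   # Cov(B, Z') = Var(Z) = 1
    return (r_ab - r_az * r_bz) / math.sqrt((1 - r_az ** 2) * (1 - r_bz ** 2))

for noise_var in (0.0, 0.1, 1.0, 10.0):
    print("Var(e) =", noise_var, "-> partial corr(A, B | Z') =",
          round(partial_corr_given_noisy_z(noise_var), 4))
```

With no measurement noise the partial correlation is zero, as the CMC requires. With Var(e) = 1 it is 1/3, and it approaches the unconditional correlation of 1/2 as the noise grows, so conditioning on a noisy proxy removes only part of the confounding.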
The remainder of the Scheines & Spirtes presentation shows more examples and cases.

My intuition is that further theory is required to allow us to "say what we can say" even when the CMC may be violated due to specific conditions. I am guessing, for example, that the CMC still holds when all variables are measured with independent and identically distributed noise.

CAUSAL DISCOVERY THEORY AND CLASSICAL STATISTICAL CAUSAL DISCOVERY

Another comment I made during the presentation is the following: the typical way of discovering causality by use of a randomized experiment also makes the Faithfulness assumption.

For example, if T=XOR(A, B) and A, B have independent 50-50% priors, this is a non-faithful network. A randomized trial controlling/manipulating A will discover that A has no pairwise association with T and thus wrongly infer that A is not causing T. I wonder how many statisticians are aware of such complexities.

I suspect that standard causal discovery via experimentation and statistical analysis also implicitly uses the CMC in experiments with matched samples. I have some examples in mind but I need to work out the theory better first.

IT

-----Original Message-----
From: Isabelle Guyon [mailto:guyon at clopinet.com]
Sent: Thursday, April 23, 2009 12:28 PM
To: tsamard at ics.forth.gr
Subject: Reference on violations of faithfulness

Dear Ioannis,

Thank you very much for taking part in the discussion this morning.
Could you please send me a reference to the paper you mentioned on
violation of faithfulness with examples in biology?
Many thanks in advance,

Isabelle

PS: others may benefit from it, you may make a post to
causality-ml at encours.org