Philosophers Win Artificial Intelligence Award


The Tetrad Automated Causal Discovery Platform, a software and text project developed by Peter Spirtes, Clark Glymour, Richard Scheines, and Joe Ramsey of Carnegie Mellon University’s Department of Philosophy, earned the “Leader” Award at the 2020 World Artificial Intelligence Conference this past July.

The Leader Award is one of four awards presented at the conference that aim to recognize “the best in terms of impact and innovation in AI”. There were over 800 nominees for the awards, including projects by Amazon, Bosch, Huawei, Nvidia, Open AI Lab, and Siemens, among others.

The Tetrad Automated Causal Discovery Platform is a tool for discovering “valid, novel, and significant causal relationships” in data. A press release from CMU provides further information about the project:

The Tetrad project was started nearly 40 years ago by Glymour, then a professor of history and philosophy of science at the University of Pittsburgh and now Alumni University Professor Emeritus of Philosophy at CMU, and his doctoral students, Richard Scheines, now Bess Family Dean of the Dietrich College of Humanities and Social Sciences and a professor of philosophy at CMU, and Kevin Kelly, now professor of philosophy at CMU.

Glymour was fascinated by English psychologist Charles Spearman’s argument for a single “general intelligence,” proposed in the early 20th century, and later work by Hubert Blalock, a sociologist. Both researchers explored the possibility of distinguishing causal models by patterns of constraints they implied on the data. Glymour and his students undertook to generalize that idea, turn it into a computer algorithm and explore related mathematical properties.

The first version of the Tetrad program became the basis of Scheines’ doctoral research, which required him to learn as much computer science and statistics as philosophy, an interdisciplinary approach that was encouraged at CMU.

Peter Spirtes joined the project while studying for a master’s degree in computer science at Pitt following his doctoral work. A number of doctoral students at CMU have based their work around the Tetrad project.

Fundamental to the work was providing a set of general principles, or axioms, for deriving testable predictions from any causal structure. For example, consider the coronavirus. Exposure to the virus causes infection, which in turn causes symptoms (Exposure –> Infection –> Symptoms). Since not all exposures result in infections, and not all infections result in symptoms, these relations are probabilistic. But if we assume that exposure can only cause symptoms through infection, the testable prediction from the axiom is that Exposure and Symptoms are independent given Infection. That is, although knowing whether someone was exposed is informative about whether they will develop symptoms, once we already know whether someone is infected, knowing whether they were exposed adds no extra information, a claim that can be tested statistically with data.
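This conditional independence claim is easy to check on simulated data. Below is a minimal sketch (ours, not part of Tetrad or the press release; the probabilities are invented for illustration) that generates the Exposure –> Infection –> Symptoms chain and confirms that, within each infection status, exposure adds no information about symptoms:

```python
# A minimal sketch (not Tetrad code; probabilities are invented for
# illustration): simulate Exposure -> Infection -> Symptoms and check
# that Exposure and Symptoms are independent given Infection.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

exposure = rng.random(n) < 0.3                 # 30% of people are exposed
infection = exposure & (rng.random(n) < 0.5)   # exposure infects half the time
symptoms = infection & (rng.random(n) < 0.6)   # 60% of infections show symptoms

# Within each infection status, the symptom rate should not depend on exposure:
for inf in (True, False):
    mask = infection == inf
    marginal = symptoms[mask].mean()
    for exp_ in (True, False):
        sub = mask & (exposure == exp_)
        if sub.any():  # infected-but-unexposed is empty in this model
            print(f"infected={inf}, exposed={exp_}: "
                  f"P(symptoms)={symptoms[sub].mean():.3f} vs marginal {marginal:.3f}")
```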

Spirtes, Glymour and Scheines then turned this kind of reasoning on its head and extended it to massively complex causal systems. They developed algorithms that take measured data and background knowledge as input, and then compute the set of underlying causal systems that might have produced specific patterns in the measured data. What can the algorithms tell us about the causal system that underlies the measured data? According to Scheines, “not everything, but in some cases quite a lot.” Spirtes led the effort to prove that the algorithms were theoretically reliable. This approach to causal discovery constituted a breakthrough in fundamental methods in AI.
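To give a feel for how such algorithms work, here is a toy, from-scratch sketch of the constraint-based idea (our illustration, not Tetrad’s actual implementation, which is far more careful about which conditioning sets to test and how to orient edges): start with every pair of variables connected, then remove an edge whenever some conditioning set renders the pair independent.

```python
# A toy sketch of constraint-based causal skeleton discovery (not Tetrad's
# code): drop the edge between two variables whenever some conditioning
# set makes them independent, judged by a Fisher-z partial-correlation test.
from itertools import combinations
import numpy as np
from scipy import stats

def fisher_z_indep(data, i, j, cond, alpha=0.01):
    """True if column i is independent of column j given columns in cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)                    # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])  # partial correlation
    r = np.clip(r, -0.999999, 0.999999)
    z = 0.5 * np.log((1 + r) / (1 - r))           # Fisher z-transform
    stat = np.sqrt(data.shape[0] - len(cond) - 3) * abs(z)
    return 2 * (1 - stats.norm.cdf(stat)) > alpha  # large p-value => independent

def skeleton(data, alpha=0.01, max_cond=2):
    p = data.shape[1]
    edges = {frozenset((i, j)) for i, j in combinations(range(p), 2)}
    for size in range(max_cond + 1):
        for i, j in combinations(range(p), 2):
            if frozenset((i, j)) not in edges:
                continue
            others = [k for k in range(p) if k not in (i, j)]
            for cond in combinations(others, size):
                if fisher_z_indep(data, i, j, cond, alpha):
                    edges.discard(frozenset((i, j)))  # independence found: remove edge
                    break
    return edges

# Linear-Gaussian data from the chain X0 -> X1 -> X2:
rng = np.random.default_rng(1)
x0 = rng.normal(size=50_000)
x1 = 0.8 * x0 + rng.normal(size=50_000)
x2 = 0.8 * x1 + rng.normal(size=50_000)
data = np.column_stack([x0, x1, x2])
print(sorted(tuple(sorted(e)) for e in skeleton(data)))
# Typically prints [(0, 1), (1, 2)]: the X0 - X2 edge is removed given {X1}.
```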

The next step was to make the work practical, which required efficient algorithms and massive amounts of simulation and real scientific testing. In the late 1990s, Joe Ramsey joined the team as a systems developer; he has developed several important algorithms and made many others dramatically more efficient. With help from Spirtes, Glymour, Scheines and many others, Ramsey developed the Java-based Tetrad platform, which supports model building and testing, offers full simulation, and implements dozens of causal discovery algorithms that can be executed on one’s laptop or on the Bridges Pittsburgh Supercomputer.

Over the last 15 to 20 years, the free, open-source software platform has been successfully applied to scientific problems from economics to psychology to educational research to neuroscience by the original team and by researchers around the world. 

More here.

M. Nisa Khan
3 years ago

If you have valid background knowledge and write effective, valid algorithms based on mathematical relations and axioms, then you already know the causal relationships in the data you will acquire via machines. So, by and large, you have a high degree of a priori knowledge of the system’s causal relationships. I would like to learn what, specifically, machine learning and AI provide us beyond statistically handling a very large set of data whose behavior we may already know.

Gil
Reply to  M. Nisa Khan
3 years ago

This is a great question. As I see it, AI and ML will provide computational capacity beyond what humans are currently capable of. The way AI/ML can handle data is more efficient and, in turn, better suited to predictive systems. For example, one could create a system that takes health data and predicts possible health issues from long-term habits, or that tracks the growth of a tumor or cancer. Writing the algorithm is a feat in itself, but having that algorithm grow, shaped by the data that flows through it, could save a great deal of time across the many things we might build to help our world, and that to me is amazing.

Alex
Reply to  M. Nisa Khan
3 years ago

“If you have the valid background knowledge and are writing effective and valid algorithms based on mathematical relations and axioms, you already know the causal relationships in the data you will acquire via machines.”

That’s often not the case. A common way to train AI is to just let it try something within an environment, “reward” or “punish” it depending on the result of what it tried, then repeat a bunch of times. For example, we don’t need to know the causal relationships between particular moves in a game of chess and a victory in a game of chess to be able to tell a computer “I’ll give you a 1 every time you win. Figure out how you can get as many 1s as you can.” We could be terrible at chess but get a computer to play pretty well by doing that.

Algorithms that help AI learn don’t require that prior knowledge of relationships either. They usually have very broad application. For example, you can measure how badly the chess bot lost (some ways of measuring might be more useful than others, but you can still measure it) and then have it adjust how it decides what to do, in proportion to how well or badly it did relative to the desired outcome, using some other algorithm that nudges the AI’s decisions away from where they were when it lost. This can be extended to countless other domains, and you don’t need prior mastery over chess or any of those other domains. There are also other machine learning methods that don’t require prior human knowledge of the causal relationships that could be discovered by a machine.
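Here is a minimal sketch of the reward-driven loop described above, on a toy two-armed bandit rather than chess (the setup and numbers are invented for illustration): the agent never sees why one action is better, it just shifts its preferences toward whatever paid off.

```python
# A toy reward-driven learner (a sketch, not any particular chess engine):
# the agent adjusts its action preferences in proportion to reward,
# with no prior knowledge of which arm is better.
import numpy as np

rng = np.random.default_rng(0)
prefs = np.zeros(2)   # action preferences (logits)
lr = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(5_000):
    probs = softmax(prefs)
    action = rng.choice(2, p=probs)
    # Hidden environment: arm 1 "wins" 70% of the time, arm 0 only 40%.
    reward = 1.0 if rng.random() < (0.7 if action == 1 else 0.4) else 0.0
    # REINFORCE-style update: nudge the chosen action's preference in
    # proportion to the reward received (baseline 0.5 keeps updates centered).
    grad = -probs
    grad[action] += 1.0
    prefs += lr * (reward - 0.5) * grad

print("learned action probabilities:", softmax(prefs).round(3))
# Expect most probability on arm 1, with no prior knowledge of the arms.
```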

What I’m actually wondering about Tetrad is how it differs from forms of reinforcement and unsupervised machine learning that can already discover causal relationships that we weren’t aware of.

Alex
3 years ago

After looking into it more, it seems like Tetrad isn’t doing anything that can’t already be done by other programs as far as machine learning goes. Its big strength, from what I could tell, is its aim to be more user-friendly. You don’t need to be able to code to use it, and its visualizations are probably easier to understand and manage than some other data visualization tools.

Adam
Reply to  Alex
3 years ago

Most machine learning aims at generating good predictions. This can often be done without an understanding of causal structure. For example, you can pretty reliably predict the divorce rate in a state from the number of Waffle Houses there. (This is an example McElreath discusses at length in his book _Statistical Rethinking_.)
But having more Waffle Houses doesn’t cause the divorce rate to be higher.
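A quick simulation makes the point concrete (a sketch with made-up numbers, not McElreath’s actual data): let a hidden common cause drive both Waffle House counts and divorce rates, fit a regression, and then compare what the regression predicts under an intervention with what actually happens.

```python
# A hedged illustration with synthetic numbers (not real data): a common
# cause makes Waffle House counts a fine *predictor* of divorce rates,
# even though intervening on Waffle Houses changes nothing.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
south = rng.random(n)                        # hidden common cause
waffle = 10 * south + rng.normal(0, 1, n)    # south -> Waffle House count
divorce = 5 * south + rng.normal(0, 1, n)    # south -> divorce rate (waffle plays no role)

# Observationally, regressing divorce on waffle predicts well:
slope, intercept = np.polyfit(waffle, divorce, 1)
print("correlation:", np.corrcoef(waffle, divorce)[0, 1].round(3))

# But under an intervention that doubles every Waffle House count,
# divorce is untouched, while the regression predicts a big jump:
print("predicted mean after doubling:", (slope * 2 * waffle + intercept).mean().round(3))
print("actual mean (unchanged):      ", divorce.mean().round(3))
```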

Tetrad isn’t aiming to produce good predictions; it is supposed to help us uncover causal structure. This is a completely different task, so I’m not sure why you think that “Tetrad isn’t doing anything that can’t already be done by other programs as far as machine learning goes.”

Wes
Reply to  Adam
3 years ago

You cannot generate reliably good predictions without an understanding of causal structure. Causal modeling (whether from the CMU school or the UCLA school) allows you to make a lot of assumptions explicit (and in an intuitive way) so that you aren’t just blindly looking at data and haphazardly applying this or that statistical estimation or machine learning procedure.

Even if you are just interested in figuring out what is correlated with what, you need to be aware of how different underlying causal structures can generate those correlations so that you are justified in making your claims. E.g., is police brutality incidence correlated with race? Some who just look at data and statistics say “no” after they condition on whether you’ve been stopped by police. But that third variable being conditioned on is likely a collider, and conditioning on a collider, in this case, can generate some associations and make others disappear. Yet the lingo of graphical causal modeling, and in particular “collider,” is completely missing from the discussion. Consequently, it is impossible to evaluate many claims being thrown around.
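For readers unfamiliar with colliders, a tiny simulation shows the effect with abstract variables (a toy sketch, not a model of real policing data): A and B are generated independently, both raise the chance of the collider C, and conditioning on C manufactures an association between them.

```python
# A quick numeric check of the collider point (abstract toy variables):
# A and B are independent, but both trigger the collider C; looking only
# at cases where C occurred creates a spurious association.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
a = rng.random(n) < 0.5
b = rng.random(n) < 0.3
c = a | b                                    # collider: either parent triggers C

print("corr(A, B) overall:", np.corrcoef(a, b)[0, 1].round(3))        # ~0
print("corr(A, B) given C:", np.corrcoef(a[c], b[c])[0, 1].round(3))  # clearly nonzero
```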

Moreover, if you are interested in figuring out the expected value of Y given X, you are not making a justified inference absent a causal model. It is permissible to extrapolate outside the data set, to project to individuals inside or outside of the population data, or, more generally, to test subjunctives such as “if X were x, then P(Y = 1) would be p” only if you first have a causal model that graphically encodes your assumptions about how variables are causally related (okay, the graphical bit is not strictly necessary, but it is practically necessary given how clear and intuitive it is). You cannot get reliable counterfactual information outside of your data set from mere associations. Otherwise, all you can say (given assumptions that the data is representative, large, not given by the Trump administration, etc.) is “I found a correlation in this data set, and this data set is all I wish to talk about.” Only die-hard empiricists make such claims.

The point is this: you need some understanding of causal structure to make responsible and reliable predictions. Consequently, the discovery of (or the theory-driven hypothesizing about) causal structure should not be separated from prediction. The two should go hand in hand.

Adam
Reply to  Wes
3 years ago

“You need to have some understanding of causal structure to make responsible and reliable predictions.”

I’m not sure whether we disagree (partly because I’m not sure what you mean by “responsible”, “reliable”, and “some understanding of causal structure”).
I agree that causal knowledge is fantastic if you can get it. I agree that graphical causal models are important. But causal inference is really, really hard, and (this may be a point of disagreement) it’s often possible to get by with statistical or machine learning models that are blind to causal structure, depending on the context.

Also, I shouldn’t have said that “Tetrad isn’t aiming to produce good predictions.” I don’t really know what the Tetrad folks’ goals are, so let’s pretend I didn’t say that!

Alex
Reply to  Adam
3 years ago

“I’m not sure why you think that ‘Tetrad isn’t doing anything that can’t already be done by other programs as far as machine learning goes.'”

I think that because, after looking at the program, I don’t see anything that would prevent it from uncovering “causal structure” in, for example, data about waffle houses and divorce rates. What I’m seeing is a flexible and user-friendly tool for data visualization, predictions, and tinkering with variables to see the effects.

I understand the point you’re making about the distinction between predictions and causal structure but it doesn’t seem like what’s meant by “causal structure” in Tetrad is what you mean by “causal structure.”

The “causal structures” being uncovered by the program depend on the data sets that are fed to it. An example they use is a data set with variables like temperature, ice cream eaten, and sunscreen used. As expected, higher temperatures “cause” more ice cream to be eaten and more sunscreen to be applied. There’s nothing stopping someone from renaming the variables to include Waffle Houses and divorce rates. If someone did that, then unless someone explicitly tells Tetrad that there’s no causal relationship, the program would produce the same results, because it doesn’t care about the names of the variables; it cares about their numeric values.

If someone can point me to a part of the program that I’m overlooking, please do. Again, I’m not trying to downplay the value of Tetrad, and I can see it being very useful, but the write-up and people’s posts make it sound like Tetrad is doing something that it doesn’t actually do.

Wes
Reply to  Alex
3 years ago

Read Spirtes et al.’s Causation, Prediction, and Search. Usually it helps to know what something is doing before using it. It’s also generally a good idea to learn about a theory before dismissing it.

Alex
Reply to  Wes
3 years ago

I’m not dismissing it; I’m trying to understand it, and based on what I read on Tetrad’s website, how the program works doesn’t match what people are saying. So, thanks for the recommendation; I’ll read it.

Wes
3 years ago

My understanding of the Tetrad project is that it is by and large used to discover a set of causal structures that are consistent with the data and with some set of assumptions linking the joint probability distribution over the variables to causal structures. So Tetrad is unique in that it is backed by theoretical assumptions concerning the link between joint probability distributions and causal structures. As far as I know, you can also build in prior background causal assumptions, but the point of using Tetrad is to look for a set of data-consistent causal structures chiefly via some axioms about how causal structure and probabilities are related. So Tetrad is not operating on reinforcement or unsupervised machine learning principles that are “theory-free.” To get reliable predictions that aren’t a sheer accident, you need a background theory. Also, to understand what the algorithm is doing, you need background theory, not brute machine learning. I don’t know how machine learning, as typically practiced, can reliably “already discover causal relationships” except by sheer happy accident. The Tetrad algorithms can and have RELIABLY discovered causal relationships in a way that is comprehensible to humans, because the approach is principled.

Alex
Reply to  Wes
3 years ago

Wes, thank you, and sorry if I came off as downplaying the value of the program. I’m much more familiar with ML in practice than in theory.

“it’s backed up by theoretical assumptions concerning the link between joint probability distributions and causal structures.” That’s helpful. If I’m understanding correctly, it’s essentially a broad-purpose encoder, which could be useful in pretty much any domain.

I’m still a little confused about some other things you wrote. Would a typical instance of training a convolutional network to make predictions count as “theory-free”? If it would, there are reliable methods of training the networks that don’t rely on just hoping they accidentally find good weights. Either way, isn’t it still discovering causal relationships? The difference would be in how those relationships are represented and how easy it is for anyone to understand the representation. In the case of Tetrad, the representation seems very straightforward for most people to understand; in the case of a CNN, you could inspect the weights and represent them in different ways, making it more or less easy to understand what they’re representing, but that presupposes certain math abilities.

Sorry, if I misunderstood anything.

Adam
Reply to  Alex
3 years ago

“Either way, isn’t [a well-trained CNN] still discovering causal relationships?”

No. Suppose I create a dataset of ten thousand paintings as follows. I look at each painting, and if I like it, I add a big, red check mark in MS paint. I then train a CNN to take an image of a painting as input and predict whether I liked it.

One of the things that the CNN is going to latch onto is that the presence of a big, red check mark is positively correlated with Adam liking the painting. Is this uncovering causal structure? I don’t think so. What if you’re interested in the question, “if I add a check mark to [points to painting on the wall that is not in the training data], then will that cause Adam to like it?” You might try to answer this by adding a big, red check mark to the painting and feeding that to the CNN. The CNN is going to predict that I’ll like the painting. But that’s getting the causal structure backwards. I added the check mark because I liked the painting. I didn’t like the painting because of the check.

This case is basically the same thing as the Waffle House case I mentioned above, but with CNNs.

Clark Glymour
Reply to  Wes
3 years ago

I don’t know who Wes is, but thank you for responding correctly and coherently to some of these posts.

Clark Glymour

Seymour Borgman
Reply to  Clark Glymour
3 years ago

Congratulations on the award.

Is the CMU approach to causal modeling mathematically equivalent to the UCLA (Pearl) approach?

Wes
Reply to  Seymour Borgman
3 years ago

Yes, congratulations to Clark and everyone who contributed to Tetrad. Long time coming!

To Seymour: To my mind, the most important foundational thinkers on graphical causal modeling are the CMU group (and their students) and Pearl (and his students). But it seems to me that the emphasis of these two groups is different, albeit complementary. The CMU group focuses on the discovery of a set of causal structures from data and some assumptions (chiefly assumptions that connect the probability distribution to causal structure). The Pearl group focuses on identifying causal parameters, given a set of hypotheses about the causal structure, before any willy-nilly statistical estimation. Both make fundamental use of graphical criteria such as d-separation.
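As a small illustration of d-separation (our example; it assumes the networkx library, where the helper is nx.d_separated in versions 2.8 through 3.2 and is renamed nx.is_d_separator from 3.3 on), applied to the exposure/infection/symptoms chain from the article:

```python
# A small d-separation check using networkx (assumption: helper named
# nx.d_separated in networkx 2.8-3.2; newer releases call it nx.is_d_separator).
import networkx as nx

g = nx.DiGraph([("Exposure", "Infection"), ("Infection", "Symptoms")])

# Marginally, Exposure and Symptoms are connected (not d-separated)...
print(nx.d_separated(g, {"Exposure"}, {"Symptoms"}, set()))          # False
# ...but conditioning on Infection blocks the only path between them:
print(nx.d_separated(g, {"Exposure"}, {"Symptoms"}, {"Infection"}))  # True
```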

Also in another post, you write: “Glymour about 10 years ago wrote a manifesto that advocated de-funding the 99 percent of academic philosophy that could not be sensibly housed in a computer science department. Funny to see him and a group of philosophers-turned-statisticians honored here.”

This fundamentally misunderstands Glymour and is not at all charitable. Calling for a kind of philosophy with its roots in Euclid and proof is not advocacy for defunding philosophy. In addition, they certainly are not philosophers-turned-statisticians. This misunderstands either statistics or their work.

Seymour Borgman
Reply to  Wes
3 years ago

If I understand you correctly, the two schools use the same or interchangeable mathematical formalism but address different questions. Pearl showed that his formalism is equivalent to, or generalizes, some others in statistics, economics, epidemiology, etc. So it’s good to hear that the CMU stuff is fully compatible.

Glymour’s “manifesto” is archived at

https://web.archive.org/web/20120108061810/http://choiceandinference.com/2011/12/23/in-light-of-some-recent-discussion-over-at-new-apps-i-bring-you-clark-glymours-manifesto/

I don’t think it’s a mischaracterization to say he views most of traditional, and current academic, philosophy as somewhere between useless and a racket in need of external correction; he has stated similar views in other interviews and talks that can be found online. His views are pretty much identical to what you can hear in any STEM department; the only difference is that he is nominally within the field.

“Calling for a kind of philosophy with its roots in Euclid and proof is not advocacy for defunding philosophy. In addition, they certainly are not philosophers-turned-statisticians. This misunderstands either statistics or their work.”

That isn’t actually what Glymour was saying. He wasn’t merely calling for formal philosophy, but a kind that engages with mathematics (and science, statistics, engineering, CS) using the formal tools typical of those fields, not “formal” as used by philosophers. That means (inter alia) quantitative, numerical or probabilistic/statistical sorts of “formalism”, rather than logic and its relatives that he specifically criticizes (at least the style done by philosophers).

In interviews, Glymour says that were he a student today pursuing his interests from the time of his PhD, he would do machine learning (a.k.a. “statistics”). Similarly, Spirtes says he wasted five years getting a PhD in philosophy. Their papers that use math can be read knowing statistics, ML, or causal DAG material and zero philosophy, and they frequently publish in ML/statistics/CS journals. Seems pretty ML-ish to me!

Alex
Reply to  Adam
3 years ago

“Is this uncovering causal structure? I don’t think so.”

I understand the difference between correlation and causation but, again, the program doesn’t seem to be able to account for what you’re talking about. As they say on their website, people mean a lot of different things by “causal,” and they go on to give the ice cream example I mentioned.

I’m going to look at the book Wes recommended to see if it makes better sense to me.

Adam
Reply to  Alex
3 years ago

Sorry to beat a dead horse! I just noticed that the Tetrad folks have videos on YouTube of a short course on their software (plus background on graphical causal models). You may find it interesting. Here’s a clip from the first video in which Spirtes talks a little bit about causal models vs. ML/statistical models: https://youtu.be/9yEYZURoE3Y?t=1750

Adam
Reply to  Alex
3 years ago

Scheines, not Spirtes.  Apologies!

Alex
3 years ago

So, I decided to run some experiments with Tetrad.

I made a data set including every US state except about 5 that didn’t have complete data available. It included number of waffle houses, divorce rate, average annual temperature, rank by percentage of high school graduates, and the number of times I’ve visited the state.

I fed in the data set with several different parameter settings that struck me as reasonable, and it made no difference to the causal graph that was output, which indicated the following:

– Where A and B are Waffle Houses and temp, exactly one of the following holds: (a) A is a cause of B, or (b) B is a cause of A, or (c) there is an unmeasured variable that is a cause of A and B, or (d) both a and c, or (e) both b and c.

– Where A and B are temp and HS rank, exactly one of the following holds: (a) A is a cause of B, or (b) B is a cause of A, or (c) there is an unmeasured variable that is a cause of A and B, or (d) both a and c, or (e) both b and c.

– Where A and B are HS rank and visits, either A is a cause of B, or there is an unmeasured variable that is a cause of A and B, or both.

– Where A and B are divorce rate and visits, either A is a cause of B, or there is an unmeasured variable that is a cause of A and B, or both.

The last two seemed right to me, in that I could easily think of some unmeasured variables that could explain the relationship. The first two seemed strange. I’ll probably continue playing with this, as it is really useful and visualizes the relationships better than other options. My main problem is that people seem to think this is discovering causal structures beyond high correlation, when the strongest relation Tetrad actually claims acknowledges that “there may be an unmeasured confounder of A and B.”

Adam, I just noticed that earlier today too! Thanks, though. I’m definitely going to check it out.

David Duffy
Reply to  Alex
3 years ago

I might point to an old paper of ours, which was a practical test of related methods that infer causation:
https://genepi.qimr.edu.au/contents/publications/staff/CV143.pdf
The point of that paper is that one can make correct inferences if the relative strengths of association between the different variables are favourable (and measurement error is known!); otherwise one cannot exclude a more distant unobserved cause. In the genetic models we were testing, the direction of some of the arrows is fixed by previous knowledge (e.g., genotype causes phenotype), and it is the propagation of that information through the model that allows inference about other variables. Mendelian randomization, currently extremely popular among epidemiologists, is another nonexperimental method (an instrumental-variables approach) that leverages genetic information to make causal inferences about pairs of phenotypes.
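For the curious, here is a minimal sketch of the instrumental-variable idea behind Mendelian randomization (synthetic data and a simple Wald ratio, not a full MR analysis): a genotype G shifts exposure X but reaches outcome Y only through X, so cov(G, Y)/cov(G, X) recovers the causal effect even when an unmeasured confounder biases ordinary regression.

```python
# A minimal instrumental-variable sketch (synthetic data; the Wald ratio,
# not a full Mendelian randomization analysis).
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
g = rng.binomial(2, 0.3, n)                     # genotype (the instrument)
u = rng.normal(size=n)                          # unmeasured confounder
x = 0.5 * g + u + rng.normal(size=n)            # exposure
y = 0.7 * x + 2.0 * u + rng.normal(size=n)      # outcome; true causal effect = 0.7

naive = np.cov(x, y)[0, 1] / np.var(x)          # biased upward by U
wald = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]  # instrumental estimate
print(f"naive regression: {naive:.3f}, IV (Wald): {wald:.3f}")  # ~1.65 vs ~0.7
```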

Alex
Reply to  David Duffy
3 years ago

Thanks, that’s really interesting. I’ve been doing tests with weird data sets like I mentioned above but I’ve had thoughts about how it could be used to make training game AI more efficient using previous knowledge. Do you know if the algorithms are meant to work well if arrays are given as the value of a variable?

David Duffy
Reply to  Alex
3 years ago

“arrays are given as the value of a variable?” No, you’ll have to split those into multiple relevant variables – maybe higher-dimensional summaries like principal components (remembering that this is what the NN layers do “for free”). All this assumes the variables are unitary in nature (i.e., unobserved component parts don’t have grotesquely different relations with your putative causes) and that causal relations are linear (or polynomial, etc.).
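A rough sketch of that suggestion (hypothetical array-valued data; principal components computed via plain numpy SVD):

```python
# A rough sketch (hypothetical data): flatten each array into columns,
# then keep a few principal components as summary variables that a
# causal search could work with.
import numpy as np

rng = np.random.default_rng(0)
arrays = rng.normal(size=(1_000, 16))    # 1,000 samples, 16-element arrays

centered = arrays - arrays.mean(axis=0)
# SVD-based PCA: rows of vt are the principal directions.
u, s, vt = np.linalg.svd(centered, full_matrices=False)
components = centered @ vt[:3].T         # first 3 PCs as new variables
print(components.shape)                  # (1000, 3) -> feed these to the search
```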

David B Kinney
Reply to  Alex
3 years ago

I’d like to hear more about why the first two outputs seem strange.

Waffle House is a chain headquartered in Georgia with more locations in the South, so it seems to me that there’s a fairly straightforward common factor that could explain the relationship between the number of Waffle Houses and temperature; namely, being in the southern part of the country.

Rates of HS graduation and temperatures are a little trickier, but there too a common-cause story isn’t that crazy; we know that some southeastern states in the US are poorer than the country as a whole, and it’s not too big a leap to think that this affects HS graduation rates in some way.

If the correlation goes the other way in either case, such that high temps are associated with fewer waffle houses, or high temps are correlated with high graduation rates, then I’d be a bit stumped.

Seymour Borgman
3 years ago

Glymour about 10 years ago wrote a manifesto that advocated de-funding the 99 percent of academic philosophy that could not be sensibly housed in a computer science department. Funny to see him and a group of philosophers-turned-statisticians honored here.