Machine Learning, Causal Model Search, and Educational Data

Richard Scheines1

Carnegie Mellon University

Many of the questions in educational research are causal. In observational studies involving cognitive tutors or online courses, we often want to know not just whether certain kinds of student behaviors (hint requests, worked examples seen, etc.) are associated with learning outcomes, but whether they cause them. In experimental studies, where treatment is randomized and we have good statistical evidence on whether treatment has an overall effect on learning, we often want to know about the mechanisms by which treatment might influence learning. In these cases and others, machine learning can help, and has already begun to do so.

In the last two decades, enormous progress has been made in formalizing statistical causal models (Pearl, 2000) and in applying machine learning to search for causal models (Spirtes, Glymour and Scheines, 2000). This technology has begun to be used on educational data, but its potential has barely been tapped. In this workshop, I would like to present an overview of the work in machine learning for causal models of educational data that has been going on over the last decade, and make the case for the great potential that I think exists for this technology going forward.

Qualitative causal structure over a given set of variables can be represented by a directed graph (a causal graph), and a quantitative causal model can be specified by parameterizing the conditional distribution of each variable on its immediate parents in the causal graph. The problem for search, in a nutshell, is that the number of causal graphs grows super-exponentially with the number of variables. Even if researchers have substantial domain or theoretical knowledge to limit the number of possibilities, the space is still typically far too big to search by hand. Machine learning researchers who have focused on causal model search have managed to rigorously characterize several different equivalence classes for causal graphs and to develop provably reliable, and in several cases extremely efficient, search algorithms over the given equivalence class space.2
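To make the growth claim above concrete, here is a minimal sketch that counts directed acyclic graphs on n labeled variables using Robinson's recurrence. It assumes that "causal graphs" means acyclic directed graphs with no latent variables, which is only one of the model classes discussed in the literature; the function name count_dags is illustrative and not part of any package.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of directed acyclic graphs on n labeled nodes.

    Uses Robinson's recurrence:
        a(0) = 1
        a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    # Print the size of the search space for small variable sets.
    for n in range(1, 11):
        print(n, count_dags(n))
```

Under this assumption there are already 25 graphs on three variables, 543 on four, and 29,281 on five, which is why search by hand stops being feasible almost immediately.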

Machine learning searches for causal models have been successfully used in genetics,3 biology,4 fMRI-based cognitive neuroscience,5 climate research,6 public health,7 sociology,8 education research,9 and many other disciplines. In education research, Scheines, Leinhardt, Smith, and Cho (2005) analyzed log data from an online course and found evidence that printing requests inhibited voluntary interactive comprehension checks, which themselves positively influenced learning outcomes. In a follow-up study in which they intervened to break the link between printing and comprehension checks, the results were confirmed and learning improved. In a 2007 paper, Laski and Siegler used causal model search to examine the mechanisms by which students learn numerical magnitude. In 2008, Shih, Koedinger, and Scheines found that the time students spent reading or reacting to “bottom-out hints” in a geometry tutor indicated whether the student was treating the hint as a way to avoid thinking (gaming) or as a worked example. Frequent gaming led to poor learning outcomes, while frequent use of hints as worked examples led to good learning outcomes. In 2010, Shih, Koedinger, and Scheines used machine learning to construct hidden Markov models of student strategies in hint use. In a study on a computerized fractions tutor for elementary students, Rau and Scheines (2012) used causal models to examine the mechanisms by which multiple representations of a fraction might improve post-test performance as well as retention. This list is not exhaustive,10 but I hope it gives a flavor of the work in machine learning and causal modeling for education.

1Primary Appointment: Department of Philosophy; Secondary Appointments: Machine Learning Department, Human-Computer Interaction Institute.

2See, for example, Spirtes, et al., 2000; Silva, Scheines, Glymour, and Spirtes, 2006.

3Chen, Emmert-Streib, and Storey, 2007.

4Shipley, 2000.

5Ramsey, et al., 2010.

6Chu and Glymour, 2008.

If educational researchers have causal questions and data, then machine learning for causal structure can likely be scientifically useful. Software is freely available in a number of forms,11 and research in the methodology has exploded over the last decade.
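As a hedged illustration of the kind of question such search addresses, the sketch below simulates a causal chain shaped like the printing example: printing influences comprehension checks, which influence learning. The coefficients (-0.8 and 0.7), the sample size, and all function names are hypothetical choices for illustration; they are not estimates from any of the studies cited. The sketch shows the statistical signature that constraint-based search methods look for: printing and learning are correlated overall, but nearly uncorrelated once comprehension checks are held fixed.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_chain(n, seed=0):
    """Simulate printing -> checks -> learning with HYPOTHETICAL coefficients."""
    rng = random.Random(seed)
    printing = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical: more printing means fewer comprehension checks.
    checks = [-0.8 * p + rng.gauss(0, 1) for p in printing]
    # Hypothetical: more comprehension checks mean better learning.
    learning = [0.7 * c + rng.gauss(0, 1) for c in checks]
    return printing, checks, learning


if __name__ == "__main__":
    printing, checks, learning = simulate_chain(5000)
    print("corr(printing, learning):", pearson(printing, learning))
    print("partial corr given checks:", partial_corr(printing, learning, checks))
```

A vanishing partial correlation is consistent with comprehension checks fully mediating the effect in this simulated model. On real data, a search algorithm would apply many such tests and report the class of graphs compatible with all of them.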


Arnold, A., Beck, J., and Scheines, R. (2006). "Feature Discovery in the Context of Educational Data Mining: An Inductive Approach." Proceedings of the AAAI 2006 Workshop on Educational Data Mining, Boston, MA.

Chen, L., Emmert-Streib, F., and Storey, J. (2007). Harnessing Naturally Randomized Transcription to Infer Regulatory Relationships Among Genes, Genome Biology, 8: R219, October.

Chu, T., & Glymour, C. (2008). Search for Additive Nonlinear Time Series Causal Models. Journal of Machine Learning Research, 9(May), 967-991.

7Scheines, 2002.

8Jackson and Scheines, 2005.

9Scheines, Leinhardt, Smith, and Cho, 2005.

10Other researchers have begun to use causal modeling on educational data. For example, Joe Beck (at WPI), who has pioneered machine learning for education, has explored causal modeling and search (Rai and Beck, 2011).

11See, for example, Tetrad: .


Rai, D., & Beck, J. (2011). Causal Modeling of User Data from a Math Learning Environment with Game-Like Elements. AIED 2011: 528-530.

Jackson, A., and Scheines, R. (2005). “Single Mothers’ Self-Efficacy, Parenting in the Home Environment, and Children’s Development in a Two-Wave Study” in Social Work Research, 29, 1, pp. 7-20.

Laski, E. V., & Siegler, R. S. (2007). Is 27 a big number? Correlational and causal connections among numerical categorization, number line estimation, and numerical magnitude comparison. Child Development, 78, 1723-1743.

Pearl, J. (2000). Causality: Models, Reasoning, and Inference, Cambridge University Press.

Ramsey, J., Hanson, S., Hanson, C., Halchenko, Y., Poldrack, R., and Glymour, C. (2010). Six problems for causal inference from fMRI. NeuroImage, 49, 1545-1558.

Rau, M., and Scheines, R. (forthcoming). Searching for Variables and Models to Investigate Mediators of Learning from Multiple Representations, in Proceedings of the 5th International Conference on Educational Data Mining (EDM 2012).

Scheines, R. (2002). “Estimating Latent Causal Influences: TETRAD III Variables Selection and Bayesian Parameter Estimation: Lead and IQ,” Handbook of Data Mining and Knowledge Discovery, Pat Hayes, editor, Oxford University Press, 944-952.

Scheines, R., Leinhardt, G., Smith, J., and Cho, K. (2005). "Replacing Lecture with Web-Based Course Materials," Journal of Educational Computing Research, 32, 1, 1-26.

Shih, B., Koedinger, K., and Scheines, R. (2010). "Discovery of Learning Tactics using Hidden Markov Model Clustering," in Proceedings of the 3rd International Conference on Educational Data Mining.

Shih, B., Koedinger, K., & Scheines, R. (2008). A Response Time Model for Bottom-Out Hints as Worked Examples. Proceedings of the First Educational Data Mining Conference. (Best Paper Award).

Shipley, W. (2000). Cause and Correlation in Biology: A User's Guide to Path Analysis, Structural Equations, and Causal Inference, Cambridge.

Silva, R., Scheines, R. Glymour, C., and Spirtes, P. (2006) “Learning the Structure of Linear Latent Structure Models,” Journal of Machine Learning Research, 7, 191-246.

Spirtes, P., Glymour, C. and Scheines, R. (2000), Causation, Prediction, and Search 2nd edition, MIT Press, Boston.