(Published as: De Jaegher, H. (2010). Enaction versus representation: an opinion piece. In T. Fuchs, H. Sattel & P. Henningsen (Eds.), The Embodied Self: Dimensions, Coherence and Disorders. Stuttgart: Schattauer.)

“The existence of other people is a
difficulty and an outrage for objective thought”
(Merleau-Ponty, Phenomenology of Perception, p. 349)

Current science of intersubjectivity (meaningful engagement between subjects) overwhelmingly focuses on a single problem. The dominant question in the field is: how do we figure out what intentions, emotions and feelings underlie the other’s behaviour, so that we might explain and predict that behaviour, so that we can further deal with him?

This view of intersubjectivity is due to the old and oddly seductive metaphor of the mind as working like a computer. On this view, on encountering another, his behaviour enters us in the form of sensations and perceptions, which are fed into an internal mechanism for inferencing or simulating, which then returns an action plan to execute in regard to the other person. We are then able to say something to him, gesture towards him, avoid him or perform any other action that has been suggested as meaningful and correct in the situation. We need inferences or simulations, on this view, because we do not perceive the other’s intentional or mental states.

This perspective on how things work – even if sketched as a caricature here – still dominates how we think about intersubjectivity in the cognitive sciences today. The metaphor at play is powerful, easy to imagine, and not yet replaced by an equally tangible alternative. This is why, nowadays, we are still faced with a mostly single-problem science of intersubjectivity. Or at least with a holy grail-like entity that all self-respecting accounts of intersubjectivity must ultimately attain: a Grasp of Metarepresentation, that theoretical entity that allows us to manage the imperfect perceptual information and transform it into knowledge about the other’s intentions. We have reduced our rich intersubjectivity to a problem of internal reckoning and now we seem stuck with it.[1]

Not everyone thinks that meta-representation is the only thing that will help us reach an understanding of intersubjectivity. But the dedication to it (within for instance developmental psychology) does have a deafening effect when it comes to alternative interpretations and critical testing of the assumptions that underlie the dominant paradigm.[2]

Let’s analyse the internal logic of the problem and the proposed representational solution. The assumptions of the classical paradigm are that intentions are static and hidden inside a person, and that our only access to them is through detached observation. If our intentions are sitting somewhere inside us, all we need to do in order to understand each other is transport them: the intention in my head somehow needs to get into yours. If we can manage that, you will have understood me. But if intentions are hidden and we only have external behaviour to go on, then can we manage this? Except by building a literal, material bridge between your brain and mine (connect them with something like a USB cable?), transporting intentions from one brain to another is not – literally, materially – possible. This is why we need inferences or simulations according to the traditional approach: to supplement our imperfect observational data. All we can do in order to know what goes on with another is to observe what they do, and then take these observations inside ourselves and work out what the other’s outward moves may mean by using some processing mechanism. We rely on our own inner workings in order to know what someone’s behaviour means, to complement our detached observations, which, ex hypothesi, give us only an incomplete grasp of the other.

It is the combination of these two assumptions – the hidden and static nature of the other’s intentions, and the idea that our only access to them is through detached observation – that demands a representationalist approach. However, are these assumptions sufficiently universal or are they rather exceptions to our experience of social engagements (Gallagher 2001; Hutto 2004)? If they are indeed exceptions, then one might question the need for representational explanations as a general framework. Basing a general explanation of social cognition on a more universal aspect of social situations could be more fulfilling.

An enactive approach to social cognition starts from just such a principled basis: the interpersonal interaction process. The definition of social interaction that I gave with Di Paolo in 2007 states that a situation can be called social when two conditions are met. First, there has to be a mutually regulated coupling between two (or more) agents, where this regulation is directed at the coupling itself. Second, the agents involved in this mutual coupling must not lose their own individual autonomy in the process (De Jaegher and Di Paolo 2007, p. 493). If one of the agents involved did lose their autonomy, we would not be in the presence of a social interaction, but of one that resembles a coupling between an agent and a tool or instrument. If neither of the agents in the process had autonomy, we would not be dealing with a social interaction, but with a mere coupling (which can be established by external factors). The point here is not to say that the social domain is purely self-reflexive. Of course, social interactions are open to external influences, but they also have a self-regulative aspect.

After reframing the social in this way, we can ask another question: Where does social meaning come from? The standard answer to this question does not go to the bottom of the issue. Meaning is generally assumed to be somehow present and merely needing to be transported from one place to another. In contrast with this, the non-representational approach links meaning to animation and movement. Movement here is understood as more than mere locomotion or change of position. It is intimately related to the structure of a creature, to what it can and cannot do (Sheets-Johnstone 1999). In movement, we constitute and enact our world. This, for enactivists, is cognition understood as sense-making (Varela 1997; Di Paolo 2005; Thompson & Stapleton 2009). Sense-making is the ability to interact with the world in a normative way based on the connection between our interactions and the sustaining of our identity as cognitive agents, i.e., to distinguish the things that are meaningful in our engagements. If we combine this idea of how agents make sense of their world with evidence that people in social interaction can coordinate their movements with each other (Condon & Ogston 1971; Kendon 1990; Schmidt, Carello & Turvey 1990), we can conclude that people can participate in each other’s sense-making activities (De Jaegher & Di Paolo 2007; De Jaegher 2009). Thus, social cognition is the active engagement with the social environment, and more specifically the interactive, interpersonal generation and transformation of meanings. Participation, furthermore, happens on a range of mutuality: from guidance or orientation (where one’s sense-making is directed, much like attention can be directed, at the mutual end), to joint participation (where, for instance, ideas, thoughts or opinions arise in an interaction that cannot be attributed to either one of the participants, and which can thus be said to be truly co-created).

In this way, the social is the constitution of interpersonal understanding. In other words: The social makes social understanding happen. This is probably counterintuitive to those whose thinking is steeped in the standard approach where, generally, individual cognitions simply put together already make up the social. Notice that, in contrast to assuming without justification the universality of a rather special skill – that of observing in a detached manner – I am deducing, from the definition of the social, a general consequence about cognition in the social domain, namely that there is always an active element of co-regulation and co-participation present, which of course can involve different forms of engagement (of which detached observation is a derived case). A solution to how we understand others that emerges from this conclusion is not tied to a particular form of social engagement, but to its general characteristics. An empirical advantage of this proposal is that it can be tested not by postulating internal black boxes, but by studying externally observable processes, using for instance the dynamical systems tool of coordination.[3]

In sum, the alternative solution states that intentions are not fixed and locked inside persons, but, at least in the social domain, are open and susceptible to change in interactional, participative situations. Further, it assumes that observing is nothing more than a special, derived kind of social engagement, and certainly not the paradigm case of social understanding.

Non-representationalist accounts avoid being seduced into seeking answers to the problems conjured up by representationalism. Instead, they dissolve them in favour of new conceptions of the issue at hand. By starting from a different departure point, they take the assumptions of representationalism not as the rule, but as the exception.

It needs to be clarified that enaction does not deny the existence of inferential skills, or more generally, thinking. What enaction denies is that the only way in which inferential skills or thinking can be explained is through representational mechanisms (agents’ brains playing around with concepts like ‘belief’ and ‘pretend’).

Once inferential skills are taken to be real (and there is overwhelming empirical evidence and phenomenological justification to suggest that they are), we have to account for where they come from. The enactive approach to social cognition proposes that a good place to start is social interaction and participation in sense-making, as outlined above and in other places (De Jaegher & Di Paolo 2007, 2008; De Jaegher 2009; Gallagher & Hutto 2008; Froese & Di Paolo 2009; Gallagher 2009; McGann & De Jaegher 2009; Fuchs & De Jaegher 2009; Fuchs & De Jaegher, this volume). This does not imply that inferential skills or thinking could only be employed in social interactional situations. Once their development is underway, they can be used outside such situations too. It is just that understanding them as developing in social situations, rather than underlying social situations, seems a more plausible explanation.

The point is not that we do not think. It is to ask where thinking itself comes from. Hernik (this volume) asks: if adults can think, then how do we trace this capacity back to infants? On an enactive account, the question would rather be “What do infants do, so that they grow up to be able to think?” Thinking is a general cognitive ability, but rather than suggest that applying this capacity in social situations is what social understanding consists in, we propose that thinking emerges out of social situations, and can, from there, be applied also in non-social (less social) domains. The aim is to explain thinking as developing from social interactions (cf. Hobson 2002).

Still other skills, like deferred imitation in young infants, seem to be taken as evidence of internal representations. But why? Why would some skills require mental representations, while others do not? It is perfectly possible to account for how we ride a bicycle, even play chess, without postulating internal representations (Dreyfus 2002). Are they needed, then, in order to explain neonate imitation, even deferred imitation or completion of another’s goal-driven actions? Yes indeed, these kinds of actions require memory, but perhaps memory, rather than a storage device, can be conceived of as a skill itself, a certain activity in interaction with the world (Fuchs 2008, p. 63, 153ff., 188ff.). After all, in the case of deferred imitation, why does the infant do precisely this gesture in the situation, rather than any other action it could have remembered from the previous interaction? Representationalist approaches have answered the question of how deferred imitation is possible as little as enactive ones have, but at least enactive approaches try to avoid the black-box excuse for an answer.

On a related issue, it is assumed that the representational account does not need to explain skills of representing, whereas enactive accounts do. “How do we think? How do we represent?” the representationalist asks the enactivist. Yes, the enactivist admits, we have work to do here. But so do representationalists. The latter suffer from a classic case of confusion between the explanans and the explanandum: Representing as a skill cannot be grounded if you assume representations as building blocks. An enactive story of thinking (e.g. reasoning, inferencing) is likely to be complicated. But the representationalist story is bound to be incomplete, since what needs explanation is already assumed.

Finally, I would like to briefly address one more point. One of the most striking examples of the idea that the social is made up of two individual cognitions put together is Watson and Gergely’s proposal of a social contingency detection module (Watson 1979; Gergely & Watson 1996, 1999; Gergely, Koós & Watson, this volume). The idea that infants deal with the social through the use of a contingency detection module presumes that infants are merely confronted with contingent stimuli, which they then have to decode. But Gergely and his colleagues are dealing with a wonderfully interesting interactional phenomenon. Regrettably, they dissect and break it down into individual parts – the least informative ones. The contingency of behaviour is a relational and participative phenomenon par excellence, which is here reduced to stimuli and decoders. The decoders are hypothesized, and the only evidence we have for them is indirect. If we see this process instead as one that spans individuals, the operative factors would be tractable – scientifically and mathematically so. This is being done in dynamical systems research on this kind of phenomenon, where it is found that people can coordinate their movements when in interaction, even involuntarily (see for instance Schmidt & O’Brien 1997; Richardson et al. 2007; Di Paolo, Rohde & Iizuka 2008). In an enactive account, the dyadic pattern of activity is not a new source of input for perception, but it is itself part of perception and sense-making, and perception and sense-making are part of it (McGann & De Jaegher 2009).

To sum up: The representational stance is based on dubious assumptions and is not grounded in a theory. The enactive approach offers a theoretical framework starting from the question of what counts as social. In this, detached observation is recognised for the limiting case that it is. The solution to the problem of how we understand others is based on the more typical situation of interaction and participation. On such a view, representations are not needed. Rather, intentions form and transform in interaction, and internally experienced thinking develops through participation in many different sorts of interactional situations in which meaning is shaped and reshaped with others.


[1] In itself there is nothing wrong with wanting to understand meta-representation. Certainly, in narrow and specialized contexts, we may engage in thinking about another’s state of mind. But there is no reason to assume that this is the lone summit of our entire range of social capacities, and no reason to presuppose that our capacity to do this in some way underlies our other ordinary world-engaging talents. Other stories are possible. Also, currently we have no way of making sense of what really goes on in the head, and thus it is unprovable that all this work in social situations is done by representations (let alone meta-representations). Thus, one of the programmatic points of the research I defend is to offer and investigate the possibility of another way of seeing and studying what goes on – a way that does not rely on representation, which is a particularly ill-defined term (Harvey 2008).

This position is no longer a merely negative stance; it is becoming a positive, constructive working paradigm, with theoretical proposals (Gallagher & Hutto 2008; De Jaegher & Di Paolo 2007) and the generation and investigation of empirical hypotheses in an increasing number of fields (e.g. Reddy 2001; Auvray, Lenay & Stewart 2009; Froese & Di Paolo forthcoming).

[2] Think also of the great enthusiasm with which neuroscience has jumped on the bandwagon in order to try and find the neural loci of theory-of-mind-modules or similar proposals, without so much as a sigh of effort in trying to test the assumption that such things exist überhaupt. But maybe most of contemporary neuroscience isn’t the best place to go and tamper with assumptions in this way.

[3] With this, I do not mean to imply that this is an externalist notion of cognition. The external-internal distinction is misleading because it is tied to the idea that cognition must be boxed. Here, I am talking only about how we can scientifically observe the measurable aspects of sense-making, which is neither an internal process nor an external one, but rather a subject’s engagement with the world (see also Di Paolo 2009).


