By matching individual movements, mirror processing provides a representation of body part movements that might serve various functions (for example, imitation), but is devoid of any specific cognitive importance per se. By contrast, through matching the goal of the observed motor act with a motor act that has the same goal, the observer is able to understand what the agent is doing. – Rizzolatti & Sinigaglia, 2010, Nature Rev. Neurosci., p. 269
(The shift in emphasis from movements to goals is actually problematic for the theory because one could argue that the goals of an action are sensory, e.g., in the case of speech the goal is to produce a sound. Therefore understanding is a sensory phenomenon. But that is not what I want to talk about here.)
They are also careful to point out that action recognition can be achieved using non-motoric means.
...the recognition of the motor behaviour of others can rely on the mere processing of its visual aspects. This processing is similar to that performed by the ‘ventral stream’ areas for the recognition of inanimate objects. It allows the labelling of the observed behaviour, but does not provide the observer with cues that are necessary for a real understanding of the conveyed message. – Rizzolatti & Sinigaglia, 2010, Nature Rev. Neurosci., p. 270
This "real understanding" comes "from the inside" as they say. But what does this mean? One interpretation is that knowing how to perform an action allows one to predict the outcome of an observed action ahead of time, i.e., "I know what you are doing". As Marco Iacoboni writes in the context of watching sports,
We understand the players’ actions because we have a template in our brains for that action, a template based on our own movements. – Iacoboni, 2008, Mirroring People, p. 5
I think this is true to some extent. Experience performing an action can allow us to predict the consequences of that action when we observe someone else performing it. It is important to recognize, however, that this predictive coding isn't unique to the motor system. We can learn to predict the consequences of actions that we have never performed just by observing the action performed repeatedly.
For example, my dog is really good at this. He loves to play fetch, and after many exposures to viewing my throwing action, he can predict the direction of the ball's flight. Below is a video demonstrating this. There are 6 successive trials. Each starts with me holding the ball and my dog watching me intently. I then turn in some direction and make a throwing action in that direction. I don't actually release the ball, to ensure that he is cueing off my actions and isn't just following the trajectory of the ball. As is clear from the video, he recognizes that I am making a throwing action and immediately responds by turning to run after the ball. Further, the direction in which he runs shows that he is correctly predicting where I planned to throw the ball. (Once he's on his way, I throw the ball so his chasing response isn't extinguished.) I should say that I don't normally throw the ball for him in random directions. It is usually in one general direction, yet he had no trouble with this task, even without any practice.
You can see in the last trial that he takes off running even before the forward motion of my arm. Similar experiments have been carried out in monkeys observing overhand throwing actions by a human. It was argued that the monkeys understood the action even though they can't make overhand throwing actions themselves. The dog example is even stronger, though, because dogs can't grasp or throw at all yet seem to understand throwing actions quite well.
This behavior is no surprise to owners of dogs who like to fetch, but it does illustrate the simple fact that predictive knowledge of "intention" does not have to come from the motor system. In fact, given that there is survival value in predicting the actions of prey or predators, some of which have motor repertoires that differ from the observing animal's, one could argue that prediction based on sensory learning is even more important than motor-based prediction when it comes to others' behaviors.
Rizzolatti, G., & Sinigaglia, C. (2010). The functional role of the parieto-frontal mirror circuit: interpretations and misinterpretations. Nature Reviews Neuroscience, 11(4), 264-274. DOI: 10.1038/nrn2805