What is the difference between “contravariant” and “covariant” tensors, and why do they transform differently under a change of coordinates? When first learning this material I could apply the formulae but was very confused by what the concepts intuitively meant, and could not discern the difference between the esoteric paths termed “covariant” and “contravariant”.
The purpose of this article is pedagogical. Students often find the concept of a “tensor” itself confusing and intimidating, so we clarify it first. I assume the reader already has basic familiarity with tensors, so I give only examples of what is and is not a tensor, to aid conceptual clarity.
- A tensor is a geometric object. This is a more physical requirement/interpretation.
- For example the temperature at each location in a room is a scalar (field), a specific case of tensor. The temperature at a point is independent of the coordinates used.
- A non-example is the value of, say, the first coordinate at each location, in arbitrary coordinates. This gives a number, but not a scalar. It has nothing to do with physical reality. (See Norton, “Did Einstein Stumble?” §4.2 for vaguely-related historical discussion.)
- Vectors are tensors. They are geometric objects, and transform as expected. As Zee (2013, §1.4) said, tongue-in-cheek, “A tensor is something that transforms as a tensor.”
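- Christoffel symbols are not tensors. These do have a transformation law under a change of coordinates, but for one thing the transformation is not linear: $\Gamma^{\lambda'}_{\mu'\nu'} = \frac{\partial x^{\lambda'}}{\partial x^{\lambda}} \frac{\partial x^{\mu}}{\partial x^{\mu'}} \frac{\partial x^{\nu}}{\partial x^{\nu'}} \Gamma^{\lambda}_{\mu\nu} + \frac{\partial x^{\lambda'}}{\partial x^{\lambda}} \frac{\partial^2 x^{\lambda}}{\partial x^{\mu'} \partial x^{\nu'}}$. The second term does not involve the old components at all.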
- Tensor components transform linearly under a change of coordinates.
- Tensors are linear. The mathematical definition is: a linear map from vectors (in the tangent space $T_pM$ at a point $p$) to the reals, $T_pM \to \mathbb{R}$, or from the dual space, $T_p^*M \to \mathbb{R}$, or a multilinear map on multiple copies of these.
- Dual vectors are tensors. A familiar example of a dual vector is a gradient. Imagine the temperature distribution mentioned above (Collier 2012). Given its (spatial) gradient, supply a vector and it tells you the total temperature change in the direction of that vector. In other words, the gradient sends vectors to real numbers. But that’s what a dual vector does!
- Pseudotensors are not tensors. Like all our nonexamples, they (inescapably) rely on the choice of coordinate system. Some are tensors up to an orientation determined by coordinates. Physical quantities have nonetheless been derived from them: for example, Einstein used a pseudotensor for the energy of the gravitational field to make successful predictions about gravitational waves, although this was seen as a bit dubious.
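The gradient example in the list above can be checked numerically. This is only a minimal sketch: the temperature field, the point, and the displacement vector are all made-up values chosen for illustration, and the helper name `apply_dual_vector` is my own.

```python
# Hypothetical temperature field on a plane (an assumption for illustration).
def temperature(x, y):
    return 20.0 + 3.0 * x - 2.0 * y

# Components of its gradient, (dT/dx, dT/dy). Constant because the field is linear.
grad_T = (3.0, -2.0)

def apply_dual_vector(b, v):
    """Feed the vector v to the dual vector b, returning the number b_mu v^mu."""
    return sum(b_i * v_i for b_i, v_i in zip(b, v))

point = (1.0, 1.0)
v = (0.5, 0.25)  # a displacement vector at that point

# The gradient, acting as a dual vector, predicts the temperature change along v.
predicted_change = apply_dual_vector(grad_T, v)

# Direct comparison: temperature at the displaced point minus temperature at the point.
actual_change = temperature(point[0] + v[0], point[1] + v[1]) - temperature(*point)

print(predicted_change, actual_change)
```

For this linear field the two numbers agree exactly. For a general field the gradient gives the change only to first order in the displacement.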
Incidentally the terminology “contravariant”, “covariant”, and “mixed” for tensors is from Ricci. The term “tensor” is from Einstein and Grossmann (Rosenfeld 1988, §8).
Briefly, a “contravariant” tensor transforms oppositely to a “covariant” tensor, meaning their components transform inversely under a change of basis. A “contravariant” tensor has components written with raised indices, for example a 4-velocity $u^\mu$, whereas the components of a “covariant” tensor are written with lowered indices, for example a metric $g_{\mu\nu}$. There are also tensors with “mixed” indices, for example the Riemann tensor is often given as $R^\mu{}_{\nu\sigma\tau}$. We will get to more precise definitions soon. But for now, note this “contravariant” and “covariant” terminology is old-fashioned (Schutz §3.3 p.60). It is better to refer to “raised” and “lowered” indices (or “upstairs” and “downstairs” as in Misner, Thorne & Wheeler 1973, §3.2) or more generally “a tensor of type (1,3)” in the example of the Riemann tensor above. From now on, I assume a single index only. In this case, a tensor with a single raised index is termed a vector, and one with a single lowered index a dual vector, 1-form, or possibly covector.
Suppose we wish to transform from coordinates $x^\mu$ to $x^{\mu'}$. Recall that if a vector ‘a’ has components $a^\mu$ in the original coordinates, then its components in the new coordinates are $a^{\mu'} = \frac{\partial x^{\mu'}}{\partial x^{\mu}} a^{\mu}$. On the other hand, if a dual vector ‘b’ has components $b_\mu$ in the original coordinates, then its transformed components are $b_{\mu'} = \frac{\partial x^{\mu}}{\partial x^{\mu'}} b_{\mu}$. (The Einstein summation convention is assumed throughout.) From the first formula, the vector components undergo a linear transformation at each point, given by the matrix of partial derivatives $\partial x^{\mu'} / \partial x^{\nu}$. Conversely, dual vector components are transformed by $\partial x^{\mu} / \partial x^{\nu'}$ (I have relabeled the indices). Recall these are called Jacobian matrices and are inverses of one another, $\frac{\partial x^{\mu'}}{\partial x^{\nu}} \frac{\partial x^{\nu}}{\partial x^{\sigma'}} = \delta^{\mu'}_{\sigma'}$, hence we say the components transform inversely.
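Here is a small numerical illustration of the two transformation laws, as a sketch only. I assume a change from Cartesian coordinates (x, y) to polar coordinates (r, θ) on the plane. The point and the component values are made up, and all function names are hypothetical helpers of my own.

```python
import math

def jacobian(x, y):
    """Partial derivatives of the new coordinates (r, theta) with respect to
    the old ones (x, y). Rows are labeled by new, columns by old."""
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / r**2, x / r**2]]

def inverse_jacobian(x, y):
    """Partial derivatives of the old coordinates (x, y) with respect to
    the new ones (r, theta). Rows are labeled by old, columns by new."""
    r = math.hypot(x, y)
    return [[x / r, -y],
            [y / r, x]]

def transform_vector(a, x, y):
    """New vector components: sum over old index n of J[m][n] * a[n]."""
    J = jacobian(x, y)
    return [sum(J[m][n] * a[n] for n in range(2)) for m in range(2)]

def transform_dual_vector(b, x, y):
    """New dual vector components: sum over old index n of K[n][m] * b[n]."""
    K = inverse_jacobian(x, y)
    return [sum(K[n][m] * b[n] for n in range(2)) for m in range(2)]

def contract(b, a):
    """The number obtained by feeding the vector a to the dual vector b."""
    return sum(b_i * a_i for b_i, a_i in zip(b, a))

x, y = 3.0, 4.0   # the point where both objects live (made-up values)
a = [1.0, 2.0]    # vector components in (x, y)
b = [5.0, -1.0]   # dual vector components in (x, y)

a_new = transform_vector(a, x, y)
b_new = transform_dual_vector(b, x, y)

# The components change, but the contraction is the same in both systems.
print(a_new, b_new)
print(contract(b, a), contract(b_new, a_new))
```

The two sets of components change in different ways, yet the contraction agrees in both coordinate systems up to floating-point rounding. That agreement is exactly what the inverse relationship between the two matrices guarantees.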
We can raise and lower indices by $a^\mu = g^{\mu\nu} a_\nu$ and $a_\mu = g_{\mu\nu} a^\nu$. This was the source of my confusion, because it seemed that one object could transform in two different ways, depending on whether it was written $a^\mu$ or $a_\mu$. It seemed “covariant” and “contravariant” transformations were very different concepts. But the resolution is simply that $a_\mu$ and $a^\mu$ correspond to different objects, a dual vector and vector respectively. Yes, confusingly, both a vector and its dual would usually be written as ‘a’ in index-free notation. This is because there is a natural correspondence between them, given a metric (see the raising and lowering formulae above). The notation is convenient, but you just have to avoid the misconception. In index-free notation, Schutz (§3.3) chooses the notation $\tilde{a}$ for a dual vector, and $\vec{a}$ for a vector. However this is not a standard convention, and in relativity the “→” might imply a 3-dimensional spatial vector specifically. One option is to simply write in words “...the dual vector a...”.
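To see concretely that the raised and lowered versions are different lists of numbers, here is a minimal sketch. I assume the Minkowski metric with signature (−, +, +, +), one common sign convention. The component values and the helper names `lower_index` and `raise_index` are made up for illustration.

```python
# Minkowski metric, (-, +, +, +) signature (an assumed sign convention).
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
ETA_INV = ETA  # this particular matrix happens to be its own inverse

def lower_index(g, a_up):
    """Dual vector components from vector components: g[mu][nu] * a_up[nu], summed over nu."""
    n = len(a_up)
    return [sum(g[mu][nu] * a_up[nu] for nu in range(n)) for mu in range(n)]

def raise_index(g_inv, a_down):
    """Vector components from dual vector components, using the inverse metric."""
    n = len(a_down)
    return [sum(g_inv[mu][nu] * a_down[nu] for nu in range(n)) for mu in range(n)]

a_up = [2.0, 1.0, 0.0, 0.0]                # components of the vector
a_down = lower_index(ETA, a_up)            # components of the corresponding dual vector
a_up_again = raise_index(ETA_INV, a_down)  # raising again recovers the vector

# Contracting the dual vector with the vector gives a coordinate-independent number.
norm_squared = sum(d * u for d, u in zip(a_down, a_up))

print(a_up, a_down, a_up_again, norm_squared)
```

The time component flips sign on lowering, so the two sets of components really do differ, even though both would be called ‘a’ in index-free notation.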
Given that a tensor is a vector or dual vector, the transformation law is determined. Components of vectors transform inversely to components of dual vectors. (Also, just to add more confusion, components of vectors transform inversely to basis vectors. And components of dual vectors transform inversely to dual basis vectors.)
This inverse transformation gives rise to the word ‘dual’ in ‘dual vector space’. The property of transforming with basis vectors gives rise to the co in ‘covariant vector’ and its shorter form ‘covector’. Since components of ordinary vectors transform oppositely to basis vectors… they are often called ‘contravariant’ vectors. Most of these names are old-fashioned; ‘vectors’ and ‘dual vectors’ or ‘one-forms’ are the modern names. The reason that ‘co’ and ‘contra’ have been abandoned is that they mix up two very different things: the transformation of a basis is the expression of new vectors in terms of old ones; the transformation of components is the expression of the same object in terms of the new basis. (Schutz §3.3)
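The distinction between transforming a basis and transforming components can be illustrated with a small sketch. The basis change matrix `A` and the component values are made up for illustration, and I work in a flat two-dimensional plane for simplicity.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

# Old basis vectors, written out as ordinary pairs of numbers.
old_basis = [(1.0, 0.0), (0.0, 1.0)]

# Hypothetical change of basis: new basis vector j is the sum over i of A[i][j] * old_basis[i].
A = [[2.0, 1.0],
     [0.0, 1.0]]
new_basis = [tuple(sum(A[i][j] * old_basis[i][k] for i in range(2)) for k in range(2))
             for j in range(2)]

# The components of a fixed vector transform with the INVERSE matrix.
A_inv = inverse_2x2(A)
a_old = [3.0, 1.0]
a_new = [sum(A_inv[j][i] * a_old[i] for i in range(2)) for j in range(2)]

def assemble(components, basis):
    """Rebuild the geometric vector as the sum of components[i] * basis[i]."""
    return tuple(sum(components[i] * basis[i][k] for i in range(2)) for k in range(2))

# Same object, expressed in two different bases.
print(new_basis, a_new)
print(assemble(a_old, old_basis), assemble(a_new, new_basis))
```

The basis vectors change with `A`, the components change with its inverse, and the vector rebuilt from either pair is the same geometric object.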
Some mathematical definitions are overdue [although these really belong elsewhere, and are only an abbreviated summary, not an introduction]. Recall that in curved geometry a vector exists only in the tangent space at a given point, and does not stretch from one point of the manifold to another as often conceived in flat space. So a vector is an element of the tangent space. A dual vector maps vectors to the real numbers $\mathbb{R}$; in other words, it is a linear functional on the tangent space. Alternatively, we can think of a vector as mapping dual vectors to $\mathbb{R}$ (see Schutz §3.6 on “Circular reasoning?”). Recall that for a given coordinate system $x^\mu$, the coordinate basis vectors are denoted $\partial_\mu$, and the dual coordinate basis vectors $dx^\mu$, with the property $dx^\mu(\partial_\nu) = \delta^\mu_\nu$.
One final clarification: the modern consensus seems to be that the term “covariant” has the broader meaning of any quantity which transforms as a tensor, meaning it transforms appropriately with a coordinate change. Hence we would say both vectors and dual vectors, as well as all other tensors, are covariant! Like anything, the meaning should be clear from the context.
- In place of “contravariant”, say raised indices, a tensor of type $(k,0)$, or a vector (in the case of a single index). In place of “covariant”, say lowered indices, a tensor of type $(0,k)$, or a dual vector / covector / 1-form.
- Raising or lowering an index, for instance $a_\mu = g_{\mu\nu} a^\nu$, gives a different tensor. The two are usually written with the same variable (here ‘a’) because the metric determines a natural correspondence between them.
- There is only one type of transformation, given a specific type of tensor. (Yes, maybe one could count a transformation and its inverse as two types… but I express it this way to combat my earlier misconception, see above).
- Use “covariant” to express that a quantity is a tensor, meaning it transforms appropriately under a change of coordinates. This is independent of whether its indices are raised, lowered, or mixed.
- Usage is different, particularly in older sources, but the meaning should be clear from the context.
Update: From a further survey of textbooks, it is evident the terminology Schutz criticises is very widespread. For instance, a leading mathematical reference on differential geometry (O’Neill 1983, p.37) does use this language, though more recent books are worth checking.
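Addendum on raising and lowering indices: when the tensors are not symmetric you need to be careful of index positions. For example given a tensor $T^{\mu\nu}{}_{\sigma}$, lowering either the first or second index gives different (1,2)-tensors $T_{\mu}{}^{\nu}{}_{\sigma} = g_{\mu\rho} T^{\rho\nu}{}_{\sigma}$ and $T^{\mu}{}_{\nu\sigma} = g_{\nu\rho} T^{\mu\rho}{}_{\sigma}$ (Schutz §3.7).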
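A quick numerical sketch of this point, under assumptions of my own: two dimensions, a diagonal metric (the polar-coordinate metric evaluated at r = 2), and made-up components for a tensor with two raised indices and one lowered index.

```python
DIM = 2

# Assumed metric: diag(1, r^2) for polar coordinates, evaluated at r = 2.
g = [[1.0, 0.0],
     [0.0, 4.0]]

# T[m][n][s] stores the component with raised indices m, n and lowered index s.
# The made-up values are deliberately not symmetric under swapping m and n.
T = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]

def lower_first(g, T):
    """Contract the metric with the FIRST raised index of T."""
    return [[[sum(g[m][r] * T[r][n][s] for r in range(DIM))
              for s in range(DIM)]
             for n in range(DIM)]
            for m in range(DIM)]

def lower_second(g, T):
    """Contract the metric with the SECOND raised index of T."""
    return [[[sum(g[n][r] * T[m][r][s] for r in range(DIM))
              for s in range(DIM)]
             for n in range(DIM)]
            for m in range(DIM)]

first_lowered = lower_first(g, T)
second_lowered = lower_second(g, T)

# The two results are different arrays of components.
print(first_lowered)
print(second_lowered)
print(first_lowered == second_lowered)
```

The two results differ, so writing both as a bare ‘T’ with one raised and two lowered indices, without keeping track of which slot was lowered, would be ambiguous.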
- I have drawn heavily on the relativity textbook by Schutz, which is introductory but known for its precision and clear explanation. The author “stresses the geometrical nature of tensors rather than the transformation properties of their components” (§3.9, see also sources recommended there). Many other sources would contain similar material. Note Schutz §3 concerns special relativity only, but just replace $\eta_{\mu\nu}$ with $g_{\mu\nu}$, the Lorentz boost by an arbitrary coordinate transformation, etc.
- Collier (2012), A Most Incomprehensible Thing: Notes toward a very gentle introduction to the mathematics of relativity, has a lengthy pedagogical section in §5.3.
- This topic applies to geometry generally, not just relativity in particular. So try also books on differential geometry, or field theory (such as electromagnetism) for instance, and let me know of any helpful ones.