What is the difference between “contravariant” and “covariant” tensors, and why do they transform differently under a change of coordinates? When first learning this material I could apply the formulae but was very confused by what the concepts intuitively meant, and could not discern the difference between the esoteric paths termed “covariant” and “contravariant”.
The purpose of this article is pedagogical, and students often find the concept “tensor” itself confusing and intimidating, so we clarify it first. I assume the reader already has basic familiarity with tensors, so what follows is a list of examples of what is and is not a tensor, to aid conceptual clarity.
- A tensor is a geometric object. This is a more physical requirement/interpretation.
- For example the temperature at each location in a room is a scalar (field), a specific case of tensor. The temperature at a point is independent of the coordinates used.
- A non-example is taking the value of the 1st coordinate say, at each location, in arbitrary coordinates. This gives a number, but not a scalar. It has nothing to do with physical reality. (See Norton, “Did Einstein Stumble?” §4.2 for vaguely-related historical discussion.)
- Vectors are tensors. They are geometric objects, and transform as expected. As Zee (2013, §1.4) said, tongue-in-cheek, “A tensor is something that transforms as a tensor.”
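- Christoffel symbols are not tensors. These do have a transformation law under a change of coordinates, but for one thing the transformation is not linear, because of the second-derivative term: $\Gamma'^{\lambda}{}_{\mu\nu} = \frac{\partial x'^\lambda}{\partial x^\rho}\frac{\partial x^\sigma}{\partial x'^\mu}\frac{\partial x^\tau}{\partial x'^\nu}\,\Gamma^{\rho}{}_{\sigma\tau} + \frac{\partial x'^\lambda}{\partial x^\rho}\frac{\partial^2 x^\rho}{\partial x'^\mu\,\partial x'^\nu}$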
- Tensor components transform linearly under a change of coordinates.
- Tensors are linear. The mathematical definition is: a linear functional from vectors (in the tangent space at a point) to reals: $T_pM \to \mathbb{R}$, or on the dual space: $T_p^*M \to \mathbb{R}$, or multiple copies of these.
- Dual vectors are tensors. A familiar example of a dual vector is a gradient. Imagine the temperature distribution mentioned above (Collier 2012). Given its (spatial) gradient, supply a vector and it tells you the total temperature change in the direction of that vector. In other words, the gradient sends vectors to real numbers. But that’s what a dual vector does!
- Pseudotensors are not tensors. Like all our nonexamples, they (inescapably) rely on the choice of coordinate system. Some are tensors up to an orientation determined by coordinates. More generally, physical quantities have been derived from them: for example, Einstein used a pseudotensor for the energy of the gravitational field to make successful predictions about gravitational waves, although this was seen as a bit dubious.
Incidentally the terminology “contravariant”, “covariant”, and “mixed” for tensors is from Ricci. The term “tensor” is from Einstein and Grossmann (Rosenfeld 1988, §8).
Briefly, a “contravariant” tensor transforms oppositely to a “covariant” tensor, meaning their components transform inversely under a change of basis. A “contravariant” tensor has components written with raised indices, for example a 4-velocity $u^\mu$, whereas the components of a “covariant” tensor are written with lowered indices, for example a metric $g_{\mu\nu}$. There are also tensors with “mixed” indices, for example the Riemann tensor is often given as $R^\mu{}_{\nu\rho\sigma}$. We will get to more precise definitions soon. But for now, note this “contravariant” and “covariant” terminology is old-fashioned (Schutz §3.3 p.60). It is better to refer to “raised” and “lowered” indices (or “upstairs” and “downstairs” as in Misner, Thorne & Wheeler 1973, §3.2) or more generally “a tensor of type (1,3)” in the example of the Riemann tensor above. From now on, I assume a single index only. In this case, a tensor with a single raised index is termed a vector, and one with a single lowered index a dual vector, 1-form, or possibly covector.
Suppose we wish to transform from coordinates $x^\mu$ to $x'^\mu$. Recall that if a vector ‘a’ has components $a^\mu$ in the original coordinates, then its components in the new coordinates are $a'^\mu = \frac{\partial x'^\mu}{\partial x^\nu} a^\nu$. On the other hand, suppose a dual vector ‘b’ has components $b_\mu$ in the original coordinates, then its transformed components are $b'_\mu = \frac{\partial x^\nu}{\partial x'^\mu} b_\nu$. (The Einstein summation convention is assumed throughout.) From the first formula, the vector components undergo a linear transformation at each point, given by the matrix of partial derivatives $\partial x'^\mu / \partial x^\nu$. Conversely, dual vector components are transformed by $\partial x^\mu / \partial x'^\nu$ (I have relabeled the indices). Recall these are called Jacobian matrices and are inverses of one another, hence we say the components transform inversely. (I should add an example.)
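As a concrete illustration of the two transformation laws, here is a minimal sketch in Python that changes from Cartesian coordinates (x, y) to polar coordinates (r, θ) in the flat plane. The choice of coordinates, the function names, and the sample components are my own assumptions for illustration and are not taken from the cited texts. Vector components are transformed with one Jacobian matrix, dual vector components with its inverse, and the contraction of the two is checked to be unchanged.

```python
import math

def jacobian_cart_to_polar(x, y):
    """Matrix d(r, theta)/d(x, y) evaluated at the point (x, y)."""
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / r**2, x / r**2]]

def jacobian_polar_to_cart(r, theta):
    """Matrix d(x, y)/d(r, theta) evaluated at the point (r, theta)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -r * s],
            [s, r * c]]

def transform_vector(a, x, y):
    """New components a'^mu = (dx'^mu/dx^nu) a^nu."""
    J = jacobian_cart_to_polar(x, y)
    return [sum(J[m][n] * a[n] for n in range(2)) for m in range(2)]

def transform_dual_vector(b, x, y):
    """New components b'_mu = (dx^nu/dx'^mu) b_nu (the inverse Jacobian)."""
    r, theta = math.hypot(x, y), math.atan2(y, x)
    K = jacobian_polar_to_cart(r, theta)
    return [sum(K[n][m] * b[n] for n in range(2)) for m in range(2)]

def contract(b, a):
    """The coordinate-independent number b_mu a^mu."""
    return sum(bi * ai for bi, ai in zip(b, a))

# Hypothetical sample data at the point (x, y) = (3, 4).
a_cart = [1.0, 2.0]     # vector components a^mu
b_cart = [0.5, -1.0]    # dual vector components b_mu
a_polar = transform_vector(a_cart, 3.0, 4.0)
b_polar = transform_dual_vector(b_cart, 3.0, 4.0)
print(contract(b_cart, a_cart), contract(b_polar, a_polar))
```

The individual components change, but the two printed contractions agree up to rounding, which is the sense in which the two sets of components transform inversely.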
We can raise and lower indices by $a_\mu = g_{\mu\nu} a^\nu$ and $a^\mu = g^{\mu\nu} a_\nu$. This was the source of my confusion, because it seemed that one object could transform in two different ways, depending on whether it was written $a^\mu$ or $a_\mu$. It seemed “covariant” and “contravariant” transformations were very different concepts. But the resolution is simply that $a_\mu$ and $a^\mu$ correspond to different objects, a dual vector and vector respectively. Yes, confusingly, both a vector and its dual would usually be written as ‘a’ in index-free notation. This is because there is a natural correspondence between them, given a metric (see the raising and lowering formulae above). The notation is convenient, but you just have to avoid the misconception. In index-free notation, Schutz (§3.3) chooses the notation $\tilde{a}$ for a dual vector, and $\vec{a}$ for a vector. However, this is not a standard convention, and in relativity the “→” might imply a 3-dimensional spatial vector specifically. One option is to simply write in words “...the dual vector a...”.
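To make the point that raising and lowering produces a different object with different components, here is a small sketch using the Minkowski metric with signature (−,+,+,+). The speed 0.6 and the helper names are hypothetical choices for illustration. It lowers the index of a 4-velocity, contracts the result with the original vector, and then raises the index again.

```python
import math

# Minkowski metric eta_{mu nu}, signature (-,+,+,+); it is its own inverse.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def lower_index(metric, a_up):
    """a_mu = g_{mu nu} a^nu"""
    n = len(a_up)
    return [sum(metric[m][k] * a_up[k] for k in range(n)) for m in range(n)]

def raise_index(inverse_metric, a_down):
    """a^mu = g^{mu nu} a_nu"""
    n = len(a_down)
    return [sum(inverse_metric[m][k] * a_down[k] for k in range(n)) for m in range(n)]

def four_velocity(v):
    """4-velocity of a particle moving along x with speed v (units with c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [gamma, gamma * v, 0.0, 0.0]

u_up = four_velocity(0.6)                 # vector components u^mu
u_down = lower_index(ETA, u_up)           # dual vector components u_mu
norm = sum(d * u for d, u in zip(u_down, u_up))   # u_mu u^mu
u_back = raise_index(ETA, u_down)         # recovers u^mu
print(u_up, u_down, norm)
```

Only the time component changes sign here, but in a general metric all components can differ, which is why it helps to think of $a^\mu$ and $a_\mu$ as components of two distinct objects.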
Given that a tensor is a vector or dual vector, the transformation law is determined. Components of vectors transform inversely to components of dual vectors. (Also, just to add more confusion, components of vectors transform inversely to basis vectors. And components of dual vectors transform inversely to dual basis vectors.)
This inverse transformation gives rise to the word ‘dual’ in ‘dual vector space’. The property of transforming with basis vectors gives rise to the co in ‘covariant vector’ and its shorter form ‘covector’. Since components of ordinary vectors transform oppositely to basis vectors… they are often called ‘contravariant’ vectors. Most of these names are old-fashioned; ‘vectors’ and ‘dual vectors’ or ‘one-forms’ are the modern names. The reason that ‘co’ and ‘contra’ have been abandoned is that they mix up two very different things: the transformation of a basis is the expression of new vectors in terms of old ones; the transformation of components is the expression of the same object in terms of the new basis. (Schutz §3.3)
Some mathematical definitions are overdue [although these really belong elsewhere, and are only an abbreviated summary, not an introduction]. Recall that in curved geometry a vector exists only in the tangent space at a given point, and does not stretch from one point of the manifold to another as often conceived in flat space. So a vector is an element of the tangent space. A dual vector maps vectors to the real numbers $\mathbb{R}$, in other words, it is a linear functional on the tangent space. Alternatively, we can think of a vector as mapping dual vectors to $\mathbb{R}$ (see Schutz §3.6 on “Circular reasoning?”). Recall that for a given coordinate system $x^\mu$, the coordinate basis vectors are denoted $\partial_\mu$, and the dual coordinate basis vectors $dx^\mu$, with the property $dx^\mu(\partial_\nu) = \delta^\mu_\nu$.
One final clarification: the modern consensus seems to be that the term “covariant” has the broader meaning of any quantity which transforms as a tensor, meaning it transforms appropriately with a coordinate change. Hence we would say both vectors and dual vectors, as well as all other tensors, are covariant! Like anything, the meaning should be clear from the context.
- In place of “contravariant”, say raised indices, a tensor of type $(n,0)$, or a vector (in the case of a single index). In place of “covariant”, say lowered indices, a tensor of type $(0,n)$, or a dual vector / covector / 1-form.
- The raising or lowering of indices, for instance $a_\mu = g_{\mu\nu} a^\nu$, corresponds to different tensors. They are usually written with the same variable (here ‘a’) because the metric determines a natural correspondence between them.
- There is only one type of transformation, given a specific type of tensor. (Yes, maybe one could count a transformation and its inverse as two types… but I express it this way to combat my earlier misconception, see above).
- Use “covariant” to express that a quantity is a tensor, meaning it transforms appropriately under a change of coordinates. This is independent of whether its indices are raised, lowered, or mixed.
- Usage is different, particularly in older sources, but the meaning should be clear from the context.
Update: From a further survey of textbooks, it is evident the terminology Schutz criticises is very widespread. For instance, a leading mathematical reference on differential geometry (O’Neill 1983, p.37) does use this language, though more recent books are worth checking.
Addendum on raising and lowering indices: when the tensors are not symmetric you need to be careful of index positions. For example given a tensor $T^{\mu\nu}{}_{\rho}$, lowering either the first or second index gives the different (1,2)-tensors $T_{\mu}{}^{\nu}{}_{\rho}$ and $T^{\mu}{}_{\nu\rho}$ (Schutz §3.7).
- I have drawn heavily on the relativity textbook by Schutz, which is introductory but known for its precision and clear explanation. The author “stresses the geometrical nature of tensors rather than the transformation properties of their components” (§3.9, see also sources recommended there). Many other sources would contain similar material. Note Schutz §3 concerns special relativity only, but just replace $\eta_{\mu\nu}$ with $g_{\mu\nu}$, the Lorentz boost by an arbitrary coordinate transformation, etc.
- Collier (2012), A Most Incomprehensible Thing: Notes toward a very gentle introduction to the mathematics of relativity has a lengthy pedagogical section in §5.3
- This topic applies to geometry generally, not just relativity in particular. So try also books on differential geometry, or field theory (such as electromagnetism) for instance, and let me know of any helpful ones.
A spacetime is static or stationary if it does not change over time, loosely speaking. “Static” is a stronger condition than “stationary”, and means a reversal of time does not change the spacetime. As Carroll (§5.2) explains,
You should think of stationary as meaning “doing exactly the same thing at every time,” while static means “not doing anything at all.”
Technically, a spacetime is stationary if it possesses a timelike Killing vector field at infinity (or in a given region / subset, see later). A spacetime is static if in addition the timelike Killing vector field is orthogonal to a family of hypersurfaces.
- Minkowski space is static, and hence also stationary
- Schwarzschild spacetime is static (but in the black hole case, this only holds outside the event horizon: $r > 2M$. The interior is not even stationary)
- A Kerr black hole is stationary (only outside the ergosphere: $r > M + \sqrt{M^2 - a^2\cos^2\theta}$) but not static because it is rotating
- Friedmann-Lemaître-Robertson-Walker (FLRW) universe models are in general neither static nor stationary, roughly speaking because the scale factor is time-dependent. However special cases such as de Sitter space and Einstein’s original “static” universe are static
We can normalise the Killing vector field at each point to obtain “stationary” or “static” observers, which are considered to be at rest. If the timelike Killing vector field is $\xi^\mu$, then $u^\mu = \xi^\mu / \sqrt{-\xi_\nu \xi^\nu}$ has norm -1, and is a normalised timelike 4-velocity.
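As a sketch of this normalisation, take the Schwarzschild exterior in Schwarzschild coordinates with G=c=1, where the timelike Killing vector field has components (1, 0, 0, 0). The metric used below is the standard diagonal Schwarzschild form with signature (−,+,+,+), which I am assuming here, and the function names are hypothetical.

```python
import math

def schwarzschild_metric_diag(M, r, theta):
    """Diagonal components (g_tt, g_rr, g_thetatheta, g_phiphi), with G = c = 1."""
    f = 1.0 - 2.0 * M / r
    return [-f, 1.0 / f, r * r, (r * math.sin(theta)) ** 2]

def norm_squared(g_diag, v):
    """g_{mu nu} v^mu v^nu for a diagonal metric."""
    return sum(g * x * x for g, x in zip(g_diag, v))

def static_observer(M, r, theta=math.pi / 2):
    """Normalise the Killing vector xi = (1, 0, 0, 0) into a 4-velocity."""
    if r <= 2.0 * M:
        raise ValueError("xi is not timelike at or inside the horizon r = 2M")
    g = schwarzschild_metric_diag(M, r, theta)
    xi = [1.0, 0.0, 0.0, 0.0]
    scale = math.sqrt(-norm_squared(g, xi))   # sqrt(1 - 2M/r)
    return [x / scale for x in xi]

u = static_observer(M=1.0, r=4.0)
print(u, norm_squared(schwarzschild_metric_diag(1.0, 4.0, math.pi / 2), u))
```

The time component grows without bound as r approaches 2M, reflecting that static observers cannot exist at or inside the horizon.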
For a stationary spacetime, we can choose a “time” coordinate $t$, such that the Killing vector field is $\partial_t$ and the metric components are independent of $t$. For a static spacetime, we can additionally impose no time-space cross-terms $dt\,dx^i$, where $i = 1, 2, 3$. We could then call coordinates with these properties stationary and static coordinates respectively when they make manifest these underlying properties of the spacetime. (Examples of this terminology: Francis & Kosowsky 2004, Kraus & Wilczek 1994).
- Schwarzschild spacetime (for $r > 2M$): Schwarzschild coordinates are static. Gullstrand-Painlevé coordinates are only stationary, because of the $dt\,dr$ cross-term
- de Sitter space: de Sitter’s original “static” coordinates are static within any given Hubble sphere. FLRW coordinates are not even stationary, because the scale factor $a(t)$ appears in the metric
- Minkowski space: the usual inertial/Cartesian coordinates are static. Even Rindler coordinates, which correspond to accelerating objects, are static (for , considering only one quadrant of the Rindler wedge)! This is apparently because Minkowski space is highly symmetric.
- Static ⇒ stationary
- Spherically symmetric + stationary ⇒ static
The timelike Killing vector field gives a natural splitting of spacetime into space and time.
Naturally we expect the choice of Killing vector field to be unique. The usual definition (cf. Carroll) of timelike “at infinity” ensures this, in the Schwarzschild and Kerr spacetimes at least. However this definition needs to be broadened, because it excludes de Sitter space for instance. I extend the definition to “timelike in a given region / subset”. An observer “at infinity” is considered preferred or objective, but we can generalise this to a “fiducial” observer at some fiducial location. We expect them to have many of the same properties, such as being static/stationary, freely falling (geodesic), and possibly having local inertial coordinates at that location. They are also considered to be free from gravitational effects. For example, in de Sitter space there is no preferred choice in general; however, in my forthcoming “galaxy cable” paper, there is a natural choice given a choice of origin. A set of observers is given, and there is a unique field parallel to them.
Carroll (§5.2, p.203 onwards) gives a helpful overview
Boyer-Lindquist coordinates are a description of a rotating (“Kerr”) black hole. They are a generalisation of Schwarzschild coordinates. They are the simplest coordinates for calculations, generally speaking, because the line element has just one cross-term in these coordinates. They were derived by Robert Boyer and Richard Lindquist, and published in a 1967 paper.
Metric and line element
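With coordinates (t,r,θ,φ) the metric line element is:
$$ds^2 = -\left(1 - \frac{2Mr}{\rho^2}\right)dt^2 - \frac{4Mar\sin^2\theta}{\rho^2}\,dt\,d\phi + \frac{\rho^2}{\Delta}\,dr^2 + \rho^2\,d\theta^2 + \left(r^2 + a^2 + \frac{2Ma^2 r\sin^2\theta}{\rho^2}\right)\sin^2\theta\,d\phi^2$$
where $\rho^2 = r^2 + a^2\cos^2\theta$ (also written Σ by e.g. Frolov & Novikov), and $\Delta = r^2 - 2Mr + a^2$ (called a “discriminant” by Carter §4.3 in Wiltshire et al) are standard notation. Every source I have checked uses an equivalent expression to this one, apart from the original paper by Boyer & Lindquist (§2) which has a different sign for the coefficient of dt dφ, presumably due to Kerr’s original a, the sign of which was quickly changed by the community. The above metric is used by O’Neill §2.1, Visser §1.5, Teukolsky §2, and others listed shortly. With some algebra, the coefficient of $d\phi^2$ may be rearranged to $\frac{(r^2+a^2)^2 - \Delta a^2\sin^2\theta}{\rho^2}\sin^2\theta$, so the version given by Frolov & Novikov §3.2.1 is equivalent.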
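The rearrangement of the dφ² coefficient can be spot-checked numerically. The sketch below assumes the standard definitions ρ² = r² + a² cos²θ and Δ = r² − 2Mr + a², and compares the two standard forms of the coefficient at a few arbitrarily chosen sample points. The function names and sample values are hypothetical.

```python
import math

def rho2(r, theta, a):
    """rho^2 = r^2 + a^2 cos^2(theta)"""
    return r * r + (a * math.cos(theta)) ** 2

def delta(r, M, a):
    """Delta = r^2 - 2 M r + a^2"""
    return r * r - 2.0 * M * r + a * a

def g_phiphi_first_form(M, a, r, theta):
    """(r^2 + a^2 + 2 M a^2 r sin^2(theta) / rho^2) sin^2(theta)"""
    s2 = math.sin(theta) ** 2
    return (r * r + a * a + 2.0 * M * a * a * r * s2 / rho2(r, theta, a)) * s2

def g_phiphi_second_form(M, a, r, theta):
    """((r^2 + a^2)^2 - Delta a^2 sin^2(theta)) sin^2(theta) / rho^2"""
    s2 = math.sin(theta) ** 2
    numerator = (r * r + a * a) ** 2 - delta(r, M, a) * a * a * s2
    return numerator * s2 / rho2(r, theta, a)

# Arbitrary sample points (M, a, r, theta); agreement here is a check, not a proof.
samples = [(1.0, 0.5, 3.0, 1.0), (1.0, 0.9, 10.0, 0.3), (2.0, 1.5, 7.0, 2.5)]
for M, a, r, theta in samples:
    print(g_phiphi_first_form(M, a, r, theta), g_phiphi_second_form(M, a, r, theta))
```

A symbolic algebra package would give an actual proof of the identity; the numerical comparison is only meant as a quick guard against sign or factor errors when comparing sources.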
Carter (§4.3 in Wiltshire et al) gives the equivalent expression:
O’Neill (§2.6) has, in addition to our first-listed version:
Chandrasekhar (§54) gives the following expression, which is changed to our metric signature convention:
which also turns out to be equivalent. He uses (§53) but a different definition of : (§54; also Frolov & Novikov label this A), for which we gave an identity above. [Check, earlier text reads: which we can show is also equal to ].
To write the components of the symmetric metric tensor, don’t forget to halve the off-diagonal terms, and so Chandrasekhar §54 is also equivalent:
The components of the inverse metric are found by inverting the above matrix. Notice it is in a “block diagonal” form (at least, if we were to rearrange the order of coordinates), so we merely have to invert each 2×2 block. This is trivial for the r-θ block; for the t-φ block the determinant is… and so
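The inversion of the t-φ block can be sketched numerically as follows. The metric components used are the standard Boyer-Lindquist ones in signature (−,+,+,+), which I am assuming here, and the function names and sample point are hypothetical. The code inverts the symmetric 2×2 block and compares its determinant with −Δ sin²θ and the off-diagonal inverse component with −2Mar/(ρ²Δ), both of which are standard results.

```python
import math

def bl_t_phi_block(M, a, r, theta):
    """Return (g_tt, g_tphi, g_phiphi) for the Kerr metric in Boyer-Lindquist coordinates."""
    s2 = math.sin(theta) ** 2
    rho2 = r * r + (a * math.cos(theta)) ** 2
    g_tt = -(1.0 - 2.0 * M * r / rho2)
    g_tphi = -2.0 * M * a * r * s2 / rho2   # half the dt dphi coefficient
    g_phiphi = (r * r + a * a + 2.0 * M * a * a * r * s2 / rho2) * s2
    return g_tt, g_tphi, g_phiphi

def invert_symmetric_2x2(p, q, s):
    """Invert [[p, q], [q, s]]; returns (inv_pp, inv_pq, inv_ss, det)."""
    det = p * s - q * q
    return s / det, -q / det, p / det, det

M, a, r, theta = 1.0, 0.5, 3.0, 1.0          # arbitrary sample point
g_tt, g_tphi, g_phiphi = bl_t_phi_block(M, a, r, theta)
inv_tt, inv_tphi, inv_phiphi, det = invert_symmetric_2x2(g_tt, g_tphi, g_phiphi)

rho2 = r * r + (a * math.cos(theta)) ** 2
Delta = r * r - 2.0 * M * r + a * a
print(det, -Delta * math.sin(theta) ** 2)
print(inv_tphi, -2.0 * M * a * r / (rho2 * Delta))
```

Note the halving of the dt dφ coefficient when forming the symmetric component, which is the step most easily forgotten.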
This is given by Frolov & Novikov §D.1.
From Kerr coordinates:
, (Boyer & Lindquist, §2)
Compare Frolov & Novikov §D.7
[There are two types of Kerr coordinates, based on ingoing/outgoing principal null directions. (O’Neill §2.5) O’Neill gives ingoing version]
Visser §1.5 gives ,
Transformation with Kerr coordinates (“K”):
(same in Carter §4.3)
(Teukolsky §2, but opposite sign of in Carter §4.3)
Also given in Kerr §2.6, but different.
These are especially useful for timelike geodesics.
Coordinate singularity at Δ=0.
When a→0, Schwarzschild coordinates result.
Canonical vector fields , . Then identities , , , , , , (O’Neill §2.1).
More properties of B-L coordinates: , (in Frolov for instance).
Principal null directions. These are null geodesics, and can be thought of as photons directly approaching/receding from the black hole, at least at large r (Floyd p.52 says B-L adapted to the “outgoing” n-congruence). In these coordinates: (O’Neill §2.5)
There is a “price to be paid for the algebraic simplicity that has made it the most widely known expression for the Kerr solution” — singular where Δ=0 (Carter §4.3 in Wiltshire et al).
Singular. For null geodesics at least, “as such a curve approaches a horizon, so Δ→0, it exhibits the infinite spiraling and slowing that signals the failure of Boyer-Lindquist coordinates.” (O’Neill §2.5)
Null tetrad (Frolov & Novikov §D.6).
Christoffel symbols and curvature tensors
These are given, for Boyer-Lindquist coordinates, in Frolov & Novikov §D.2, Mueller §2.14.1, and possibly O’Neill §2.
Frolov & Novikov (§D.3, from setting the charge Q=0 for the Kerr geometry) give the rank-2 Killing tensor corresponding to Carter’s constant as:
Then , Carter’s constant, which is conserved along a geodesic. (Note: I use [quantity] per unit mass of the particle. Many authors use momentum instead of velocity in the above equation). This tensor is also the “square” of a Killing-Yano tensor f which is antisymmetric: :
[This is formally identical to the Kerr coordinates case, apart from a couple of minus signs; also recall that φ‘s are defined differently].
Jezierski & Łukasik (2006, §4) give the Killing-Yano tensor in the form:
which is equivalent. Carter (§4.5 in Wiltshire et al) gives an expression in terms of a canonical tetrad:
which also turns out to be equivalent. We can also define a “total angular momentum vector per unit mass” (compare Carter) by . This vector propagates parallely, and its square is the Carter constant: (check u / p mixing…) [Care: I am using … per unit mass, whereas many authors don’t divide by the mass]
Kerr published the discovery of the rotating black hole solution in 1963. Later, “In Papapetrou (1966) [http://adsabs.harvard.edu/abs/1966AnIHP…4…83P] there is a very elegant treatment of stationary axisymmetric Einstein spaces. He shows that if there is a real non-singular axis of rotation then the coordinates can be chosen so that there is only one off-diagonal component of the metric. We call such a metric quasi-diagonalizeable.” (Kerr §2.6 in Wiltshire et al 2009).
Boyer and Lindquist published their coordinates in 1967. Sadly, Boyer was killed 2 weeks after the editor received the submitted paper. Kerr claims that he and Ray Sachs also discovered this solution, but did not consider it: “Having derived this canonical form, we studied the metric for at least ten minutes and then decided that we had no idea how to introduce a reasonable source into a metric of this form, and probably would never have.” (ibid.) But they did not publish it, so Kerr appropriately credits Boyer & Lindquist.
The sign convention of Kerr’s original rotation parameter a was quickly changed by the community. Kerr credits this (in §2.5 of Wiltshire et al) to Boyer (see e.g. the comparison with Lense-Thirring precession in Boyer & Lindquist §2). Boyer & Lindquist termed their coordinates “S” coordinates, because they generalise Schwarzschild-Droste coordinates for a non-rotating black hole.
Kerr coordinates describe a rotating black hole. They were published by Roy Kerr in 1963, and represent the first solution of this spacetime. (See also Kerr Cartesian coordinates, given in the same paper). They are a generalisation of Eddington-Finkelstein coordinates for a Schwarzschild-Droste (non-spinning) black hole.
(Teukolsky, Kerr tweaked) where .
I have compared numerous sources to ensure this is the standard version, to avoid the confusion and frustration of a minus sign difference between authors, for example.
(These are equivalent to Kerr’s original coordinates but with two minor tweaks of notation. Kerr originally used the opposite sign of a, but this was due to a calculation error, and was corrected by Boyer who compared the angular momentum with the Lense-Thirring results, and this became standard in the subsequent literature (Kerr §2.5 in Wiltshire et al. B-L 1967 has similar sounding, is this what Kerr is quoting?). Also we use v because this is the convention for advanced time; Kerr originally used u but this commonly represents retarded time (Teukolsky 2015, footnote 1). Visser §1.1 in Wiltshire et al uses Kerr’s original a, and this confused me for a long time because of inconsistency with other sources…) Advanced Eddington-Finkelstein form.
B-L give: (§2) They call it “(E) frame” since it generalises Eddington-Finkelstein coordinates. Inverse metric in eqn 2.15. Null congruence $k^\mu = (-1,0,0,1)$ in these coords.
B-L “E’ frame” adapted to l vector. . ,
u replaced by -u for current convention, and a replaced with -a
(Kerr §2.5 in Wiltshire et al)
Chandrasekhar electronic p328 – compare metric form!
Ingoing principal null vector $-\partial_r$
(Teukolsky 2015, footnote 1)
Carter  integrals of motion
[probably that was a citation in Floyd. And indeed, 3 is: B. Carter; The Physical Review, Vol 174 No.5, p 1559, 1968.]
Kerr §2.6 in Wiltshire et al: clarifies what he meant in 1963 paper by desiring “interior solution”
Timelike geodesics. See Boyer-Lindquist coordinates, but and are replaced by and :
Jezierski & Łukasik (2006, §4) gives the Killing tensor K corresponding to Carter’s constant in raised index form as
(check 1/2 coefficients of cross terms…)
They also give its “square root”, a Killing-Yano tensor:
which has components
(check: do I need to halve everything, or double it? Same for corresponding expression in Boyer-Lindquist coordinates)
Geometric units are units of measurement in which Newton’s gravitational constant and the speed of light are 1: G=c=1. Quantities are hence expressed in units of length, which we will take to be metres. This may seem a pain at first, but it is very convenient when working with equations in relativity because you can omit any G and c terms, which makes expressions shorter. Given any expression, it is straightforward to return it to ordinary SI (Système international) units which use metres, kilograms, seconds etc. Related are Planck units, in which additionally the reduced Planck constant $\hbar = 1$.
|Quantity||SI units||⇒ Conversion factor ⇒||Geometric units|
|Angular momentum||kg m²/s||G/c³||m²|
|Force||N (kg m/s²)||G/c⁴||1|
|Energy||J (kg m²/s²)||G/c⁴||m|
|Energy density||J/m³ (kg/(m s²))||G/c⁴||m⁻²|
“N” means newtons, “J” means joules, and units of “1” means dimensionless. To convert from SI units to geometric, multiply by the listed conversion factor. To convert from geometric units to SI, divide by it.
- A length of 2 km is unchanged in geometric units, so it remains 2 km.
- The mass of the Sun is ≈ 2×10³⁰ kg. Multiply by G/c² to get ≈ 1500 m, which can be conveniently done in Wolfram Alpha. So the mass of the Sun is 1.5 km!
- Suppose we want to go the other way, starting from geometric units and expressing the result in “normal” units. Take a time of 1.3×10²⁶ metres. We multiply by the inverse of the conversion factor, which is 1/c. The answer is ≈ 4.3×10¹⁷ seconds, or 14 billion years, which is the age of the universe as crudely estimated from the Hubble constant.
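The conversions above can be sketched in a few lines of Python. The numerical values of G and c are the usual SI values, while the dictionary layout and function names are my own hypothetical choices. The factors for mass (G/c²) and time (c) come from the worked examples, and the others from the table.

```python
G = 6.67430e-11      # Newton's constant in m^3 kg^-1 s^-2
C = 299_792_458.0    # speed of light in m/s

# Multiply an SI value by the factor to get geometric units (powers of metres).
SI_TO_GEOMETRIC = {
    "length": 1.0,
    "time": C,
    "mass": G / C**2,
    "angular momentum": G / C**3,
    "force": G / C**4,
    "energy": G / C**4,
    "energy density": G / C**4,
}

def to_geometric(value_si, quantity):
    """Convert an SI value to geometric units by multiplying by the factor."""
    return value_si * SI_TO_GEOMETRIC[quantity]

def to_si(value_geometric, quantity):
    """Convert a geometric-units value back to SI by dividing by the factor."""
    return value_geometric / SI_TO_GEOMETRIC[quantity]

sun_mass_m = to_geometric(1.989e30, "mass")   # roughly 1.5 km
age_s = to_si(1.3e26, "time")                 # seconds
age_gyr = age_s / (365.25 * 24 * 3600) / 1e9  # billions of years
print(sun_mass_m, age_s, age_gyr)
```

Keeping one table of factors and a single multiply/divide rule mirrors the table above, so adding another quantity only needs one new dictionary entry.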
A Schwarzschild black hole is the simplest type of black hole: it does not rotate and has no electric charge. It is named after Karl Schwarzschild, who discovered the solution in * and published it in 1916.
One choice of coordinates, and probably the most common one, is Schwarzschild-Droste coordinates (t,r,θ,φ), under which the metric takes the form
$$ds^2 = -\left(1 - \frac{2M}{r}\right)dt^2 + \left(1 - \frac{2M}{r}\right)^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right)$$
in geometric units G=c=1. (Droste is not usually credited, but deserves to be. See *)
This was the first non-trivial exact solution found to Einstein’s field equations.
Schwarzschild spacetime does not change over time, and is spherically symmetric. Mathematically, these symmetries are described by the following Killing vectors:
Christoffel symbols, and curvature tensors. Some sources giving curvature quantities in various coordinates are: Hartle §B for Schwarzschild coordinates, Frolov
Orbits: velocities and frames
Static observer. .
Geodesic motion. Worldlines parametrised [well, mostly…] by invariants e, the “energy per unit mass”, and , the “angular momentum per unit mass”.
Radial motion: Taylor & Wheeler term “rain”, “hail”, “drips”. I add a 4th metaphor, “snow”, for e≤0 which is only allowed inside the event horizon r=2M. These have zero angular momentum (). 4-velocity .
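As a sketch of radial geodesic motion, the code below uses the standard expression for a radially moving geodesic in Schwarzschild coordinates, with components u^t = e/(1−2M/r) and u^r = ∓√(e² − (1−2M/r)), and checks the normalisation u·u = −1. This formula and the function names are assumptions on my part, stated in geometric units with signature (−,+,+,+), and may differ in form from the expression intended above.

```python
import math

def radial_four_velocity(M, r, e, ingoing=True):
    """4-velocity (u^t, u^r, 0, 0) of a radial geodesic with energy per unit mass e."""
    f = 1.0 - 2.0 * M / r
    if f == 0.0:
        raise ValueError("Schwarzschild coordinates are singular at r = 2M")
    radicand = e * e - f
    if radicand < 0.0:
        raise ValueError("this energy is not allowed at this radius")
    u_r = -math.sqrt(radicand) if ingoing else math.sqrt(radicand)
    return [e / f, u_r, 0.0, 0.0]

def norm_squared(M, r, u, theta=math.pi / 2):
    """g_{mu nu} u^mu u^nu in Schwarzschild coordinates."""
    f = 1.0 - 2.0 * M / r
    return (-f * u[0] ** 2 + u[1] ** 2 / f
            + r * r * u[2] ** 2 + (r * math.sin(theta)) ** 2 * u[3] ** 2)

rain = radial_four_velocity(M=1.0, r=8.0, e=1.0)    # e = 1, outside the horizon
snow = radial_four_velocity(M=1.0, r=1.0, e=-0.5)   # e <= 0, inside the horizon
print(rain, norm_squared(1.0, 8.0, rain))
print(snow, norm_squared(1.0, 1.0, snow))
```

The check that e² − (1−2M/r) must be non-negative shows why e ≤ 0 is only possible inside r = 2M, where 1−2M/r is negative, and why an e < 1 worldline cannot reach arbitrarily large r.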
More generally, $u^\mu=
Tetrad: Frolov §2.11.2 citing Luminet and Marck (1985) http://adsabs.harvard.edu/abs/1985MNRAS.212…57L