The camera model used here should be adequate for producing accurate geometric data, except in the following three situations. First, a fish-eye lens has a very large distortion for which the distortion polynomial used here would not converge. (In fact, the distortion as defined here can be infinite, since the field of view can exceed 180°.) For such a lens the image coordinate should be represented as being ideally proportional to the off-axis angle, instead of the tangent of this angle as in the perspective projection. Then, a small distortion could be added on top of this. Furthermore, the position of the entrance pupil of a fish-eye lens varies greatly with the off-axis angle to the object; therefore, this variation would have to be modeled unless all viewed objects are very far away. Second, even for an ordinary lens, the entrance pupil moves slightly, so that if objects very close to the lens are viewed, the variation again would have to be included in the camera model. Third, if there is appreciable nonradial distortion, such as might be produced by distortion in the image sensor itself or by a lens with badly misaligned elements, it would require a more elaborate model than the radial distortion used here. However, this situation should not occur with a CCD camera with a well-made lens, unless an anamorphic lens is used. (A small amount of decentering in the lens and a CCD synchronization error that is a linear function of the position in the image are subsumed by terms included in the camera model.)
The calibration method presented here applies at only one lens setting. For a zoom lens, a separate calibration would have to be done at each zoom setting. Similarly, if focus is changed, a separate calibration is needed at each focus setting.
In this paper, for physical vectors in three-dimensional space, the dot product will be indicated by · and the cross product by ×. For any such vector v, its length () will be represented by |v|, and the unit vector in its direction (v/|v|) will be represented by unit(v). The derivative of a vector relative to a scalar will be considered to be a vector, the derivative of a scalar relative to a vector will be considered to be a row vector, and the derivative of a vector relative to a vector will be considered to be a matrix.
_{(1)} | |
_{(2)} |
The above equations are given here merely to show the mathematical relationship represented by the camera model excluding distortion. In the equations used below for actual data, the subscript I will be used on quantities associated with individual measured points.
The position c used above often is referred to as the perspective center. Sometimes, this is assumed to be the first nodal point of the lens (the same as the first principal point if the medium is the same on both sides of the lens). However, the rays from the object (extended as straight lines) must pass through the entrance pupil if they are to reach the image, so the center of the entrance pupil is the position from which the camera seems to view the world. Similarly, the rays to the image seem to emanate from the exit pupil. If all viewed objects are at the same distance and the camera is perfectly focused, there is no detectable difference between using nodal points instead of pupil points in the above definitions, but in general the distinction should be made; and, if all calibration points are at the same distance, the calibration cannot determine c anyway. Of course, mathematically the camera model is defined by the equations, so the physical meaning of the parameters does not matter for calibration purposes, as long as they are able to capture the degrees of freedom in the physical situation, and as long as the same equations are used in the camera model adjustment and in the use of the camera model. (For definitions of the terms used here, see any optics textbook, for example Jenkins and White [JENKINS&WHITE1950]).
The distortion polynomial must be defined relative to the optical axis of the lens, in order for the distortion to be radial. If the image sensor plane is perpendicular to the optical axis, this is the a vector. However, to allow for the possibility that they are not exactly perpendicular, a separate unit vector o is used here for the optical axis. Note that if there is no distortion and the pupil points coincide with the nodal points, it is impossible for the calibration to determine the optical axis. (A central projection is equivalent to using a pinhole camera, and a pinhole has no axis.) Therefore, some a pnriori weight will be applied here to tend to make the o vector equal to the a vector, so that it will be well determined when there is not much distortion. However, when there is a large distortion, the data will outweigh this a priori information, and the two vectors can differ.
Let p_{i} be the three-dimensional position of any point being viewed. Then the component parallel to the optical axis of the distance to the point is
(3) |
The vector to the point orthogonally from the optical axis is
(4) |
The square of the tangent of the angle from the optical axis to the point (subtended at the entrance pupil point) is
(5) |
We can define a proportionality coefficient µ_{i}, that produces the amount of distortion when multiplied by the off-axis distance to the point, as a polynomial containing only even-order terms (since we have factored out the first power). Since _{i} is proportional to the square of this off-axis distance, this polynomial can be written as follows:
(6) |
where the 's are the distortion coefficients. We use only coefficients up to _{2}, since higher-order terms are negligible except for wide-angle lenses. (In fact, often _{2} is negligible.) Some a priori weight (usually a small amount) will be applied to tend to make the 's equal to zero, so that they will be well determined when there is not enough information in the calibration images. (This is especially important for high-order coefficients when there are not many calibration points, and for _{0} if o and a are nearly equal or if the pupil points nearly coincide with the nodal points.) Note that _{0} does not actually represent radial distortion in the usual sense, but is merely a scale factor in the plane orthogonal to o, whereas the scale factors included in h and v are in a plane orthogonal to a.
The effect of distortion can now be written as follows:
(7) |
where p_{i} is the true position of the point and p'_{i} is its apparent position because of distortion.
Then the distorted point can be projected into image coordinates (denoted by _{i} and _{i}) by using p'_{i} instead of p in (1, 2). However, for efficiency these equations are expressed in terms of the intermediate quantities _{i}, _{i}, and _{i}, as follows, since these quantities will be needed again in later sections:
(8) | |
(9) | |
(10) | |
(11) | |
(12) |
(The reason for using the circumflex over x and y is to represent computed values, to distinguish them from measured values that will be used in Section 4.)
Therefore, the complete camera model consists of the five vectors c, a, h, v, and o, where a and o are unit vectors, and the distortion coefficients _{0}, _{1}, and _{2}. This is 18 parameters in all, of which two are redundant because of the unit vectors. (The computations can be extended in a straightforward way to include higher-order 's, if they are ever needed.)
(13) | |
(14) | |
(15) | |
(16) | |
(17) | |
(18) | |
(19) | |
(20) | |
(21) |
where is the 3-by-3 identity matrix. The first three of these (derivatives of _{i} and µ_{i}) are only intermediate quantities needed in obtaining the others and will not be used elsewhere.
In order to obtain a solution, the input points must not be all coplanar, and they must be distributed over the image space. Since there are 16 independent parameters in the camera model and each point contains two dimensions of information in the image plane, at least 8 points are needed, or 6 if the a priori standard deviations are small enough to add appreciable information. However, it is highly desirable to have considerably more points than this minimum in order to obtain an accurate solution for all of the parameters.
c_{o} is given | (22) |
a_{o} = unit(p_{a} - c_{o}) | (23) |
(24) | |
(25) | |
o_{o} = a_{o} | (26) |
where u is a vector pointing upwards in object space, and where it is assumed that the image x axis points to the right and the image y axis points down (from the upper left corner of the image). (If the y axis points up in the image, the sign of v_{0} should be reversed.) The initial values for _{0}, _{1}, and _{2} are zero.
The a priori weight matrix is computed as follows:
(27) |
where the fact that the off-diagonal terms for a and o are negative of the main-diagonal terms causes the standard deviation _{d} to apply to the difference of a and o. Additional weight can be applied to any other of the initial approximation values, by adding the reciprocal of the variance to the appropriate main diagonal element of N_{o}). For example, the position of the camera may be known sufficiently well to enter this information into the solution by adding appropriate weight in the first three diagonal elements of N_{o}.
The program has an inner loop for iterating the nonlinear solution and an outer loop for editing (removing erroneous points) by a previously developed general method [GENNERY1980, GENNERY1986]. The steps in these computations for the problem here are as follows.
(28) |
For each point i currently retained, compute _{i}, _{i}, and the preliminary quantities by the equations in Section 2.2; and compute the partial derivatives of p'_{i}, _{i}, and _{i} as in Section 3. Then the 2-by-18 matrix of partial derivatives of _{i} and _{i} relative to g is
(29) |
(since _{i}/c = - _{i}/p_{i}, and similarly for _{i}), and the discrepancies between measured and computed data are
(30) |
Compute the matrix of coefficients in the normal equations N (18-by-18), the "constants" in the normal equations t (18-by-1), and the sum of the squares of the discrepancies (which become the residuals when the solution has converged) q as follows:
(31) | |
(32) | |
(33) |
where the summations are over all points currently retained, n is the total number of these points, g represents the current parameter values, and g_{o} represents the a priori parameter values (initial approximations). The first term in (31) and (32) applies the a priori weight to the initial values, and the second term applies some weight to the constraints. These constraint terms mathematically have no effect on the solution, since the exact constraints are applied below in steps 3 and 5. However, they are necessary to prevent the solution without the constraints from being singular, and that will be computed first (in step 3, as (N^{-1}t) before the constraint equation is applied. The scale factor chosen for these terms above cause them to have about the same magnitude as the result of the main summation for N, so that numerical accuracy is preserved.
(34) | |
(35) |
Note that a and o are unit vectors on the first iteration (because of the initial approximations) and on the last iteration (within the convergence tolerance), but on intermediate iterations they in general are not. The comparison of their magnitudes with unity in the above equation is what causes them to converge to unit vectors. Then add d to the current values of the camera model parameters to produce the new values, as follows:
(36) |
The estimate of measurement variance is
(37) |
where n is the number of points currently retained (not rejected), that is, the number of points actually used to obtain q. The denominator in (37) would be 2n - 16 if there were no a priori weights. The value 2n - 14 is an approximation, based on the fact that there usually is a large a priori weight forcing o to be nearly equal to a, but not much weight forcing _{0}, _{1}, and _{2} to be nearly zero. The exact value is not very important, since the number of measurements 2n usually is much larger than 14, and usually not much accuracy is needed or is attainable with variances, anyway.
C_{gg} = N^{-1} - N^{-1}K^{T}M^{-1}KN^{-1} | (38) |
If this is the first time here, go to step 8.
(39) |
where is the 2-by-2 identity matrix. If r_{i} > 16, reject this point. Otherwise, go to step 9 (exit from the edit loop).
(40) |
Tentatively reject the point with the greatest r_{i}. Then go to step 1. (This is the end of the edit loop.)
A possible improvement that may be desirable in the case of grossly nonsquare pixels would be to compute separate q's for each image dimension by summing the squares of the discrepancies separately for x and y instead of using (33), and by dividing by n - 7 instead of 2n - 14 in (37) to obtain the two variances. Then a 2-by-2 weight matrix with the reciprocal of these variances on the main diagonal would be included in the summations in (31) and (32) in the usual way [MIKHAIL1976], instead of factoring the variance out as above.
Often the partial derivatives of the above projection are needed (for error propagation or in a least-squares adjustment). These are obtained as follows:
(41) | |
(42) |
in terms of the partial derivatives defined in Section 3. However, for most applications not much accuracy is needed in partial derivatives. Therefore, when the distortion is small (|_{0}| << 1, |_{1}| << 1, and |_{2}| << 1) and speed is important, sufficient accuracy may be obtained by assuming that p'_{i}/p_{i} is the identity matrix, so that the following results:
(43) | |
(44) |
Using these approximations can result in significant savings in time if many points are to be projected, since the computation of p'_{i}/p_{i} is considerably more involved than is the computation of _{i}/p'_{i} and _{i}/p'_{i}, as can be seen in Section 3.
r' _{i} =
sign(a · v ×
h) unit[(v -
y_{i}a) ×
(h - x_{i}a)] |
(45) |
where "sign" = ± 1 according to whether its argument is positive or negative. Then the distortion can be added by one of two methods.
In the first method, (3-7) are used with p_{i} - c replaced by r_{i} and p'_{i} - c replaced by r'_{i} (which is equivalent to considering a point at unit distance), to produce the following:
(46) | |
(47) | |
(48) | |
(49) | |
(50) |
where we have had to rescale r_{i} to force it to be a unit vector, since otherwise the distortion correction changes only the component of the vector perpendicular to the optical axis, and thus changes the length of the vector. However, the equations in Section 2.2 were designed to go from the unprimed to the primed quantities, and here the opposite is desired. Therefore, in this method an iterative solution is done as follows: on the first iteration, r_{i} is set equal to r'_{i}; then an improved r_{i} is computed as above on each iteration. The values of the other quantities from the last iteration also are needed below if partial derivatives are computed.
The second method is faster and is one that has been implemented. It can be derived by manipulating (46-50) to produce the following:
(51) | |
(52) | |
(53) | |
(54) |
Equation (54) can be solved for µ'_{i} by Newton's method (using µ'_{i} = 0 as the initial approximation). Then,
(55) |
However, if partial derivatives are to be computed below, corrected values of _{i}, _{i}, _{i}, and µ_{i} must be computed as in the first method, by using (46-49(. (Now only one iteration is needed, since r_{i} has already been obtained.)
The partial derivatives of the projection into a ray can be obtained by differentiating (45) with respect to x_{i} and y_{i}, to obtain the following:
(56) | |
(57) |
Then the effect of distortion can be included by computing the partial derivatives as in Section 3, with p_{i} - c replaced by r_{i} and with a rescaling because of the change in magnitude due to (50), to produce the following:
(58) | |
(59) | |
(60) |
Then, since the transformation in the other direction is desired, the inverse of the matrix is used, as follows:
(61) | |
(62) |
As in projecting in the other direction, this correction of the partial derivatives for distortion can be omitted when speed is important and not much accuracy is needed, if the distortion is small.