Cameras for 3D imaging

John Gilmore, Hamamatsu Corporation
Slawomir Piatek, PhD, Hamamatsu Corporation & New Jersey Institute of Technology
November 2, 2014

Gauging the distance, size, and shape of an object is of paramount and self-evident practical importance in everyday life. Nature has evolved a variety of ways for organisms to obtain 3D information: stereoscopic vision using two or more eyes and sonar ranging are two examples. Extending this ability to inanimate systems such as robots, “decision makers” in automated assembly lines, or self-driving vehicles has been and continues to be an active area of research and development.

Among the several techniques that use light for measuring depth, two bear relevance to the current article. In the first, the distance or range R to a point on the target surface derives from the time ΔT it takes a pulse of light (for example, emitted by a laser) to travel from the observer to the point and back, namely $R = \frac{c Δ T}{2 n}$, where c is the speed of light in vacuum and n is the index of refraction of the surrounding medium. In the second technique, R derives from the phase difference ΔΦ between the emitted intensity-modulated beam of light and its reflection; here, $R = \frac{c Δ Φ}{4 π n f}$, where f is the frequency of modulation. Prior to the mid-1990s, a 3D camera employing either of these two techniques required a mechanical scanning system to sample an array of points on the target surface. One limitation of such an arrangement is a compromise between the frame rate and the density of sampled points: the former affects the temporal accuracy of depth measurement for a moving target, whereas the latter affects the spatial resolution of the features on the target. This compromise would be lessened if the distances to all points in an array on the target surface could be measured simultaneously. This is now possible.
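As a rough numerical illustration of the two relations above, the following Python sketch evaluates both. The function names and the example values (a 20 ns round trip, 10 MHz modulation) are illustrative assumptions, not parameters of any particular camera. The last helper evaluates the phase relation at ΔΦ = 2π, the point at which the phase wraps around and the measured range becomes ambiguous.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def range_from_time(delta_t, n=1.0):
    """R = c * dT / (2 n): range from the round-trip time of a light pulse."""
    return C_VACUUM * delta_t / (2.0 * n)


def range_from_phase(delta_phi, f_mod, n=1.0):
    """R = c * dPhi / (4 pi n f): range from the phase shift of modulated light."""
    return C_VACUUM * delta_phi / (4.0 * math.pi * n * f_mod)


def unambiguous_range(f_mod, n=1.0):
    """Range at which the phase shift reaches 2*pi and wraps around."""
    return range_from_phase(2.0 * math.pi, f_mod, n)


if __name__ == "__main__":
    print(range_from_time(20e-9))                # roughly 3 m
    print(range_from_phase(math.pi / 2, 10e6))   # roughly 3.75 m
    print(unambiguous_range(10e6))               # roughly 15 m
```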

The breakthrough is the development of a CMOS-architecture imaging array where each pixel is a Photonic Mixer Device (PMD). The left panel in Figure 1 depicts a simplified structure of a PMD. An internal electric field directs the photogenerated charge carrier (electron) to one of the two charge storage locations. Two external signals Vtx1 and Vtx2 control the strength and direction of the electric field and, therefore, they also control how much charge each storage location receives in response to incident light. The output signals V₁ and V₂ are a measure of how much charge has been accumulated in locations 1 and 2, respectively. The right panel of Figure 1 shows a simplified electrical equivalent circuit of a PMD. The main components are a photodiode (generator of photo-charge), two capacitors (charge storage locations), and switches (responsible for directing the photo-charge to the appropriate capacitors). A discussion of a more complete equivalent circuit and how it operates is below.

Figure 1. A simplified structure of a PMD (left panel) and its equivalent electrical circuit (right panel).

In one common arrangement, a 3D camera system illuminates the scene with an intensity-modulated infrared light. The optics of the system creates an image of the scene on the array of PMD pixels. For each pixel, the system determines an autocorrelation function between the electrical signal that modulates the emitted light and the electrical signals coming from the two capacitors. Sampling the resulting function four times per period gives the phase shift, strength of the returned signal, and the background level using well-known mathematical relations. The distance is proportional to the phase shift.
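The relations used for the four samples are not spelled out in this article. One commonly quoted form, often called the four-bucket method, is sketched below in Python. It assumes that the correlation function is sinusoidal and that the samples are taken 90° apart; the sample ordering, sign convention, and function names are assumptions made for illustration and may differ from those used by a given sensor.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def four_phase_demodulate(a0, a1, a2, a3):
    """Recover phase, amplitude, and offset from four samples 90 degrees apart.

    Assumed model (a convention chosen for this sketch):
        a_k = offset + amplitude * cos(phase - k * pi / 2),  k = 0..3
    """
    in_phase = a0 - a2      # equals 2 * amplitude * cos(phase)
    quadrature = a1 - a3    # equals 2 * amplitude * sin(phase)
    phase = math.atan2(quadrature, in_phase) % (2.0 * math.pi)
    amplitude = math.hypot(in_phase, quadrature) / 2.0   # returned signal strength
    offset = (a0 + a1 + a2 + a3) / 4.0                   # background level
    return phase, amplitude, offset


def distance_from_phase(phase, f_mod, n=1.0):
    """R = c * phase / (4 pi n f), the phase relation given earlier."""
    return C_VACUUM * phase / (4.0 * math.pi * n * f_mod)


if __name__ == "__main__":
    # Synthetic samples for a known phase, amplitude, and offset.
    true_phase, true_amp, true_offset = 1.0, 2.0, 5.0
    samples = [true_offset + true_amp * math.cos(true_phase - k * math.pi / 2)
               for k in range(4)]
    phase, amp, offset = four_phase_demodulate(*samples)
    print(phase, amp, offset)
    print(distance_from_phase(phase, 20e6))
```

The subtraction of opposite samples cancels the constant background term, which is why the offset does not affect the recovered phase in this idealized model.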

In the second arrangement, a 3D camera system illuminates the scene with a pulse of infrared light of duration T0 (from nanoseconds to microseconds) while simultaneously making the pixels sensitive to light for a duration of 2T0. During the first half of 2T0, only the first of the two capacitors collects charge, whereas during the second half, only the second capacitor does. The distance imaged by a pixel derives from the relative amounts of charge collected by the two capacitors. A single pulse of light generally produces too little signal in the capacitors; thus, the system illuminates the scene with thousands of pulses appropriately spaced in time so that the capacitors accumulate enough charge to yield an accurate distance. In the literature, this type of 3D camera is referred to as an indirect time-of-flight (I-TOF) camera, and the remainder of this article describes its operation in greater detail.

Principles of operation

Figure 2 explains the principles of operation of a “single-pixel” I-TOF camera, assuming one pulse of light per frame and the absence of background light and dark current. The shaded region depicts an equivalent electrical circuit of the pixel.

Figure 2. Principles of operation of a single pixel in an I-TOF camera.

The pixel consists of a photodiode (PD) whose output connects to three MOSFET switches S₁, S₂, and S₃. The first two connect to the charge integrators C₁ and C₂, respectively, whereas the third connects to an external voltage source V_dd. The timing circuit generates CMOS-compatible logic signals Vtx1, Vtx2, and Vtx3, which drive the switches. A signal that is “high” turns a switch ON, whereas a signal that is “low” turns a switch OFF. A dual switch S₄ shunts C₁ and C₂; the signal V_R, also produced by the timing circuit, controls its operation. The pixel outputs two voltages V₁ and V₂ per frame from which the distance to an element of the target imaged onto the pixel can be calculated. For an array of m × n pixels, the camera determines m × n independent distances to the target elements, one for each pixel, per frame.

To measure the distance to a point on the target, the timing circuit produces a signal V_L that causes the LED or laser diode to emit a pulse of light of duration T₀. Refer to the timing diagram at the bottom of Figure 2. At the instant of emission (t = 0) Vtx1 goes high turning S₁ ON. The other three signals are low keeping S₂, S₃, and S₄ OFF. At t = T_D the leading edge of the reflected pulse arrives, and the photodiode begins to generate current, building charge Q₁ and, thus, voltage V₁ on the capacitor C₁. At t = T₀, Vtx1 goes low turning S₁ OFF, while at the same time Vtx2 goes high turning S₂ ON. The signals Vtx3 and V_R remain low keeping S₃ and S₄ OFF. The photodiode continues to generate current, which now builds charge Q₂ and, thus, voltage V₂ on the capacitor C₂. At t = T₀ + T_D the trailing edge of the pulse arrives: the photodiode stops generating current and, therefore, Q₂ and V₂ have reached their final values even though S₂ remains ON. At t = 2T₀, Vtx2 goes low turning S₂ OFF, while at the same time Vtx3 goes high turning S₃ ON. The switch S₃ now holds the photodiode to V_dd, ensuring that (if present) both dark current and current due to background light are prevented from flowing to C₁ and C₂. The camera system now samples V₁ and V₂ and calculates the distance using:

Equation 1

$R = \frac{1}{2} c T_{0} \frac{V_{2}}{V_{1} + V_{2}}$

The capacitors C₁ and C₂ hold their charge until t = T_R when the signal V_R goes high causing S₄ to shunt and reset the capacitors. At t = T₁ the camera is ready for the next frame.
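Equation 1 can be recovered from the timing described above under the idealizations of Figure 2: a rectangular pulse, no background light or dark current, and 0 ≤ T_D ≤ T₀. The constant photocurrent I during the reflected pulse and the equal capacitances C₁ = C₂ = C are assumptions introduced here for illustration.

```latex
\begin{align*}
Q_1 &= I\,(T_0 - T_D), \qquad Q_2 = I\,T_D \\
\frac{V_2}{V_1 + V_2} &= \frac{Q_2 / C}{Q_1 / C + Q_2 / C} = \frac{T_D}{T_0} \\
R &= \frac{1}{2}\, c\, T_0\, \frac{V_2}{V_1 + V_2} = \frac{1}{2}\, c\, T_D
\end{align*}
```

The result is the round-trip relation for a delay T_D with n ≈ 1; the unknown photocurrent I and the capacitance C cancel in the ratio, so under these assumptions the measured distance does not depend on the absolute strength of the returned signal.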

Suppose that between t = 2T₀ and t = T₁ the target has moved to a greater distance. The timing diagram shows that for this frame, the charge $Q_{1}^{'}$ collected on C₁ is less than Q₁ and the charge $Q_{2}^{'}$ on C₂ is greater than Q₂. The resulting smaller V₁ and larger V₂ imply a greater distance for this frame, as expected from Equation 1. Since the smallest value of V₁ is 0, the maximum distance that can be measured, R_MAX, is cT₀/2.

The duration of the pulse limits R_MAX. For a pulse with T₀ = 30 ns and air as the medium, R_MAX = 4.5 m. If the duration of a pulse is fixed at T₀, R_MAX can be extended by introducing a time delay τ between the instant the light pulse is emitted (t = 0) and the instant Vtx1 turns high (now at t = τ). Vtx2 turns high at t = T₀ + τ when Vtx1 turns low. Doing this extends the range in Equation 1 by cτ/2.
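A minimal Python sketch of Equation 1, including the optional gate delay τ described above, is given below. The function names are hypothetical, and the refractive index parameter n (default 1, as for air) is an addition carried over from the range relations earlier in the article; Equation 1 itself is written for n ≈ 1.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def max_range(t0, tau=0.0, n=1.0):
    """Largest measurable distance: c * (T0 + tau) / (2 n)."""
    return C_VACUUM * (t0 + tau) / (2.0 * n)


def itof_distance(v1, v2, t0, tau=0.0, n=1.0):
    """Equation 1 with an optional gate delay tau.

    v1, v2 : voltages read from C1 and C2 (any common unit)
    t0     : light pulse duration in seconds
    tau    : delay between pulse emission and Vtx1 going high, in seconds
    """
    total = v1 + v2
    if total <= 0:
        raise ValueError("no signal collected: V1 + V2 must be positive")
    return C_VACUUM * (tau + t0 * v2 / total) / (2.0 * n)


if __name__ == "__main__":
    t0 = 30e-9
    print(max_range(t0))                           # close to 4.5 m
    print(itof_distance(1.0, 1.0, t0))             # target at R_MAX / 2
    print(itof_distance(1.0, 1.0, t0, tau=30e-9))  # same ratio, shifted by c*tau/2
```

With τ > 0 the measurable window is shifted rather than widened: distances closer than cτ/2 return light before the first gate opens, so part of the reflected pulse is not collected.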

Figure 3. Timing diagram for two consecutive frames, each produced from three pulses of light. The target is stationary.

The amount of light in a returning reflected pulse depends on the amount of light in the emitted pulse, the type of medium in which the pulse propagates, the distance to the target, and the color, orientation, and smoothness of the reflecting surface. The amount from a single pulse is generally too small to yield an accurate measurement; therefore, thousands of pulses of light may be used for a single frame. Figure 3 is a simplified timing diagram depicting signals Vtx1, Vtx2, Vtx3, and V_R for two consecutive frames, each produced from three pulses of light.

Many of the potential applications of an I-TOF camera require that it be able to operate in full sunlight or indoor artificial lighting. Background light carries no information about the target, and if it contributes charge to C₁ and C₂, the resulting distance measurement will be erroneous. Using a narrow bandpass filter centered on the wavelength of the emitted pulses suppresses the background but does not completely eliminate it. Dark current, continuously generated by the photodiode, has a similar effect on the distance measurement as a non-varying background. Cooling the camera reduces dark current, but this approach may be impracticable. To alleviate the effects of background and dark current, the camera sequentially acquires pairs of frames: the first “light” frame results from pulsed light, background, and dark current, whereas the second “dark” frame results from background and dark current only. Subtracting the “dark” frame from the “light” frame produces a “corrected” frame.
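The frame subtraction can be sketched in Python as follows. This is an illustration under stated assumptions, not the camera's actual processing: it assumes the “dark” frame is acquired with the same gate timing and number of cycles as the “light” frame, that frames are stored as nested lists of per-pixel voltages, and that pixels whose corrected signal falls below a hypothetical threshold `min_signal` are reported as `None`.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def corrected_distance_map(light_v1, light_v2, dark_v1, dark_v2, t0,
                           min_signal=0.0):
    """Subtract the dark frame from the light frame, then apply Equation 1.

    Each frame argument is a list of rows holding per-pixel voltages.
    Returns a list of rows of distances in meters (None where the corrected
    signal is not above min_signal).
    """
    distances = []
    for row_l1, row_l2, row_d1, row_d2 in zip(light_v1, light_v2,
                                              dark_v1, dark_v2):
        row_out = []
        for l1, l2, d1, d2 in zip(row_l1, row_l2, row_d1, row_d2):
            s1 = l1 - d1          # corrected signal on C1
            s2 = l2 - d2          # corrected signal on C2
            total = s1 + s2
            if total <= min_signal:
                row_out.append(None)
            else:
                row_out.append(0.5 * C_VACUUM * t0 * s2 / total)
        distances.append(row_out)
    return distances


if __name__ == "__main__":
    # One-pixel example: equal background of 0.2 on both capacitors.
    result = corrected_distance_map([[0.7]], [[0.5]], [[0.2]], [[0.2]], 30e-9)
    uncorrected = 0.5 * C_VACUUM * 30e-9 * 0.5 / (0.7 + 0.5)
    print(result[0][0], uncorrected)
```

In this one-pixel example the background adds equally to both capacitors, which pulls the uncorrected ratio V₂/(V₁ + V₂) toward 1/2 and therefore biases the distance toward R_MAX/2.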

Figure 4. Example of an I-TOF camera output image with 3D information.

The top two panels in Figure 4 display scenes to be imaged with an I-TOF camera. In both scenes, the letters H, P, and K are 1.0 m, 1.75 m, and 2.5 m, respectively, from the camera, but the hand is not. In the left scene, the hand is in front of and close to the letter H, whereas in the right scene, it is in front of and close to the letter K. The panels in the middle row are the corresponding distance images acquired by a camera system using 10-μs pulses (3000 pulses per frame) from an 8 × 8 array of LEDs (λ = 870 nm, FWHM = 45 nm), an IR-transmission filter (HOYA IR83N), an f/1.2 lens (focal length 8 mm), and a 160 × 120 pixel PMD imaging array (Hamamatsu S11963-01CR, pixel size 30 μm × 30 μm, FOV = 37.5° × 27.7°). The imaged distance is color-coded, with blue corresponding to the farthest and red to the closest. The panels in the bottom row show that the color (distance) of the hand becomes redder (smaller) as it is moved closer to the camera. How small a movement can a camera detect?

Equation 2 shows that the uncertainty σ_R in the distance measured by a pixel increases linearly with T₀ and decreases as the inverse square root of the signal-to-noise (S/N) ratio.

Equation 2

$σ_{R} \propto \frac{\frac{1}{2} c T_{0}}{\sqrt{\frac{S}{N}}}$

The noise N in the equation is the sum in quadrature (the square root of the sum of squares) of the signal (photon) shot noise N_ss, dark current shot noise N_ds, background shot noise N_bs, and read noise N_r. Equation 2 assumes that the target is at R_MAX/2 so that the amount of charge accumulated by each capacitor is the same. If the photon shot noise is dominant, σ_R reduces to cT₀/(4√N_e), where N_e is the total number of photoelectrons accumulated by C₁ and C₂ together. Figure 5 is a plot of measured σ_R as a function of N_e. Here, the target is at R = R_MAX/2, T₀ = 30 ns, and there are 15,000 pulses per frame and no ambient light. The amplitude of the pulses is varied to achieve different values of N_e.
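The two noise relations in this paragraph can be evaluated with a short Python sketch. The function names are hypothetical, and the shot-noise-limited expression is an idealized value: a real measurement also includes the other noise terms, so measured uncertainties need not match it.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s


def total_noise(n_ss, n_ds, n_bs, n_r):
    """Sum in quadrature of the four noise terms (all in electrons)."""
    return math.sqrt(n_ss**2 + n_ds**2 + n_bs**2 + n_r**2)


def shot_limited_sigma_r(t0, n_e):
    """sigma_R = c * T0 / (4 * sqrt(N_e)) for a target at R_MAX / 2.

    Valid only when photon shot noise dominates all other noise sources.
    """
    if n_e <= 0:
        raise ValueError("N_e must be positive")
    return C_VACUUM * t0 / (4.0 * math.sqrt(n_e))


if __name__ == "__main__":
    t0 = 30e-9
    for n_e in (125, 10_000, 275_000):
        sigma = shot_limited_sigma_r(t0, n_e)
        print(f"N_e = {n_e:>7}: sigma_R = {sigma * 100:.2f} cm")
```

Because σ_R scales as 1/√N_e in this limit, quadrupling the collected charge halves the ideal uncertainty.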

Figure 5. Distance uncertainty as a function of collected charge.

The plot shows that σ_R decreases with N_e as expected from Equation 2 and that the fractional error σ_R/R (R = 2.25 m) decreases from about 5.3% to about 0.44% as the collected signal increases from about 125 e- to about 275,000 e-. The more light collected, the smaller the uncertainty of the measured distance.

An achievable uncertainty of a few centimeters for a distance of a few meters is low enough for I-TOF cameras to find numerous practical applications. For example, the automotive industry has developed I-TOF camera systems that warn a driver (or take an independent action) about a possible frontal collision with an object such as another car or a pedestrian. Another use of I-TOF cameras is in robots that perform vision-based tasks in hazardous environments such as mines, mills, or manufacturing plants. I-TOF cameras are used even in the entertainment industry: video game developers have enhanced human-machine interaction in games requiring accurate distance information, such as virtual baseball, boxing, or combat. By developing artificial, real-time 3D vision, humans have finally caught up with what nature has been able to do for millions of years.

Note

A version of this article was published in the November 2014 issue of Photonics Spectra.
