The SICK 3D Snapshot devices (Visionary devices) have in common that they produce data that can be transformed into a 3D point cloud format. However, this computation differs between the underlying technologies. This document explains how to transform the distance information from the stereo camera Visionary-S into 3D point cloud data.
This chapter gathers well-known imaging concepts that we follow in order to create 3D point clouds. If you are more interested in how the transformation has to be computed, have a look at the section "From Z-map to point cloud". If you are interested in how to use the example code provided with the camera, please have a look at the section "Using the programming examples".
The Visionary-S combines both color and depth perception.
Both depth and color information, plus information about the validity of each pixel, are provided by the Visionary-S as three distinct maps.
For a detailed description of how to connect to the Visionary-S and how to grab the data, please have a look at the contents of the USB drive delivered together with the Visionary-S.
To keep things simple, we will concentrate on the distance values first. See the following figure for an illustration of the pixel matrix. Note that the column index can be named x or col and the row index y or row; we use both naming conventions.
The camera coordinate system (sensor coordinate system) is defined as a right-handed system with the ZC coordinate increasing along the axis from the back to the front side of the sensor, the YC coordinate increasing vertically upwards and the XC coordinate increasing horizontally to the left, all from the point of view of the sensor (or someone standing behind it). The origin of this coordinate system is the focal point of the image sensor and its z-axis is coincident with the optical axis.
Derived from this is the local coordinate system. The origin (0, 0, 0) of this coordinate system is a defined reference point, namely the center of the front face of the camera. To convert from the camera to the local coordinate system, specific offsets and rotations need to be applied. The local coordinate system is indicated by the index 'L' (xL, yL, zL).
The intrinsic parameters describe how the data is mapped from the imager chip into the camera coordinate system. Those parameters describe the optical properties of the imaging system. They do not change as long as the optics are fixed, which is the case for our sensors. As those parameters include a mapping from pixel indices (pixel positions on the imager chip) into a metric coordinate system, we have to define the values either in metric units or in relation to the pixel indices.
We use the standard pinhole camera model as used in OpenCV (camera calibration and 3D reconstruction) and described in Szeliski, Richard: Computer Vision: Algorithms and Applications. The values provided are the coordinates (cx, cy) of the principal point in pixels and the focal lengths fx, fy in pixels.
Real lenses have distortions that need to be compensated. For this reason, up to 3 radial distortion coefficients k1 ... k3 and 2 tangential distortion coefficients p1, p2 are provided. The exact number differs between sensor devices; unused coefficients are always 0. The model used is compatible with the one used by the camera calibration of OpenCV.
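To make this concrete, the following is a minimal sketch of projecting a camera-space point onto the imager with the pinhole model and OpenCV-compatible distortion coefficients described above; the numeric intrinsics in the example call are placeholders, not calibration values of a real device.

```python
def project_point(xc, yc, zc, fx, fy, cx, cy,
                  k1=0.0, k2=0.0, k3=0.0, p1=0.0, p2=0.0):
    """Project a camera-space point (xc, yc, zc) to pixel coordinates
    using the pinhole model with OpenCV-compatible distortion."""
    # Normalized image coordinates
    x = xc / zc
    y = yc / zc
    r2 = x * x + y * y
    # Radial and tangential distortion (OpenCV convention)
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # Map to pixel coordinates using the focal lengths and the principal point
    u = fx * x_d + cx
    v = fy * y_d + cy
    return u, v

# Placeholder intrinsics, for illustration only
u, v = project_point(0.1, 0.05, 1.0, fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```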
The extrinsic parameters, or the extrinsic matrix, of a sensor describe its pose in 3D world space. The pose includes position and orientation. We use the term extrinsic matrix in its original mathematical meaning as the transformation matrix from world space to camera space. We refer to the clear description made by Kyle Simek.
As the sensor is used in real-world applications, the operator should be able to easily comprehend the camera parameters. We assume that the pose of the sensor is easy to determine during the setup. Hence, we use a "camera-centric" parameterization, which describes how the camera changes relative to the world. These parameters correspond to elements of the inverse extrinsic camera matrix, which we call CameraToWorldTransform (in the BLOB metadata XML description) or CameraToWorldMatrix:
Note that RC is a 3x3 rotation matrix that describes the sensor orientation in world coordinates and C = (CX, CY, CZ) is the 3D translation vector describing the sensor position. We further define the following:
Rotation occurs about the camera's position (more precisely, the center of the front face of the sensor, which is the origin of the local coordinate system).
Rotation occurs in a mathematically positive sense (counterclockwise).
Translation and rotation are executed in the following order (a code sketch follows after this list):
1. Sensor is in the world origin, axes are aligned with the world axes (see Figure 3.8).
2. Yaw (rotate around the sensor ZL-axis).
3. Pitch (rotate around the sensor YL-axis).
4. Roll (rotate around the sensor XL-axis).
5. Apply the translation (add the translation vector to the rotated position).
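The following is a minimal sketch that composes a CameraToWorldMatrix of the form [RC C; 0 0 0 1] from the three mounting angles and the translation. It assumes intrinsic rotations about the sensor Z-, Y- and X-axes applied in the listed order (so that the composed rotation is Rz(yaw) · Ry(pitch) · Rx(roll)); the pose values in the example are placeholders.

```python
import numpy as np

def camera_to_world_matrix(yaw, pitch, roll, tx, ty, tz):
    """Compose the 4x4 CameraToWorldMatrix [RC C; 0 0 0 1] from the mounting
    angles (in radians) and the sensor position C = (tx, ty, tz).
    Assumes intrinsic rotations about the sensor Z-, Y- and X-axis,
    applied in the order yaw, pitch, roll, followed by the translation."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    R = Rz @ Ry @ Rx                 # yaw, then pitch, then roll
    M = np.eye(4)
    M[:3, :3] = R                    # RC: sensor orientation in world coordinates
    M[:3, 3] = [tx, ty, tz]          # C: sensor position in world coordinates
    return M

# Placeholder pose: 90 degrees yaw, mounted 1 m above the world origin
M_c2w = camera_to_world_matrix(np.pi / 2, 0.0, 0.0, 0.0, 0.0, 1.0)
```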
For the Visionary-S, the standard bracket (article no. 2077710; illustrated in the figure below) allows a rotation of the device around an axis parallel to the XL-axis. As this rotation axis is parallel to but does not share the origin of the XL-axis, the physical rotation also leads to a change of the device position.
Internally, we only store the CameraToWorldMatrix. This has the effect that the mounting orientation angles are not unambiguous. If the user stores a pose in the device, it is stored as the CameraToWorldMatrix. If the orientation angles are loaded from the device, they are computed from the CameraToWorldMatrix. The result of this computation for the three angles might differ from what the user defined before; however, the pose remains the same, since a specific orientation can be created by various combinations of rotations.
The world coordinate system is defined by the user and it is the one the data is visualized in the 3D viewer. We assume that usually the floor or conveyor belt corresponds to the xy-plane and hence the floor normal points in z-direction (see figure below).
The home position of the sensor is aligned with the world coordinate system so that the viewing direction is along the z-axis. However, the front plane cannot practically be aligned with the xy-plane, so there is an offset of the size of the sensor housing to the local coordinate system and of half the housing to the camera coordinate system.
If we know that the device will be attached to some vehicle, we introduce another coordinate system, as the pose in the world coordinate system is then variable. We call this system the vehicle coordinate system (XV, YV, ZV) and we follow the definitions from the Wikipedia article "Axes conventions" (conventions for land vehicles). In the application the pose of the sensor will usually be fixed in relation to the vehicle coordinate system and variable in the world coordinate system. If the vehicle knows its position in the world coordinate system, it can send this information to the sensor so that it can provide a 3D point cloud in world coordinates as output.
If we have a world-to-sensor transformation matrix Mw→c (the extrinsic matrix described above) and want to generate world coordinates from a sensor-coordinate point, we need to calculate the inverse of Mw→c, which is the camera-to-world matrix Mc→w contained in the camera model. For a rigid transformation

Mw→c = | R  t |
       | 0  1 |

the inverse is

Mc→w = Mw→c^-1 = | R^T  -R^T t |
                 | 0     1     |
If we have two transformations T1 and T2 and want to apply T2 after T1 has been applied, the resulting transformation T can be calculated as T = T2 x T1, where x is the matrix product. (Note that the later transformation is the first factor.)
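As a minimal numpy sketch of both rules (the block-matrix inverse from above and the chaining order); the transformations in the example are placeholders:

```python
import numpy as np

def invert_rigid(M):
    """Invert a rigid transformation [R t; 0 1]: the inverse is [R^T  -R^T t; 0 1]."""
    R, t = M[:3, :3], M[:3, 3]
    Minv = np.eye(4)
    Minv[:3, :3] = R.T
    Minv[:3, 3] = -R.T @ t
    return Minv

# Chaining: applying T2 after T1 means the later transformation is the left factor
T1 = np.eye(4)        # placeholder transformations
T2 = np.eye(4)
T = T2 @ T1
```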
From Z-map to point cloud
A stereo image camera contains two synchronized imagers that capture the same scenery using the same exposure settings at the same point in time. In the following text the two acquired images are called the left and the right image. The stereo imager uses the effect that the displacement between the same point in the left and the right image is a measure for the z-distance of that point to the imagers. This displacement is called the disparity of the point. To obtain these disparities, the lens distortion is first corrected in both images. Then a common reference system is used and both images are converted to this reference. In our stereo camera two references are used, depending on how the color information and the distance information are mapped to each other:
if the color is mapped onto the distance, the focal point and axis of the left imager are used as the reference,
if the distance is mapped onto the color image, the focal point and axis of the color imager are used.
The offset between these two reference points is corrected using the transformations in the next chapter. Keep in mind that using the distance maps without these corrections will lead to a visible X shift in the data when toggling the RGB/3D priority modes. The left, right and color images are projected into the new reference system. Using the projected left and right images, a new image is generated that contains the z-distances calculated from the disparities between the left and right images. This z image follows a pinhole camera model whose parameters are those of the reference system. Since the images are already lens-corrected, we do not need any lens correction parameters. The following section explains how to transform the z image back into 3D point coordinates.
Using streamed data
The BLOB XML description section contains the following values:
the intrinsic matrix values (cx, cy, fx, fy) of the reference system.
Further, we have the Z-map Z of size nrows x ncols. The maps are stored in a 2D row-major layout as well. To calculate the camera coordinates (xC, yC, zC) of the point pC at index (row, col) (row: 0 ... nrows-1, col: 0 ... ncols-1), the following code can be used.
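The original listing is not reproduced here; the following is a minimal sketch under the definitions above. The sign convention for xC and yC is an assumption based on the camera coordinate system described earlier (x to the left, y up, as seen from the sensor) and should be verified against the sample code shipped with the device.

```python
import numpy as np

def zmap_to_camera_coords(Z, cx, cy, fx, fy):
    """Back-project a Z-map of shape (nrows, ncols) into camera coordinates.
    Returns three maps xC, yC, zC of the same shape."""
    nrows, ncols = Z.shape
    col, row = np.meshgrid(np.arange(ncols), np.arange(nrows))
    # The Z-map is already lens-corrected, so no distortion handling is needed here.
    xC = (cx - col) / fx * Z
    yC = (cy - row) / fy * Z
    zC = Z
    return xC, yC, zC
```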
Now we need to transform from camera coordinates to world coordinates. The camera-to-world matrix also includes our camera-to-local transformation, so our point in world coordinates (xW, yW, zW) is pW = Mc→w * pC.
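For a single point this is one matrix-vector product in homogeneous coordinates, e.g. (placeholder values):

```python
import numpy as np

M_c2w = np.eye(4)                        # placeholder camera-to-world matrix
pC = np.array([0.1, -0.05, 1.2, 1.0])    # placeholder point (xC, yC, zC, 1)
pW = M_c2w @ pC                          # world coordinates (xW, yW, zW, 1)
```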
If we have not set mounting settings in the HMI or via the API, we can omit the (very small) tilt and rotation correction angles for the imager and only concentrate on the displacement relative to the focal point, which is stored in the last column (column 3 if we use a zero-based index) of the camera-to-world matrix.
When using the distance map directly for measuring, be aware that there are some effects that you need to consider.
First: the Z-map is not centered around the device's reference point, it is centered around its focus point OC (xC, yC, zC). This point varies depending on whether you use "color to z" or "z to color" mapping. To obtain the point that represents this focus point, you can use the translation information from the device reference point (that is, the origin of the local coordinate system; for details see the section about the local coordinate system in 3D coordinate transformation) to the focus point OC. This translation is stored in the last column of the camera-to-world transformation matrix.
What can we do with this information? If we subtract dz from the Z values in the Z-map, we get the distance from the device reference point to the object in the scene. The tuple (dx, dy) describes where the real optical axis (and thus the center of the Z-map) lies relative to the device reference point (see the sketch after the following list). As an approximation:
if the color to Z mapping was selected, the optical axis is that of the left image sensor
same is true if no mapping was selected
if the Z to color mapping was selected, the optical axis is that of the color image sensor
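A minimal sketch of such a quick check, assuming the camera-to-world matrix has already been parsed from the BLOB header and no mounting pose is configured (the matrix and Z-map below are placeholders):

```python
import numpy as np

c2w = np.eye(4)              # placeholder camera-to-world matrix from the BLOB header
Z = np.zeros((512, 640))     # placeholder Z-map

# Translation from the device reference point to the focus point OC
dx, dy, dz = c2w[:3, 3]
# Approximate distance of each pixel from the device reference point
# (only meaningful when no mounting pose is configured, so that c2w
# contains just the camera-to-local offset)
Z_ref = Z - dz
```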
Second: there is an additional error due to the fact that the intersection point (the principal point) of the optical axis of the lens and the image plane of the imager is not perfectly above the center of the image. Due to manufacturing tolerances there is a small variance, which is published via the cx and cy values of the intrinsic matrix. This effect results in a lateral error that grows linearly with distance and will be approximately 10 mm at 2.5 m distance.

To summarize, measuring using the Z-map directly can be used for very fast checks when precision can be sacrificed for speed. In this case the translation between reference point and focal point needs to be considered to interpret the measurements in a sensible way. Only information about the distance between the imager plane and an object (the z coordinate) is readily available. Lateral information (x, y) has the lateral error described above and needs to be interpreted carefully since, due to perspective, objects further away of course appear smaller than those closer to the camera.
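As a rough way to estimate this lateral error for a given device (a sketch, not part of the official samples): treating the image center as the optical axis although the principal point is at cx gives an offset of |cx - width/2| pixels, which corresponds to z · |cx - width/2| / fx in metric units.

```python
def lateral_error_x(z, cx, fx, width):
    """Approximate lateral error in x at distance z when the image center is
    treated as the optical axis although the principal point is at cx."""
    return z * abs(cx - width / 2.0) / fx

# Example with placeholder intrinsics only, not real calibration data
err = lateral_error_x(2.5, cx=322.0, fx=600.0, width=640)
```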
For convenience, SICK provides programming examples for streaming devices, which contain code to connect, configure and use the camera. They can be found in the download section of the product on sick.com.
An easy and straightforward presentation of the 3D point cloud conversion is given by the Python samples.
In Data.py, the image data and the important XML header containing the camera parameters for the correct camera-to-world transformation are extracted:
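The original Data.py listing is not reproduced here. The following is only an illustrative sketch of parsing such parameters from an XML header; the element names and layout used below are placeholders and do not necessarily match the actual BLOB metadata.

```python
import xml.etree.ElementTree as ET

def parse_camera_params(xml_string):
    """Illustrative only: extract intrinsics and the camera-to-world matrix
    from an XML header. The element names are placeholders, not the actual
    names used in the Visionary BLOB metadata."""
    root = ET.fromstring(xml_string)
    cam = root.find(".//CameraParameters")            # placeholder path
    intrinsics = {name: float(cam.findtext(name)) for name in ("cx", "cy", "fx", "fy")}
    # Assumed here: a row-major 4x4 matrix stored as 16 whitespace-separated numbers
    c2w = [float(v) for v in cam.findtext("CameraToWorldTransform").split()]
    return intrinsics, c2w
```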
How the binary part of the blob is extracted can also be seen in Data.py.
Having extracted the XML information, the calculation of the point cloud can be done as shown below. The code was written to work with both the Visionary-S and the Visionary-T-Mini and can be found in PointCloud.py.
The calculation also includes the transformation from the sensor to the world coordinate system using the camera-to-world matrix c2w, which is also extracted when parsing the XML header of the blob.
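The actual PointCloud.py listing is not reproduced here either; the sketch below only illustrates the Visionary-S (Z-map) branch of such a conversion under the same assumptions as the earlier snippets (sign convention, invalid pixels marked with 0).

```python
import numpy as np

def zmap_to_world_pointcloud(Z, cx, cy, fx, fy, c2w):
    """Convert a Z-map into an (N, 3) point cloud in world coordinates.
    Z is an (nrows, ncols) array of z-distances in the reference system,
    c2w the 4x4 camera-to-world matrix from the BLOB XML header.
    Assumes invalid pixels are marked with Z == 0."""
    nrows, ncols = Z.shape
    col, row = np.meshgrid(np.arange(ncols), np.arange(nrows))
    # Back-projection; the Z-map is already lens-corrected
    xC = (cx - col) / fx * Z
    yC = (cy - row) / fy * Z
    pts = np.stack([xC, yC, Z, np.ones_like(Z)], axis=-1).reshape(-1, 4)
    pts = pts[Z.reshape(-1) > 0]          # drop invalid pixels
    # Transform to world coordinates using the camera-to-world matrix
    return (c2w @ pts.T).T[:, :3]
```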
Keywords: Visionary-S, How to, Tutorial, PointCloud, Coordinate System, Coordinate, 3D Camera, 3D Cameras, Transformation, Conversion