
3D Coordinate transformation for Visionary-T Mini

Related Products
V3S105-1AAAAAA VISION SENSOR 3DSNAP.
V3S145-1AAAAAA VISION SENSOR 3DSNAP.

Introduction

The SICK 3D snapshot devices (Visionary devices) have in common that they produce data that can be transformed into 3D point cloud format. However, this computation differs between the technologies. This document explains how to transform the distance information from the ToF camera Visionary-T Mini into 3D point cloud data.
 

General concept

This chapter gathers well-known imaging concepts that we will follow in order to create 3D point clouds. If you are more interested in how the transformation has to be computed, have a look at the section "From Z-map to point cloud". If you are interested in how to use the example code provided with the camera, please have a look at the section "Using the programming examples".

Image data


The Visionary-T Mini combines depth and intensity perception. Both depth and intensity information are provided by the Visionary-T Mini as two distinct maps.



For a detailed description of how to connect to the Visionary-T Mini and how to grab the data, please have a look at the technical documentation, which is provided in the product download section on sick.com.



To keep things simple, we will concentrate on the distance values. See the following figure for an illustration of the pixel matrix. Note that the column and row indices may be referred to as x and y or as col and row; we use both notations.


 

Camera and local coordinate system


The camera coordinate system (sensor coordinate system) is defined as a right-handed system with the ZC coordinate increasing along the axis from the back to the front side of the sensor, the YC coordinate increasing vertically upwards and the XC coordinate increasing horizontally to the left, all from the point of view of the sensor (or someone standing behind it). The origin of this coordinate system is the focal point of the image sensor and its z-axis is coincident with the optical axis.

 

Derived from this is the local coordinate system. Its origin (0, 0, 0) is a specific reference point: the center of the front face of the camera. To convert from the camera to the local coordinate system, specific offsets and rotations need to be applied. The local coordinate system is indicated by the index 'L' (xL, yL, zL).
 

Intrinsic parameters


The intrinsic parameters describe how the data is mapped from the imager chip into the camera coordinate system. These parameters describe the optical properties of the imaging system. They do not change as long as the optics are fixed, which is the case for our sensors. As these parameters include a mapping from pixel indices (pixel positions on the imager chip) into a metric coordinate system, we have to define the values either in metric units or in relation to the pixel indices.
We use the standard pin-hole camera model as used in OpenCV (camera calibration and 3D reconstruction) and described in Szeliski, Richard: Computer Vision: Algorithms and Applications. The values provided are the coordinates (cx, cy) of the principal point in pixels and the focal lengths fx, fy in pixels.
Real lenses have distortions that need to be compensated. For this reason, up to 3 radial distortion coefficients k1 ... k3 and 2 tangential distortion coefficients p1 ... p2 are provided. The exact number differs between sensor devices; unused coefficients are always 0. The model used is compatible with the one used by the camera calibration of OpenCV.
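As an illustration of this model (a sketch of the OpenCV-compatible distortion formula, not the vendor sample code; the function name is hypothetical):

```python
def distort(xp, yp, k1, k2, k3=0.0, p1=0.0, p2=0.0):
    """Apply the OpenCV-compatible radial/tangential distortion model to a
    normalized image point (xp, yp). Coefficient names follow the text;
    unused coefficients default to 0."""
    r2 = xp * xp + yp * yp
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = xp * radial + 2.0 * p1 * xp * yp + p2 * (r2 + 2.0 * xp * xp)
    yd = yp * radial + p1 * (r2 + 2.0 * yp * yp) + 2.0 * p2 * xp * yp
    return xd, yd
```

With all coefficients set to 0 the point is returned unchanged, which matches the statement that unused coefficients are always 0.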
 

Extrinsic parameters


The extrinsic parameters, or extrinsic matrix, of a sensor describe its pose in 3D world space. Pose includes position and orientation. We use the term extrinsic matrix in its original mathematical meaning as the transformation matrix from world space to camera space. We refer to a clear description made by Kyle Simek.
As the sensor is used in real-world applications, the operator should be able to determine the camera parameters. We assume that the pose of the sensor is easy to determine during setup. Hence, we use a "camera-centric" parameterization, which describes how the camera is placed relative to the world. These parameters correspond to elements of the inverse extrinsic camera matrix, which we call CameraToWorldTransform (in the BLOB metadata XML description) or CameraToWorldMatrix:

    Mc → w = | RC  C |
             |  0  1 |

Note that RC is a 3x3 rotation matrix that describes the sensor orientation in world coordinates and C = (CX, CY, CZ) is the 3D translation vector describing the sensor position.
We further define the following:

  • Rotation occurs about the camera's position (more precisely, the center of the front face of the sensor, which is the origin of the local coordinate system).
  • Rotation occurs in the mathematically positive sense (counterclockwise).
  • Translation and rotation are executed in the following order:
  1. Sensor is in the world origin, axes are aligned with world axes (see Figure 3.8)
  2. Yaw (Rotate around sensor ZL-axis)
  3. Pitch (rotate around sensor YL-axis)
  4. Roll (rotate around sensor XL-axis)
  5. Apply the translation (add the translation vector to the rotated position)
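The steps above can be sketched in NumPy (an illustrative helper, not the device firmware; reading the steps as intrinsic rotations about the sensor's own axes gives R = Rz(yaw) · Ry(pitch) · Rx(roll)):

```python
import numpy as np

def pose_matrix(yaw, pitch, roll, translation):
    """Build a 4x4 pose matrix from mounting angles (radians) and a
    translation vector, following the order given in the text: yaw about z,
    then pitch about y, then roll about x, then translation."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    M = np.eye(4)
    M[:3, :3] = Rz @ Ry @ Rx  # intrinsic z-y'-x'' rotation order
    M[:3, 3] = translation
    return M
```

With all angles zero the result is a pure translation, matching step 1 (sensor aligned with the world axes).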





Internally, we only store the CameraToWorldMatrix. As a consequence, the mounting orientation angles are ambiguous. If the user stores a pose in the device, it is stored as the CameraToWorldMatrix. If the orientation angles are loaded from the device, they are computed from the CameraToWorldMatrix. The result of this computation for the three angles might differ from what the user originally defined, but the pose remains the same: a specific orientation can be created by various combinations of rotations.

World coordinate system


The world coordinate system is defined by the user and it is the one the data is visualized in the 3D viewer. We assume that usually the floor or conveyor belt corresponds to the xy-plane and hence the floor normal points in z-direction (see figure below).



The home position of the sensor is aligned with the world coordinate system so that the viewing direction is along the z-axis. In practice, however, the front plane cannot be aligned exactly with the xy-plane: there is an offset of the size of the sensor housing to the local coordinate system and of half the housing to the camera coordinate system.


 

Vehicle coordinate system


If we know that the device will be attached to a vehicle, we introduce another coordinate system, as the pose in the world coordinate system is then variable. We call this system the vehicle coordinate system (XV, YV, ZV) and follow the definitions from the Wikipedia article on axes conventions for land vehicles. In the application, the pose of the sensor will usually be fixed in relation to the vehicle coordinate system and variable in the world coordinate system. If the vehicle knows its position in the world coordinate system, it can send this information to the sensor, which can then provide a 3D point cloud in world coordinates as output.


 

Transformations

 

Changing the transformation direction


If we have the camera-to-world transformation matrix Mc → w (as contained in the camera model) and want to compute sensor coordinates from a point given in world coordinates, we need to calculate the inverse of Mc → w.
If Mc → w is

    | R  C |
    | 0  1 |

then the inverse (Mc → w)^-1 is

    | R^T  -R^T C |
    |  0      1   |

where R^T is the transposed matrix of R.
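This closed form can be sketched in a few lines of NumPy (illustrative helper name, not part of the SICK samples):

```python
import numpy as np

def invert_rigid(M):
    """Invert a 4x4 rigid transform [R C; 0 1] using the closed form
    [R^T  -R^T C; 0 1] instead of a general matrix inverse."""
    R = M[:3, :3]
    C = M[:3, 3]
    Minv = np.eye(4)
    Minv[:3, :3] = R.T
    Minv[:3, 3] = -R.T @ C
    return Minv
```

The closed form is cheaper and numerically safer than a general 4x4 inversion because it exploits the orthogonality of the rotation part.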
 

Composition of transformations


If we have two transformations T1 and T2 and want to apply T2 after T1, the resulting transformation T can be calculated as T = T2 × T1, where × is the matrix product. (Note that the later transformation is the first factor.)
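A quick sanity check of the ordering (illustrative example only):

```python
import numpy as np

# T1: translate by (1, 0, 0); T2: rotate 90 degrees about z.
T1 = np.eye(4)
T1[0, 3] = 1.0
T2 = np.eye(4)
T2[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

p = np.array([0.0, 0.0, 0.0, 1.0])  # origin, homogeneous

step_by_step = T2 @ (T1 @ p)  # apply T1 first, then T2
composed = (T2 @ T1) @ p      # later transformation is the first factor

assert np.allclose(step_by_step, composed)  # both give (0, 1, 0, 1)
```

Writing T = T1 × T2 instead would translate after the rotation and yield a different point, which is why the order of the factors matters.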

 

From Z-map to point cloud

A time-of-flight camera emits a modulated light signal that is captured by an imager chip. The incoming signal is sampled in each pixel, so that the raw data consists of an amplitude value, which refers to the amount of incoming light, and a phase value, which is proportional to the distance. Hence, we can compute a matrix of spherical distance values. Spherical means that each value is the distance of the captured scene point from the camera center. The question is now how these values can be used to estimate a 3D point cloud.
By using a standard imager chip the camera follows the same principles as 2D cameras do. Hence, we can apply the general concepts we know from 2D to the sensor and exploit the fact that we also have distance values per pixel.

 

Using streamed data


The BLOB XML description section contains the following values:

  • the 4x4 camera to world matrix Mc → w
  • the offset along the z axis of the camera reference point to the focus point f2rc
  • the intrinsic matrix values (cx, cy, fx, fy)
  • the lens correction parameters (radial distortion coefficients k1, k2)

Further, we have the distance map D of size nrows x ncols. To calculate the local coordinates (xL, yL, zL) of the point at index (row, col) (row: 0 ... nrows-1, col: 0 ... ncols-1), the following code can be used.
 

Distance map to local point coordinates 

 

// Normalized image coordinates of the pixel; note the sign convention
// (x increases to the left, y upwards) of the camera coordinate system.
const double xp = (cx - col) / fx;
const double yp = (cy - row) / fy;

// Radial lens distortion using the coefficients k1, k2 from the metadata.
const double r2 = (xp*xp + yp*yp);
const double r4 = r2*r2;

const double k = 1.0 + k1*r2 + k2*r4;

const double xd = xp * k;
const double yd = yp * k;

// Spherical distance of the scene point from the camera center.
const double dist = d[row][col];
// Length of the (distorted) viewing ray direction (xd, yd, 1).
const double s0 = sqrt(xd*xd + yd*yd + 1.0);

// Project the spherical distance onto the viewing ray; shift z by the
// offset f2rc between focal point and camera reference point.
const double xl = xd * dist / s0;
const double yl = yd * dist / s0;
const double zl = dist / s0 - f2rc;
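Since the product also ships Python samples, the same per-pixel computation can be vectorized over the whole distance map with NumPy (a sketch using the parameter names above, not the official sample code):

```python
import numpy as np

def zmap_to_local_points(d, cx, cy, fx, fy, k1, k2, f2rc):
    """Convert a radial distance map d (nrows x ncols) into local
    coordinates (xl, yl, zl), vectorizing the per-pixel code above."""
    nrows, ncols = d.shape
    col, row = np.meshgrid(np.arange(ncols), np.arange(nrows))
    xp = (cx - col) / fx          # normalized image coordinates
    yp = (cy - row) / fy
    r2 = xp * xp + yp * yp        # radial lens distortion (k1, k2)
    k = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = xp * k
    yd = yp * k
    s0 = np.sqrt(xd * xd + yd * yd + 1.0)  # viewing-ray length
    xl = xd * d / s0
    yl = yd * d / s0
    zl = d / s0 - f2rc            # shift by focal-to-reference offset
    return xl, yl, zl
```

At the principal point (xp = yp = 0, s0 = 1) the local z simply equals the measured distance minus f2rc, which is a useful spot check.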

Using the programming examples

Helpful samples


For convenience, SICK provides programming examples for streaming devices, which contain code to connect, configure and use the camera. They can be found in the download section of the product on sick.com.

Using Python to compute point cloud data

An easy and straightforward way to see the 3D point cloud conversion are the Python samples.

 

In Data.py, the image data and the important XML header, which contains the camera parameters for the correct camera-to-world transformation, are extracted:

 

How the binary part of the blob is extracted can be seen here:

 

Having extracted the XML information, the calculation of the point cloud can be done as shown below. The code was written to work with both the Visionary-S and the Visionary-T Mini and can be found in PointCloud.py.

 

 

The calculation also includes the transformation from the sensor to the world coordinate system using the camera-to-world matrix c2w, which is also extracted when parsing the XML header of the blob.
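The sensor-to-world step itself is a homogeneous matrix multiplication; a minimal sketch (function name hypothetical, c2w assumed to be a 4x4 NumPy array):

```python
import numpy as np

def local_to_world(points, c2w):
    """Transform an (N, 3) array of local/sensor points into world
    coordinates with the 4x4 camera-to-world matrix c2w."""
    n = points.shape[0]
    homogeneous = np.hstack([points, np.ones((n, 1))])  # (N, 4)
    world = homogeneous @ c2w.T  # row vectors, hence the transpose
    return world[:, :3]
```

With c2w set to the identity the points pass through unchanged; a translation-only c2w simply offsets the whole cloud.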

 


 
Keywords:
Visionary-T Mini, How to, Tutorial, PointCloud, Coordinate System, Coordinate, 3D Camera, 3D Cameras, Transformation, Conversion