Data Format

Events

Event data is stored at VGA resolution (640×480) in compressed* h5 files. The structure of the h5 files is as follows:

/events/p
/events/t
/events/x
/events/y
/ms_to_idx
/t_offset
  • /events/{p, t, x, y} contain the polarity, time, x (column index), and y (row index) coordinates of the events. The time is in microseconds.
  • /t_offset is the time offset in microseconds that must be added to the timestamps of the events. By doing so, the event timestamps are in the same clock as the image timestamps.
  • /ms_to_idx is the mapping from milliseconds to event indices. It is used to efficiently retrieve event data within a time window. It is defined such that
    • t[ms_to_idx[ms]] >= ms*1000
    • t[ms_to_idx[ms] - 1] < ms*1000
    where ms is the time in milliseconds and t are the event timestamps in microseconds. We provide python code that can be used to retrieve event data; a minimal sketch is also given below.

* The compression is performed using Blosc with the ZSTD codec. If you follow the install instructions, you should be able to directly read the data with h5py.
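For illustration, here is a minimal sketch of such a retrieval using ms_to_idx (the helper name events_in_window is ours; we assume hdf5plugin is installed per the install instructions so that h5py can decode the Blosc-compressed data):

import h5py
import hdf5plugin  # importing this registers the Blosc/ZSTD filter with h5py
import numpy as np

def events_in_window(h5_path, start_ms, end_ms):
    """Return all events with timestamps in [start_ms, end_ms) milliseconds."""
    with h5py.File(h5_path, 'r') as f:
        ms_to_idx = f['ms_to_idx']
        i0 = int(ms_to_idx[start_ms])  # first event with t >= start_ms * 1000
        i1 = int(ms_to_idx[end_ms])    # first event with t >= end_ms * 1000
        x = f['events/x'][i0:i1]
        y = f['events/y'][i0:i1]
        p = f['events/p'][i0:i1]
        # add t_offset so the timestamps share the image clock
        t = f['events/t'][i0:i1].astype(np.int64) + int(f['t_offset'][()])
    return x, y, p, t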

Event Rectification

Unlike the image data, the event data is not rectified or undistorted; this simplifies data storage. Rectified and undistorted event coordinates can be computed using the rectify_maps.h5 file that is associated with each event h5 file. We provide example python code in the dataset directory for convenience.

The events.h5 file stores pixel coordinates as recorded by the sensor. Hence, these coordinates are subject to lens distortion and not yet rectified. rectify_map contains the rectified pixel coordinates:

rectified_coordinates = rectify_map[y, x]
x_rectified = rectified_coordinates[..., 0]
y_rectified = rectified_coordinates[..., 1]
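As a hedged sketch of both steps (the dataset key 'rectify_map' inside rectify_maps.h5 is an assumption; adapt it to the actual file):

import h5py
import numpy as np

def load_rectify_map(path):
    with h5py.File(path, 'r') as f:
        # assumed dataset key; shape (480, 640, 2), rectified (x, y) per raw pixel
        return f['rectify_map'][()]

def rectify_events(x, y, rectify_map):
    """Map raw sensor coordinates of events to rectified, undistorted coordinates."""
    xy_rect = rectify_map[y, x]  # one lookup per event via advanced indexing
    return xy_rect[..., 0], xy_rect[..., 1]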

Images

Image data is available in 8-bit PNG files at a resolution of 1440×1080 and is already rectified.

Together with the images, we provide three timestamp files:

  • Exposure timestamps for both the left and right cameras: the start and end of each exposure are provided in microseconds.
  • Image timestamps: These are unified timestamps for both the left and right cameras, computed as the average of the exposure midpoints of the left and right cameras. These timestamps are used to associate images with disparity maps.

Disparity

The disparity data is provided in the rectified coordinate frames of the left cameras.

Format Details

Disparity maps are saved as 1-channel 16-bit PNG files. We provide example python code for convenience.

A value of 0 indicates an invalid pixel where no ground truth exists. Otherwise, the disparity of valid pixels can be computed by converting the uint16 value to float and dividing it by 256:

disp[y,x]  = ((float)I[y,x])/256.0
valid[y,x] = I[y,x]>0
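The same conversion as a python sketch (the file path is hypothetical; OpenCV is used here only because IMREAD_ANYDEPTH keeps the 16-bit values intact):

import cv2
import numpy as np

I = cv2.imread('disparity/000123.png', cv2.IMREAD_ANYDEPTH)  # uint16, (H, W)
valid = I > 0                        # 0 marks pixels without ground truth
disp = I.astype(np.float32) / 256.0  # invalid pixels simply stay at 0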

The reference view is the left event camera or the left RGB global-shutter camera, respectively. This is the same convention as in the KITTI stereo benchmark.

Optical Flow

The optical flow data is provided in the rectified coordinate frame of the left event camera.

We provide both forward and backward optical flow maps, but only forward optical flow will be evaluated.

Optical flow data is provided at 10 Hz as displacement fields between two timestamps that are 100 milliseconds apart. For example, given three timestamps t_{k-1}, t_k, and t_{k+1}, the

  • forward optical flow with index k is the per-pixel displacement from t_k to t_{k+1}
  • backward optical flow with index k is the per-pixel displacement from t_k to t_{k-1}

You can find the optical flow timestamps in the optical flow timestamp files. Each row in these files corresponds to one forward or backward optical flow map.

Important note: Optical flow maps are not available for all sequences, and they are typically not provided for a whole sequence but only for subsets of it. You have to use the optical flow timestamp files to correctly associate sensor measurements (events and frames) with optical flow maps.

Format Details

Optical flow maps are saved as 3-channel 16-bit PNG files:

  • 1st channel (R): x-component
  • 2nd channel (G): y-component
  • 3rd channel (B): 1 if valid and 0 otherwise

Note that the channel order (1, 2, 3) is (R, G, B). Due to this, we recommend loading the raw data with imageio instead of OpenCV (which uses 1,2,3 -> B,G,R order by default):

import imageio
flow_16bit = imageio.imread(path_to_flowfile, format='PNG-FI')

Finally, to convert the x-/y-flow into floating-point values, convert the value to float, subtract 2^15 and divide the result by 128 (flow values now have a range of [-256, 256]):

flow_x[y,x] = ((float)I[y,x,1]-2^15)/128.0
flow_y[y,x] = ((float)I[y,x,2]-2^15)/128.0
valid[y,x] = (bool)I[y,x,3]
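Combined with the loading snippet above, the full decoding looks as follows (file path hypothetical; note the 0-based channel indices in python):

import imageio
import numpy as np

flow_16bit = imageio.imread('flow/forward/000123.png', format='PNG-FI')
flow_16bit = flow_16bit.astype(np.float32)
flow_x = (flow_16bit[..., 0] - 2**15) / 128.0  # R channel: x-component
flow_y = (flow_16bit[..., 1] - 2**15) / 128.0  # G channel: y-component
valid = flow_16bit[..., 2] > 0                 # B channel: validity mask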

Semantic Segmentation

The semantic labels are stored in 8-bit greyscale PNG files at a resolution of 640×440 and are paired with event data captured by the left event camera. The semantic labels are generated by first warping the images from the left frame-based camera to the view of the left event camera. The last 40 rows are then cropped since the frame-based camera does not capture these regions. In a second step, a state-of-the-art semantic segmentation method is applied to the warped images to generate the final labels.

The names of the labels correspond to the names of the images from the left frame-based camera. Additionally, we provide a semantic timestamp file to associate events with semantic labels. The semantic timestamps are consistent with the corresponding image timestamps.

We provide two types of semantic labels with a different number of classes. The 11-class labels are consistent with the paper ESS: Learning Event-based Semantic Segmentation from Still Images [PDF], whereas the 19-class labels are consistent with the Cityscapes labels for evaluation. The two sets of classes are as follows:

11 classes: background, building, fence, person, pole, road, sidewalk, vegetation, car, wall, traffic sign.
19 classes: road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle.

Please check the labels.py script for more details on the label mapping.
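A minimal loading sketch (the file path is hypothetical; the id-to-name mapping itself lives in labels.py):

import imageio
import numpy as np

labels = imageio.imread('11classes/000123.png')      # uint8, shape (440, 640)
ids, counts = np.unique(labels, return_counts=True)  # pixel count per class id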

Camera Calibration File

Camera calibration data is summarized in the cam_to_cam.yaml file. The naming convention is as follows:

Intrinsics

  • cam0: Event camera left
  • cam1: Frame camera left
  • cam2: Frame camera right
  • cam3: Event camera right
  • camRectX: Rectified version of camX. E.g., camRect0 is the rectified version of cam0.

Extrinsics

  • T_XY: Rigid transformation that transforms a point from the camY coordinate frame into the camX coordinate frame (see the sketch after this list).
  • R_rectX: Rotation that transforms a point from the camX coordinate frame into the camRectX coordinate frame.
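As a sketch of this convention (the key layout under 'extrinsics' is an assumption about cam_to_cam.yaml; adapt the names to the actual file):

import numpy as np
import yaml

with open('cam_to_cam.yaml') as f:
    calib = yaml.safe_load(f)

T_10 = np.asarray(calib['extrinsics']['T_10'])  # assumed key: maps cam0 -> cam1

p_cam0 = np.array([0.1, 0.0, 2.0, 1.0])  # homogeneous point in the cam0 frame
p_cam1 = T_10 @ p_cam0                   # the same point in the cam1 frame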

Disparity to Depth

The following two quantities are perspective transformation matrices for reprojecting a disparity image into 3D space (see the sketch after this list).

  • cams_03: Event cameras
  • cams_12: Frame cameras
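A hedged sketch of the reprojection (we assume the matrices are stored as 4×4 nested lists under a disparity_to_depth key and behave like OpenCV's Q matrix; verify against the actual yaml):

import cv2
import numpy as np
import yaml

def disparity_to_pointcloud(disp, Q):
    """Reproject a float32 disparity map (invalid pixels = 0) to 3D points."""
    return cv2.reprojectImageTo3D(disp, Q)  # (H, W, 3) in the rectified left frame

with open('cam_to_cam.yaml') as f:
    calib = yaml.safe_load(f)
Q_events = np.asarray(calib['disparity_to_depth']['cams_03'], dtype=np.float64)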