Events
Event data is stored at VGA resolution (640×480) in compressed* h5 files. The structure of the h5 file is the following:
/events/p
/events/t
/events/x
/events/y
/ms_to_idx
/t_offset
/events/{p, t, x, y} contains the polarity, time, x (column idx), and y (row idx) coordinates. The time is in microseconds.
/t_offset is the time offset in microseconds that must be added to the timestamps of the events. By doing so, the event timestamps are in the same clock as the image timestamps.
/ms_to_idx is the mapping from milliseconds to event indices. It is used to efficiently retrieve event data within a time duration. It is defined such that
t[ms_to_idx[ms]] >= ms*1000
t[ms_to_idx[ms] - 1] < ms*1000
where ms is the time in milliseconds and t the event timestamps in microseconds. We provide python code that can be used to retrieve event data.
* The compression is performed using Blosc with the ZSTD codec. If you follow the install instructions, you should be able to directly read the data with h5py.
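As an illustration, the following sketch shows how ms_to_idx can be used with h5py to slice out all events inside a millisecond time window. The provided python tools are the reference; this is only a sketch, and the function name, file path, and the hdf5plugin import are our own assumptions:
import hdf5plugin  # assumed to register the Blosc/ZSTD filter for h5py
import h5py

def load_events_in_window(h5_path, start_ms, end_ms):
    # Hypothetical helper: return all events with start_ms*1000 <= t < end_ms*1000.
    with h5py.File(h5_path, 'r') as f:
        i0 = f['ms_to_idx'][start_ms]
        i1 = f['ms_to_idx'][end_ms]
        x = f['events/x'][i0:i1]
        y = f['events/y'][i0:i1]
        p = f['events/p'][i0:i1]
        # Add the offset so the event timestamps share the clock of the image timestamps.
        t = f['events/t'][i0:i1] + f['t_offset'][()]
    return x, y, p, t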
Event Rectification
Unlike the image data, the event data is not rectified or undistorted; this simplifies data storage. Rectified and undistorted event coordinates can be computed using the rectify_maps.h5 file that is associated with each event h5 file. We provide example python code in the dataset directory for convenience.
The event data stored in the events.h5
file contains pixel coordinates as recorded by the sensor. Hence, this data is subject to lens distortion and not yet rectified. rectify_map
contains the rectified pixel coordinates:
rectified_coordinates = rectify_map[y, x]
x_rectified = rectified_coordinates[..., 0]
y_rectified = rectified_coordinates[..., 1]
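For example, the map can be loaded once per sequence and applied to all events in a vectorized way (a minimal sketch; the file paths are placeholders):
import hdf5plugin  # assumed, needed to decompress the events file
import h5py

with h5py.File('events.h5', 'r') as f:
    x = f['events/x'][()]  # raw (distorted) event coordinates
    y = f['events/y'][()]
with h5py.File('rectify_maps.h5', 'r') as f:
    rectify_map = f['rectify_map'][()]  # shape (480, 640, 2): rectified float coordinates

xy_rect = rectify_map[y, x]  # vectorized lookup for all events at once
x_rect = xy_rect[:, 0]
y_rect = xy_rect[:, 1]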
Images
Image data is available in 8-bit PNG files at a resolution of 1440×1080 and is already rectified.
Together with the images, we provide three timestamp files:
- Exposure timestamps for both the left and right cameras: the start and end of the exposure time are provided in microseconds.
- Image timestamps: these are unified timestamps for both the left and right cameras, computed as the average of the middle exposure times of the left and right cameras. These timestamps are used to associate images with disparity maps.
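As a sketch, the unified image timestamps could be reconstructed from the exposure timestamp files as follows (the file names and the two-column start/end layout are assumptions; check the timestamp files in your download):
import numpy as np

# Assumed layout: one row per image, columns = exposure start, exposure end (microseconds).
left = np.loadtxt('exposure_timestamps_left.txt', delimiter=',', dtype=np.int64)
right = np.loadtxt('exposure_timestamps_right.txt', delimiter=',', dtype=np.int64)

mid_left = (left[:, 0] + left[:, 1]) // 2     # middle of the left exposure
mid_right = (right[:, 0] + right[:, 1]) // 2  # middle of the right exposure
image_timestamps = (mid_left + mid_right) // 2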
Disparity
The disparity data is provided in the rectified coordinate frames of the left cameras.
Format Details
Disparity maps are saved as 1-channel 16-bit PNG files. We provide example python code for convenience.
A value of 0 indicates an invalid pixel where no ground truth exists. Otherwise, the disparity of valid pixels can be computed by converting the uint16 value to float and dividing it by 256:
disp[y,x] = ((float)I[y,x])/256.0
valid[y,x] = I[y,x]>0
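In python, this conversion can look like the following (a minimal sketch; the file path is a placeholder):
import imageio
import numpy as np

disp_16bit = imageio.imread('disparity/000000.png')  # placeholder path, 16-bit PNG
valid = disp_16bit > 0                               # 0 marks pixels without ground truth
disp = disp_16bit.astype(np.float32) / 256.0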
The reference view is the left event or RGB global shutter camera respectively. This is the same convention as the KITTI stereo benchmark.
Optical Flow
The optical flow data is provided in the rectified coordinate frame of the left event camera.
We provide both forward and backward optical flow maps, but only the forward optical flow will be evaluated.
Optical flow data is provided at 10 Hz as displacement fields between two timestamps that are 100 milliseconds apart. E.g., if we have three timestamps t_{k-1}, t_k, and t_{k+1}, then the
- forward optical flow with index k is the per-pixel displacement from t_k to t_{k+1}
- backward optical flow with index k is the per-pixel displacement from t_k to t_{k-1}
You can find the optical flow timestamps in the optical flow timestamp files. Each row in these files corresponds to one forward or backward optical flow map.
Important note: Optical flow maps are not available for all sequences, and they are typically not provided for the whole sequence but only subsets of the sequences. You have to use the optical flow timestamp files to correctly associate sensor measurements (events and frames) to optical flow maps.
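A sketch of this association (the file name and the two-column layout, start and end of the 100 ms displacement window in microseconds, are assumptions):
import numpy as np

flow_ts = np.loadtxt('forward_timestamps.txt', delimiter=',', dtype=np.int64)
t_start_us, t_end_us = flow_ts[0]  # time window covered by the flow map with index 0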
Format Details
Optical flow maps are saved as 3-channel 16-bit PNG files:
- 1st channel (R): x-component
- 2nd channel (G): y-component
- 3rd channel (B): 1 if valid and 0 otherwise
Note that the channel order (1, 2, 3) is (R, G, B). Due to this, we recommend loading the raw data with imageio instead of OpenCV (which uses 1,2,3 -> B,G,R order by default):
flow_16bit = imageio.imread(path_to_flowfile, format='PNG-FI')
Finally, to convert the x-/y-flow into floating-point values, convert the value to float, subtract 2^15, and divide the result by 128 (the flow values then lie in the range [-256, 256]):
flow_x[y,x] = ((float)I[y,x,1]-2^15)/128.0
flow_y[y,x] = ((float)I[y,x,2]-2^15)/128.0
valid[y,x] = (bool)I[y,x,3]
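Putting this together in python (a sketch; the file path is a placeholder, and format='PNG-FI' requires imageio's freeimage plugin):
import imageio
import numpy as np

flow_16bit = imageio.imread('flow/forward/000000.png', format='PNG-FI')  # placeholder path
flow_16bit = flow_16bit.astype(np.float32)

flow_x = (flow_16bit[..., 0] - 2**15) / 128.0  # R: x-component
flow_y = (flow_16bit[..., 1] - 2**15) / 128.0  # G: y-component
valid = flow_16bit[..., 2].astype(bool)        # B: validity mask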
Semantic Segmentation
The semantic labels are stored in 8-bit greyscale PNG files at a resolution of 640×440 and are paired with event data captured by the left event camera. The semantic labels are generated by first warping the images from the left frame-based camera to the view of the left event camera. The last 40 rows are then cropped since the frame-based camera does not capture these regions. In a second step, a state-of-the-art semantic segmentation method is applied to the warped images to generate the final labels.
The names of the labels correspond to the names of the images from the left frame-based camera. Additionally, we provide a semantic timestamp file to associate events with semantic labels. The semantic timestamps are consistent with the corresponding image timestamps.
We provide two types of semantic labels with a different number of classes. The 11 class labels are consistent with the paper ESS: Learning Event-based Semantic Segmentation from Still Images [PDF], whereas the 19 class labels are consistent with the Cityscapes labels for evaluation. The two sets of classes are as follows:
11 class: background, building, fence, person, pole, road, sidewalk, vegetation, car, wall, and traffic sign.
19 class: road, sidewalk, building, wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle.
Please check the labels.py script for more details on the label mapping.
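Loading a label map is straightforward (a sketch; the file path is a placeholder, and the class-id-to-name mapping is defined in labels.py):
import imageio
import numpy as np

labels = imageio.imread('semantic/000000.png')  # placeholder path, 8-bit greyscale PNG
print(labels.shape)       # (440, 640): one class id per pixel
print(np.unique(labels))  # class ids present in this frame; see labels.py for names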
Camera Calibration File
Camera calibration data is summarized in the cam_to_cam.yaml file. The naming convention is as follows:
Intrinsics
- cam0: Event camera left
- cam1: Frame camera left
- cam2: Frame camera right
- cam3: Event camera right
- camRectX: Rectified version of camX. E.g., camRect0 is the rectified version of cam0.
Extrinsics
- T_XY: Rigid transformation that transforms a point in the camY coordinate frame into the camX coordinate frame.
- R_rectX: Rotation that transforms a point in the camX coordinate frame into the camRectX coordinate frame.
Disparity to Depth
The following two quantities are perspective transformation matrices for reprojecting a disparity image to 3D space.
- cams_03: Event cameras
- cams_12: Frame cameras
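For example, a disparity map of the event camera pair can be reprojected to 3D points like this (a sketch; the yaml key names are assumptions, check the cam_to_cam.yaml of your download):
import cv2
import imageio
import numpy as np
import yaml

with open('cam_to_cam.yaml', 'r') as f:
    calib = yaml.safe_load(f)
Q = np.array(calib['disparity_to_depth']['cams_03'])  # assumed key layout, event cameras

# Disparity map of the left event camera (see the Disparity section above).
disp = imageio.imread('disparity/000000.png').astype(np.float32) / 256.0  # placeholder path
points_3d = cv2.reprojectImageTo3D(disp, Q)  # (H, W, 3) 3D coordinates
depth = points_3d[..., 2]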