Training Data Overview

The dataset we are providing will be made up of 15 sequences of 100 frames of stereo camera images, acquired from a da Vinci Xi robot during porcine partial nephrectomies. To avoid redundancy, we sample the frames from the 30 Hz video at 2 Hz. To extract the 1280x1024 camera image from a video frame, crop the image starting at pixel (320, 28).
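
The cropping and subsampling described above can be sketched as follows. This is a minimal sketch, not part of the released tooling: the function names are ours, and we assume the video frame is a NumPy array in (height, width, channels) layout.

```python
import numpy as np


def crop_frame(frame: np.ndarray) -> np.ndarray:
    """Crop the 1280x1024 camera image out of a full video frame.

    The image origin is pixel (x=320, y=28), per the dataset description.
    """
    x, y, w, h = 320, 28, 1280, 1024
    return frame[y:y + h, x:x + w]


def sampled_indices(n_frames: int, video_hz: int = 30, sample_hz: int = 2):
    """Indices of the frames kept when subsampling 30 Hz video to 2 Hz,
    i.e. every 15th frame."""
    step = video_hz // sample_hz
    return list(range(0, n_frames, step))
```

For example, a 1920x1080 video frame yields a 1280x1024 crop, and 60 video frames reduce to 4 sampled frames.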

In each frame we have hand-labelled the visible border of the kidney, skipping regions where it is fully occluded by fat or other tissue. The only exception to this rule is instrument occlusion, where we provide a separate labelling and interpolate the kidney border across the occlusion.

Example Frames

Test Data and Evaluation Overview

The test set will consist of 15 sequences of 50 frames, each sampled immediately after the corresponding training sequence, plus 5 full sequences of 150 frames. These sequences will be sampled at the same rate as the training set.

Participants will be evaluated on each test sequence separately. If a machine learning approach is taken, participants should exclude the corresponding training sequence from training when evaluating on one of the 50-frame sequences, to avoid bias in the evaluation.

We want users to predict a 1-pixel-wide contour along the edge of the kidney. The prediction should ignore other objects in the image and regions where the kidney boundary is occluded by another object, such as an instrument.

Submissions will be compared with hand-labelled ground truth. Scoring is assessed with a Euclidean distance transform, and 2 scores will be provided for each frame. The first is a 'precision-like' score, where the scoring function is:

s = \frac{\sum_{i} I_{u}(i)\, d(i, C_{gt})}{\sum_{i} I_{u}(i)}

where s is the score for an image, C_{gt} is the set of ground truth contour pixels, and I_{u} is the user-supplied binary image, indexed by pixel locations i. The function d(i, C_{gt}) returns the Euclidean distance from pixel i in I_{u} to the nearest contour point in C_{gt}.

The second scoring function is similar and instead assesses a 'recall-like' score, where the scoring function is:

s = \frac{\sum_{i} I_{gt}(i)\, d(i, C_{u})}{\sum_{i} I_{gt}(i)}

where s is the score for an image, C_{u} is the set of user-supplied contour pixels, and I_{gt} is the ground truth image. The other terms have the same meaning as in the equation above.
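
Both scores can be computed efficiently with a distance transform. The sketch below is our own illustration, not the official evaluation code; it assumes the contours are given as binary NumPy masks and uses SciPy's Euclidean distance transform.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def boundary_scores(user: np.ndarray, gt: np.ndarray):
    """Precision-like and recall-like boundary scores (lower is better).

    user, gt: boolean images where True marks contour pixels.
    """
    # distance_transform_edt gives, at every pixel, the distance to the
    # nearest zero pixel, so invert each mask to get distance-to-contour.
    d_to_gt = distance_transform_edt(~gt)
    d_to_user = distance_transform_edt(~user)
    # Mean distance from user contour pixels to the nearest GT contour pixel,
    # and vice versa.
    precision_like = d_to_gt[user].mean()
    recall_like = d_to_user[gt].mean()
    return precision_like, recall_like
```

For example, if the ground truth contour is a single pixel and the predicted contour is a single pixel 2 pixels away, both scores are 2.0.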

Trouble extracting contours?

If the training data has wider contours and your method requires exactly 1-pixel-wide contours to train, you can use the following Python code to skeletonize the contours down to a 1-pixel width.

>>> import cv2
>>> from skimage import morphology
>>> im = cv2.imread("kidney_image.png", 0)  # read the label image as greyscale
>>> skeleton_ground_truth = morphology.skeletonize(im == 255)  # kidney label only
>>> skeleton_binary = skeleton_ground_truth.astype("uint8") * 255

This will extract the boundary for the kidney (label 255) rather than the instrument (label 10). You can combine the two to get clean boundaries for both.
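
Combining the two labels might look like the sketch below. The function name is ours, and we assume the labels sit in a single-channel image; skeletonizing each label separately before merging avoids the two boundaries interfering with each other.

```python
import numpy as np
from skimage import morphology


def combined_skeleton(label_im: np.ndarray) -> np.ndarray:
    """Skeletonize the kidney (label 255) and instrument (label 10)
    boundaries separately, then merge them into one binary image."""
    kidney = morphology.skeletonize(label_im == 255)
    instrument = morphology.skeletonize(label_im == 10)
    # OR the two 1-pixel-wide skeletons and scale back to a 0/255 image.
    return ((kidney | instrument) * 255).astype(np.uint8)
```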