🐈 Introducing the KITTI 3D format and how to import and export it in the BasicAI LiDAR Fusion annotation tool.

The KITTI format is widely used in autonomous driving for computer vision tasks, such as 3D object detection, multi-object tracking, and scene understanding.

For more information, please visit the KITTI site.

On the BasicAI annotation platform, you can import and export data in KITTI format for 3D object detection, which will be detailed in this article.

You can download KITTI datasets from the official website or find samples here📂.

Import KITTI to BasicAI

Currently, BasicAI supports only the KITTI 3D object detection format, and only for single data (individual frames). The required data is marked in red below.

KITTI datasets, 3D lidar fusion data

When uploading to BasicAI, the data should be organized in a ZIP archive with the standard KITTI format:

├── calib // Camera calibration parameters
│    ├── 0000.txt
│    ├── 0001.txt
│    ...
├── image_2 // Left color images
│    ├── 0000.png
│    ├── 0001.png
│    ...
├── label_2 // Label files for the left color images (optional)
│    ├── 0000.txt
│    ├── 0001.txt
│    ...
├── velodyne // Lidar point cloud
│    ├── 0000.bin
│    ├── 0001.bin
│    ...


Please adhere strictly to the folder names and file types provided. We will explain each of them below.
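As a rough pre-upload check, the sketch below verifies that a local folder follows the layout above and then packs it into a ZIP archive. The function names `check_kitti_layout` and `zip_kitti_folder` are illustrative and are not part of any BasicAI tooling. The rule that every frame needs a file with the same name in calib, image_2 and velodyne is an assumption drawn from the tree above.

```python
import zipfile
from pathlib import Path

# Folder name -> expected file extension, taken from the layout above.
REQUIRED = {"calib": ".txt", "image_2": ".png", "velodyne": ".bin"}
OPTIONAL = {"label_2": ".txt"}


def check_kitti_layout(root):
    """Return a list of human-readable problems; an empty list means OK."""
    root = Path(root)
    problems = []
    stems = {}
    for folder, ext in {**REQUIRED, **OPTIONAL}.items():
        path = root / folder
        if not path.is_dir():
            if folder in REQUIRED:
                problems.append(f"missing folder: {folder}")
            continue
        stems[folder] = {p.stem for p in path.iterdir() if p.suffix == ext}
    # Assumption: each frame needs the same file name in every required folder.
    present = [stems[f] for f in REQUIRED if f in stems]
    if present:
        all_frames = set.union(*present)
        for folder in REQUIRED:
            if folder in stems:
                for name in sorted(all_frames - stems[folder]):
                    problems.append(f"{folder}: no file for frame {name}")
    return problems


def zip_kitti_folder(root, archive_path):
    """Pack the recognised folders into a ZIP, keeping relative paths."""
    root = Path(root)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for folder in {**REQUIRED, **OPTIONAL}:
            if not (root / folder).is_dir():
                continue
            for p in sorted((root / folder).iterdir()):
                if p.is_file():
                    zf.write(p, p.relative_to(root).as_posix())
```

A typical use would be to call `check_kitti_layout("my_dataset")` first and only call `zip_kitti_folder("my_dataset", "my_dataset.zip")` when the returned list is empty.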

📄 calib (.txt)

In 2D and 3D sensor fusion annotation, camera calibration parameters are required to convert the point cloud coordinates into the camera coordinates.

Recording Platform

Here is a sample text file of camera calibration parameters:

P0: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 0.000000000000e+00 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P1: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.875744000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 0.000000000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 0.000000000000e+00
P2: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 4.485728000000e+01 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.163791000000e-01 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.745884000000e-03
P3: 7.215377000000e+02 0.000000000000e+00 6.095593000000e+02 -3.395242000000e+02 0.000000000000e+00 7.215377000000e+02 1.728540000000e+02 2.199936000000e+00 0.000000000000e+00 0.000000000000e+00 1.000000000000e+00 2.729905000000e-03
R0_rect: 9.999239000000e-01 9.837760000000e-03 -7.445048000000e-03 -9.869795000000e-03 9.999421000000e-01 -4.278459000000e-03 7.402527000000e-03 4.351614000000e-03 9.999631000000e-01
Tr_velo_to_cam: 7.533745000000e-03 -9.999714000000e-01 -6.166020000000e-04 -4.069766000000e-03 1.480249000000e-02 7.280733000000e-04 -9.998902000000e-01 -7.631618000000e-02 9.998621000000e-01 7.523790000000e-03 1.480755000000e-02 -2.717806000000e-01
Tr_imu_to_velo: 9.999976000000e-01 7.553071000000e-04 -2.035826000000e-03 -8.086759000000e-01 -7.854027000000e-04 9.998898000000e-01 -1.482298000000e-02 3.195559000000e-01 2.024406000000e-03 1.482454000000e-02 9.998881000000e-01 -7.997231000000e-01
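
To show how these parameters are used, here is a minimal sketch that parses a calib file and projects one LiDAR point into the image_2 camera. It assumes the convention documented in the KITTI development kit: P0 to P3 are 3x4 projection matrices (P2 belongs to the left color camera), R0_rect is a 3x3 rectifying rotation, and Tr_velo_to_cam is a 3x4 transform from LiDAR to camera coordinates, so a pixel is obtained as P2 * R0_rect * Tr_velo_to_cam * point. The function names are illustrative, and the code uses plain Python lists rather than a matrix library.

```python
def parse_calib(text):
    """Parse 'KEY: v1 v2 ...' lines into {key: [float, ...]}."""
    calib = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, values = line.split(":", 1)
        calib[key.strip()] = [float(v) for v in values.split()]
    return calib


def as_rows(values, ncols):
    """Reshape a flat list into rows of length ncols (row-major order)."""
    return [values[i:i + ncols] for i in range(0, len(values), ncols)]


def matvec(rows, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in rows]


def project_velo_to_image(calib, point_xyz):
    """Project one LiDAR point (x, y, z) to pixel (u, v) in the image_2 camera.

    Returns None when the point lies behind the camera.
    """
    tr = as_rows(calib["Tr_velo_to_cam"], 4)   # 3x4
    r0 = as_rows(calib["R0_rect"], 3)          # 3x3
    p2 = as_rows(calib["P2"], 4)               # 3x4
    cam = matvec(tr, list(point_xyz) + [1.0])  # LiDAR -> camera coordinates
    rect = matvec(r0, cam)                     # apply rectifying rotation
    u, v, w = matvec(p2, rect + [1.0])         # project to the image plane
    if w <= 0:
        return None
    return u / w, v / w
```

Points that project outside the image width and height still return coordinates here; a caller would normally filter those against the actual image size.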
🏙️ image_2 (.png)

BasicAI currently supports only one camera for KITTI import, and its folder must be named image_2.

The KITTI dataset contains data from four cameras: two grayscale and two color. The RGB images captured by the left color camera are stored in image_2.

kitti format datasets
🏷 label_2 (.txt)

If your data has been pre-annotated, you can upload the label file to BasicAI.


To match the image_2 folder, the label folder must be named label_2.

Each label file is a text file in which every line holds the annotation for a single object in the corresponding image. The format of each line is as follows:

<class names> <truncation> <occlusion> <alpha> <bbox coordinates> <3D dimensions> <location> <rotation_y> <score>

Here is a description of these fields:

| Elements | Parameter name | Description | Example |
|---|---|---|---|
| 1 | class names | The class or type of the annotated object. This can be one of the following: 'Car', 'Van', 'Truck', 'Pedestrian', 'Person_sitting', 'Cyclist', 'Tram', 'Misc', or 'DontCare'. 'DontCare' is used for objects that are present but ignored for evaluation. | |
| 1 | truncation | How far the object extends beyond the image boundaries: float from 0 (non-truncated) to 1 (truncated). | |
| 1 | occlusion | Integer (0, 1, 2, 3) indicating occlusion state: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown. | 2 |
| 1 | alpha | Observation angle of the object, in the range [-pi, pi]. | |
| 4 | bounding box | 2D bounding box of the object in the image (0-based index): <left>, <top>, <right>, <bottom> pixel coordinates. | 100, 120, 180, 160 |
| 3 | 3D dimensions | 3D object dimensions: <height>, <width>, <length> (in meters). | 1.65, 1.67, 3.64 |
| 3 | location | 3D object location <x>, <y>, <z> in camera coordinates (in meters). | -0.65, 1.71, 46.7 |
| 1 | rotation_y | Rotation ry around the y-axis in camera coordinates, in the range [-pi, pi]. | |
| 1 | score | Only for results: float indicating confidence in the detection, needed for precision/recall curves; higher is better. | 1 |

Each object therefore has 16 elements in total. Here is a sample label file:

Car 0.00 0 1.51 896.18 505.20 1041.40 648.19 1.74 1.77 4.13 0.94 0.89 14.01 1.58 1
Cyclist 0.00 0 -2.46 665.45 160.00 717.93 217.99 1.72 0.47 1.65 2.45 1.35 22.10 -2.35 1
Pedestrian 0.00 2 0.21 423.17 173.67 433.17 224.03 1.60 0.38 0.30 -5.87 1.63 23.11 -0.03 1
None 0.00 0 -2.17 819.25 495.83 935.81 590.23 2.04 0.72 3.73 -0.54 0.79 22.48 -2.19 1
Car 0.00 0 -0.18 335.98 444.21 646.47 572.93 2.31 3.63 5.40 -8.63 0.08 19.88 -0.59 1
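
The field order given above can be turned into a small parser. The following is a sketch only: the `KittiObject` class and `parse_label_line` function are illustrative names, and accepting 15-field lines (without the trailing score) is an assumption based on the score being described as "only for results".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KittiObject:
    cls: str
    truncation: float
    occlusion: int
    alpha: float
    bbox: List[float]        # left, top, right, bottom (pixels)
    dimensions: List[float]  # height, width, length (meters)
    location: List[float]    # x, y, z in camera coordinates (meters)
    rotation_y: float
    score: Optional[float] = None


def parse_label_line(line):
    """Parse one whitespace-separated label line into a KittiObject."""
    parts = line.split()
    if len(parts) not in (15, 16):
        raise ValueError(f"expected 15 or 16 fields, got {len(parts)}")
    nums = [float(p) for p in parts[1:]]
    return KittiObject(
        cls=parts[0],
        truncation=nums[0],
        occlusion=int(nums[1]),
        alpha=nums[2],
        bbox=nums[3:7],
        dimensions=nums[7:10],
        location=nums[10:13],
        rotation_y=nums[13],
        score=nums[14] if len(nums) == 15 else None,
    )
```

Parsing the first sample line gives an object with `cls == "Car"`, a 2D box of `[896.18, 505.2, 1041.4, 648.19]` and a location of `[0.94, 0.89, 14.01]`.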
🌌 velodyne (.bin)

LiDAR point cloud data captured by the Velodyne laser scanner is stored in a .bin file.
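
The article does not spell out the binary layout. In the public KITTI release, each point is stored as four consecutive 32-bit floats (x, y, z, reflectance) with no header, and the sketch below reads a file under that assumption. Little-endian byte order is also assumed, and `read_velodyne_bin` is an illustrative name.

```python
import struct


def read_velodyne_bin(path):
    """Return a list of (x, y, z, reflectance) tuples from a KITTI .bin file."""
    with open(path, "rb") as f:
        data = f.read()
    # Each point occupies 4 floats * 4 bytes = 16 bytes.
    if len(data) % 16 != 0:
        raise ValueError("file size is not a multiple of 16 bytes")
    return list(struct.iter_unpack("<4f", data))
```

For large scans, a numeric array library would be faster than a list of tuples; the pure-Python version is used here only to keep the example dependency-free.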

After organizing your data in the standard KITTI format, you can upload it to a LiDAR Fusion dataset on BasicAI. Remember to click the Config switch button, select KITTI 3D Object Detection as the format, and choose whether to import pre-annotations.

For more details about creating datasets and uploading data, please refer to Data and Folder Structure.

upload kitti pre-labeled data to BasicAI annotation platform

Access the LiDAR Fusion Tool to annotate point cloud and image data. You can load pre-annotations from the top right corner.

label kitti datasets on BasicAI annotation platform

Export KITTI from BasicAI

You can export annotation results in the KITTI 3D Object Detection format from BasicAI.

For more details about exporting, please refer to Export.

export kitti annotations from BasicAI annotation platform

Notes on the exported results:

  1. In KITTI format, only 2D bounding boxes on camera image data can be exported.

  2. When the data is annotated with multiple fused cameras, a separate label folder is exported for each camera in the downloaded .zip archive, with the following structure:

    ├── label_0 // Annotations on camera image 0
    │    └── 0000.txt
    ├── label_1 // Annotations on camera image 1
    │    └── 0000.txt
    ├── label_2 // Annotations on camera image 2
    │    └── 0000.txt
    ├── label_3 // Annotations on camera image 3
    │    └── 0000.txt
  3. Each line in a label file represents one annotated object. Please refer to the label_2 section for the meaning of each field. Note that the "Occlusion" and "Truncation" attributes are not supported on export; they are always written with the default values occlusion: 0 and truncation: 0.00.

    Car 0.00 0 -3.61 1082.39 404.87 1389.54 645.27 1.70 1.92 3.84 2.06 -1.80 -2.86 -0.94 1
    Car 0.00 0 2.41 556.36 466.44 775.85 628.07 1.55 1.86 4.51 1.63 -1.38 7.60 2.56 1
    Truck 0.00 0 -3.27 1426.23 309.84 1751.97 444.15 1.43 1.55 4.14 7.07 -6.52 -5.65 -0.99 1
  4. An object labeled only in 2D, with no 3D box, is exported as the "DontCare" class used in the KITTI dataset. All of its fields except the 2D box are set to negative placeholder values.

    DontCare -1 -1 -10 892.66 529.90 1786.14 883.20 -1 -1 -1 -1000 -1000 -1000 -10
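
As a quick sanity check on an exported archive, the sketch below counts, for each label_N folder, how many objects carry a 3D box and how many are 2D-only "DontCare" rows. The function name `summarize_export` is illustrative, and the code assumes that label files sit at `label_N/<frame>.txt` (optionally under a parent folder) as in the structure shown in item 2.

```python
import zipfile
from collections import defaultdict


def summarize_export(zip_path):
    """Count 3D-annotated and 2D-only objects per label_N folder."""
    summary = defaultdict(lambda: {"with_3d": 0, "2d_only": 0})
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Assumption: label files are stored as .../label_N/<frame>.txt
            if len(parts) < 2 or not name.endswith(".txt"):
                continue
            camera = parts[-2]
            if not camera.startswith("label_"):
                continue
            for line in zf.read(name).decode("utf-8").splitlines():
                fields = line.split()
                if not fields:
                    continue
                # 2D-only objects are exported with the DontCare class.
                key = "2d_only" if fields[0] == "DontCare" else "with_3d"
                summary[camera][key] += 1
    return dict(summary)
```

The result is a dictionary keyed by folder name, for example `{"label_2": {"with_3d": 3, "2d_only": 1}}` for an archive whose label_2 files contain three 3D-annotated objects and one DontCare row.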


