Create a Dataset

📂 First Step: Establishing a Dataset for Efficient Data Management and Curation

The first step in annotating your data is to create a dataset to upload it into. You can easily find the "create" button for each dataset type: LiDAR Fusion, Image, DICOM, Audio & Video, Text, and Generative AI.

It is important to know how BasicAI defines each data type for each model task so that you can find the right features.

LiDAR Fusion 🌌

BasicAI fully supports all types of cases related to LiDAR-Image Fusion. These include:

  1. Object detection and segmentation for Large Point Clouds.
  2. Object detection using 3D Bounding Boxes.
  3. Detection of traffic lines using 3D Polylines.
  4. 3D Instance & Semantic Segmentation.

📘

BasicAI also supports any combination of the above.

Additionally, we offer the following:

  1. 3D Object Tracking specifically for 3D bounding boxes.
  2. 4D BEV Annotation for reconstructed point cloud data. 4D BEV is a datatype between Scene and Data. In 4D BEV, a cuboid is annotated once in a single LiDAR data and can then be easily projected onto all images across different timestamps, just as in a Scene.
  3. Camera Calibration for data that contains both point clouds and images but has incorrect camera extrinsics.
  4. Camera Distortion for handling distortion in fisheye and 360-degree camera images.
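
To make the role of camera extrinsics in the Camera Calibration item more concrete, here is a minimal sketch of how a LiDAR point is projected into an image with a plain pinhole model. It is not BasicAI's implementation: the function name `project_point`, the 4x4 row-major LiDAR-to-camera matrix convention, and the intrinsic values are all assumptions made for illustration, and lens distortion is ignored.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def project_point(
    point_lidar: Sequence[float],
    extrinsic: Matrix,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Optional[Tuple[float, float]]:
    """Project one LiDAR-frame point to pixel coordinates (hypothetical helper).

    `extrinsic` is assumed to be a 4x4 row-major matrix that maps LiDAR
    coordinates to camera coordinates. Returns None when the point lies
    behind the camera.
    """
    x, y, z = point_lidar
    homogeneous = (x, y, z, 1.0)
    # Apply the first three rows of the extrinsic matrix: LiDAR -> camera frame.
    xc, yc, zc = (
        sum(row[i] * homogeneous[i] for i in range(4)) for row in extrinsic[:3]
    )
    if zc <= 0:
        return None
    # Pinhole projection with focal lengths (fx, fy) and principal point (cx, cy).
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Illustrative values only: a correct extrinsic versus one whose
# translation is off by 0.5 m along the camera x-axis.
CORRECT = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0],
           [0.0, 0.0, 0.0, 1.0]]
WRONG = [[1.0, 0.0, 0.0, 0.5],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 1.0]]

point = (1.0, 2.0, 4.0)
print(project_point(point, CORRECT, 100.0, 100.0, 50.0, 60.0))  # (75.0, 110.0)
print(project_point(point, WRONG, 100.0, 100.0, 50.0, 60.0))    # (87.5, 110.0)
```

In this toy setup, a 0.5 m error in the extrinsic translation moves the projected point by 12.5 pixels, which is the kind of misalignment between point cloud and image that calibration is meant to correct.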

Image 🌄

All detection, tracking, and segmentation tasks can be performed in an Image dataset, using tools such as bounding box, polyline, polygon, and skeleton.

📘

Video in Image

In an Image dataset, a video can be split into frames and then treated as a Scene for tracking tasks. If you are going to create Clip-type annotations, please use our Audio and Video datatype instead.

Audio and Video 📹

The BasicAI Audio and Video Dataset is designed for audio and video clipping or segmenting tasks. The main difference from Video in Image is that you cannot directly annotate the video or its image frames. Instead, you annotate the timeline or audio track.

Text 📄

The BasicAI Text Dataset is designed for text entity and relation annotation tasks.

Generative AI 🤖

The Generative AI Dataset is designed for LLM annotation tasks, such as annotating human-model dialogues that contain text or images. For now, we provide the following two types of LLM annotation:

  1. RLHF Dialogue Evaluation: RLHF stands for Reinforcement Learning from Human Feedback, in which humans reward or penalize an LLM's responses to prompts so that the model's results align more closely with human preferences. RLHF Dialogue Evaluation includes annotating, sorting, or scoring LLM responses.
  2. SFT Dialogue Response: SFT stands for Supervised Fine-tuning, in which humans give answers to the model directly. In SFT Dialogue Response annotation, the dataset generally includes one round of human-model dialogue, either with both prompt and response or with just the prompt, and a worker manually adds a response as a user or a bot in order to fine-tune the model's responses.
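
As a rough illustration of how the two annotation types differ, the sketch below models each as a small Python record. All class and field names (`Turn`, `RlhfEvaluation`, `SftDialogue`, `scores`, and so on) are hypothetical and chosen only for this example. They do not describe BasicAI's actual data or export format.

```python
from dataclasses import dataclass, field
from typing import List

VALID_ROLES = ("user", "bot")


@dataclass
class Turn:
    """One message in a human-model dialogue (hypothetical structure)."""
    role: str
    text: str


@dataclass
class RlhfEvaluation:
    """RLHF-style record: a worker scores several candidate responses."""
    prompt: str
    responses: List[str]
    scores: List[float]  # one human-assigned score per response

    def ranking(self) -> List[int]:
        """Return response indices ordered from best to worst score."""
        if len(self.scores) != len(self.responses):
            raise ValueError("expected exactly one score per response")
        return sorted(
            range(len(self.responses)),
            key=lambda i: self.scores[i],
            reverse=True,
        )


@dataclass
class SftDialogue:
    """SFT-style record: a worker writes the missing response directly."""
    turns: List[Turn] = field(default_factory=list)

    def add_response(self, role: str, text: str) -> None:
        """Append a worker-written turn as either a user or a bot."""
        if role not in VALID_ROLES:
            raise ValueError(f"role must be one of {VALID_ROLES}")
        self.turns.append(Turn(role, text))


# RLHF: the model's answers already exist; the human only judges them.
evaluation = RlhfEvaluation(
    prompt="Summarize the report.",
    responses=["Answer A", "Answer B", "Answer C"],
    scores=[0.2, 0.9, 0.5],
)
print(evaluation.ranking())  # [1, 2, 0]

# SFT: the dataset holds only a prompt; the human supplies the answer.
dialogue = SftDialogue(turns=[Turn("user", "Summarize the report.")])
dialogue.add_response("bot", "The report covers three findings.")
print(len(dialogue.turns))  # 2
```

The design point is that an RLHF record stores human judgments about existing model output, while an SFT record stores the human-written output itself.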

What’s Next

🎉 Awesome! You've decided on a dataset. Let's upload your data now!