I have a project in which I downloaded LiDAR 3D point cloud data from SemanticKITTI. I want to perform semantic segmentation on the data using U-Net. I converted the 3D point clouds into 2D range images via spherical projection, turning each original scan (a .bin file) into a NumPy array of shape (64, 1024, 5), where 64 is the height, 1024 is the width, and the 5 channels are the x, y, z coordinates, the intensity, and each point's distance from the sensor, in that order.
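Roughly, the projection step I used looks like the sketch below (function and variable names are mine; the field-of-view values are the HDL-64E numbers commonly quoted for SemanticKITTI, +3° up and -25° down):

```python
import numpy as np

def spherical_projection(scan, H=64, W=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = abs(fov_up) + abs(fov_down)

    xyz, intensity = scan[:, :3], scan[:, 3]
    depth = np.linalg.norm(xyz, axis=1)          # distance from sensor

    yaw = -np.arctan2(xyz[:, 1], xyz[:, 0])      # azimuth
    pitch = np.arcsin(xyz[:, 2] / depth)         # elevation

    # normalise both angles to [0, 1], then scale to pixel coordinates
    u = np.clip(np.floor(0.5 * (yaw / np.pi + 1.0) * W), 0, W - 1).astype(int)
    v = np.clip(np.floor((1.0 - (pitch + abs(fov_down)) / fov) * H), 0, H - 1).astype(int)

    # (the full recipe usually sorts points by decreasing depth first, so
    # that closer points overwrite farther ones; omitted here for brevity)
    img = np.zeros((H, W, 5), dtype=np.float32)
    img[v, u, :3] = xyz
    img[v, u, 3] = intensity
    img[v, u, 4] = depth
    return img

# usage: scan = np.fromfile('000000.bin', dtype=np.float32).reshape(-1, 4)
```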
I also projected the semantic information contained in the point clouds' label files onto the 2D image plane (using the class color map from SemanticKITTI's YAML file) and saved the results in .png format, with each pixel depicting the color of its respective class.
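The colorization step is essentially this (a minimal sketch; semantic-kitti.yaml ships with the dataset, and its color_map stores a BGR triple per class id, at least in the semantic-kitti-api version):

```python
import numpy as np
import yaml
from PIL import Image

with open('semantic-kitti.yaml') as f:
    color_map = yaml.safe_load(f)['color_map']   # {class_id: [b, g, r]}

def colorize(proj_label):
    """Turn a (64, 1024) array of class ids into an RGB image."""
    rgb = np.zeros((*proj_label.shape, 3), dtype=np.uint8)
    for class_id, bgr in color_map.items():
        rgb[proj_label == class_id] = bgr[::-1]  # BGR -> RGB
    return rgb

# usage (proj_label comes from projecting the .label file the same way as
# the points; the semantic id is the lower 16 bits of each raw label):
# Image.fromarray(colorize(proj_label)).save('000000.png')
```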
My question is below:
As input for the U-Net I have a NumPy array of shape (64, 1024, 5) and a label image in .png format with dimensions (64, 1024).
How can I feed this data into U-Net? Can I input the (64, 1024, 5) NumPy array directly, or does some preprocessing need to take place first? I have read somewhere that U-Net cannot take multi-channel images as input. Also, do I need to one-hot encode my ground-truth label images, as they do not contain any information beyond the class colors at the moment?
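For reference, this is how I am loading a single sample right now (a sketch with placeholder file names; the transpose to channels-first is just my guess at what a PyTorch U-Net would expect):

```python
import numpy as np
import torch
from PIL import Image

x = np.load('000000.npy')                 # (64, 1024, 5): x, y, z, intensity, depth
x = torch.from_numpy(x).permute(2, 0, 1)  # channels first: (5, 64, 1024)
x = x.unsqueeze(0).float()                # batch of one: (1, 5, 64, 1024)

y = np.array(Image.open('000000.png'))    # (64, 1024) or (64, 1024, 3),
                                          # depending on how the PNG was saved
print(x.shape, y.shape)
```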