Importing a large point cloud data file into MATLAB

I am a new MATLAB user with little programming experience (I have a mechanical engineering background) so I apologise in advance if this is a simple question!

I am trying to import a large point cloud file (.pts file extension) into MATLAB for processing. I'm led to believe that the file contains a text header followed by 3 columns of integer data (x, y and z coordinates). I managed to open the first part of the file as a text file, and this is the case.

I cannot import the file directly into MATLAB as it is too large (875 million points); I can only import 9000000 rows at a time. I have therefore written the script below to import the file in 9000000x3 blocks and save each block as a MATLAB file (or another appropriate format).

Script:

filename='pointcloud.pts';
fid = fopen(filename,'r');
frewind(fid);
header=fread(fid,8,'*char');
points=fread(fid,1,'*int32');
pointsinpass=9000000;
numofpasses=(points/pointsinpass)
counter = 1;

while counter <= numofpasses;

   clear block;

   block=zeros(pointsinpass,3);


    for p=1:pointsinpass;
      block(p,[1:3])=fread(fid, 1,'float');
    end;

    indx=counter;
    filename=sprintf('block%d',indx);
    save (filename), block;


    disp('Iteration')
    disp(counter)
    disp('complete')
    counter=counter+1;


end;
fclose(fid);

The script runs fine and cycles through 5 iterations, importing 5 blocks of the data. Then, as it attempts to import the 6th chunk I get the following error:

Subscripted assignment dimension mismatch.

Error in LiDARread_attempt5 (line 22)
          block(p,[1:3])=fread(fid, 1,'float');

I am unsure what is causing the error. I believe it relates to the size argument of the fread command, as I have experimented with other values such as 3, which allows just one block to be imported before the dimension mismatch error occurs.

Once more, I apologise if I am missing something very basic; my understanding of programming techniques is very limited, as I was only introduced to them a couple of months ago.

There are 2 answers

Oleg On BEST ANSWER

At some point fread() returns an empty array ([]), most likely because it has reached the end of the file.

I can show how to reproduce the error:

a = zeros(2,2)
a =
     0     0
     0     0
a(2,1:2) = []

Subscripted assignment dimension mismatch. 

I suggest using textscan() instead of fread(), since the file appears to contain text rather than binary data.
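
The empty read described above is not specific to MATLAB: any loop that asks for a fixed amount of data will eventually receive less than it asked for. The following is a minimal sketch of the guard in Python (the language used in the other answer), not a translation of the question's script. It assumes a hypothetical binary stream of float32 x, y, z triples; the function name read_xyz_blocks and the in-memory demo data are made up for illustration.

```python
import io

import numpy as np


def read_xyz_blocks(stream, points_per_block):
    """Yield (n, 3) float32 arrays until the stream runs out of whole points."""
    bytes_per_point = 3 * 4  # three float32 values per point (assumed layout)
    while True:
        raw = stream.read(points_per_block * bytes_per_point)
        n_points = len(raw) // bytes_per_point
        if n_points == 0:
            break  # empty (or incomplete) read: stop instead of assigning it
        usable = raw[:n_points * bytes_per_point]
        yield np.frombuffer(usable, dtype=np.float32).reshape(n_points, 3)


# Demo: 7 points held in memory, read in blocks of 3 points.
data = np.arange(21, dtype=np.float32)
blocks = list(read_xyz_blocks(io.BytesIO(data.tobytes()), 3))
block_sizes = [b.shape[0] for b in blocks]
```

The last block is shorter than the others, and the loop ends cleanly when nothing is left, rather than trying to store an empty result in a fixed-size row.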

Paul On

MATLAB is a great tool, but I have found that it struggles with big data problems. Though it would involve a learning curve, may I suggest you look into Python? I made the switch from MATLAB to Python many years ago and have not looked back too much along the way.

Spyder (http://code.google.com/p/spyderlib/) is a powerful IDE that should provide a good bridge for MATLAB users. Pythonxy (http://code.google.com/p/pythonxy/) for Windows will give you all the tools you need to be productive on that platform; however, last I checked it only supported a 32-bit address space. If you need 64-bit support on Windows, there are the fantastic packages provided by https://stackoverflow.com/users/453463/cgohlke at http://www.lfd.uci.edu/~gohlke/pythonlibs/. On Linux, of course, all the necessary packages can be installed very easily. You'll need to use Python 2.7 in all cases for full compatibility with the requisite packages.

I don't know all the particulars of your problem but using the numpy memmap data structure would probably help. It allows huge arrays to be operated on from disk without loading the entire array into main memory. It takes care of the internals for you.

Basically all you do is:

## memmap example
import numpy as np

# Use mode 'w+' to create the file; later opens can use 'r+' (read/write).
fpr = np.memmap('MemmapOutput', dtype='float32', mode='w+', shape=(3000000, 4))
fpr[:] = np.random.rand(3000000, 4)  # assign in place; "fpr = ..." would only rebind the name
del fpr  # this frees the array and flushes to disk
fpr = np.memmap('MemmapOutput', dtype='float32', mode='r+', shape=(3000000, 4))
fpr[:] = np.random.rand(3000000, 4)  # overwrite the values; in general you might not need to modify the array, but it can be done
columnSums = fpr.sum(axis=0)  # one sum per column; notice you can use all the numpy functions seamlessly
del fpr  # best to close the array again when done processing
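
Applied to the file in the question, the same idea could look roughly like the sketch below: read the text in blocks and copy each block into a memmap on disk. This is only a sketch under assumptions that are not confirmed by the question, namely that the header is a single line and that each following line holds whitespace-separated x, y, z values. The function name text_points_to_memmap and the file names are hypothetical, and the code is written for Python 3.

```python
import os
import tempfile
from itertools import islice

import numpy as np


def text_points_to_memmap(text_path, out_path, n_points, rows_per_block):
    """Copy x, y, z rows from a text file into a float32 memmap, block by block."""
    out = np.memmap(out_path, dtype="float32", mode="w+", shape=(n_points, 3))
    row = 0
    with open(text_path, "r") as f:
        next(f, None)  # skip the header (assumed to be a single line)
        while row < n_points:
            lines = list(islice(f, min(rows_per_block, n_points - row)))
            if not lines:
                break  # file ended earlier than expected
            block = np.loadtxt(lines, ndmin=2)  # parse only this block of lines
            out[row:row + len(block)] = block
            row += len(block)
    out.flush()
    del out
    return row  # number of rows actually written


# Demo on a tiny made-up file: a header line, then 5 points.
tmp_dir = tempfile.mkdtemp()
text_path = os.path.join(tmp_dir, "tiny.pts")
out_path = os.path.join(tmp_dir, "tiny.dat")
with open(text_path, "w") as f:
    f.write("5\n")
    for i in range(5):
        f.write("%d %d %d\n" % (i, 2 * i, 3 * i))

rows_written = text_points_to_memmap(text_path, out_path, n_points=5, rows_per_block=2)
points = np.memmap(out_path, dtype="float32", mode="r", shape=(rows_written, 3))
column_max = points.max(axis=0)
```

Only one block of lines is held in memory at a time. For 875 million points stored as three float32 values each, the output file would be roughly 10.5 GB, so this would need a 64-bit Python and enough disk space.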

Please don't take this the wrong way. I'm not trying to convince you to abandon MATLAB, but to consider adding another tool to your toolset.