I have implemented the Self-Organizing Map(SOM) algorithm in MATLAB. Suppose each of the data points are represented in 2-dimensional space. The problem is that I want to visualize the movement of each of the data points in the training phase i.e. I want to see how the points are moving and eventually forming clusters as the algorithm is in progress say at every fix duration. I believe that this can be done through Simulation in MATLAB,but I don't know how to incorporate my MATLAB code for visualization?

1

There are 1 answers

0
Werner On

I developed a code example to visualize clustering data with multiple dimensions using all possible data projection in 2-D. It may not be the best idea for visualization (there are techniques developed for this, as SOM itself may be used for this need), specially for a higher dimension numbers, but when the number of possible projections (n-1)! is not that high it is a quite good visualizer.


Cluster Algorithm 

Since I needed access to the code so that I could save the cluster means and cluster labels for each iteration, I used a fast kmeans algorithm available at FEX by Mo Chen, but I had to adapt it so I could have this access. The adapted code is the following:

function [label,m] = litekmeans(X, k)
% Perform k-means clustering.
%   X: d x n data matrix
%   k: number of seeds
% Written by Michael Chen ([email protected]).
n = size(X,2);
last = 0;
iter = 1;
label{iter} = ceil(k*rand(1,n));  % random initialization
checkLabel = label{iter};
m = {};
while any(checkLabel ~= last)
    [u,~,checkLabel] = unique(checkLabel);   % remove empty clusters
    k = length(u);
    E = sparse(1:n,checkLabel,1,n,k,n);  % transform label into indicator matrix
    curM = X*(E*spdiags(1./sum(E,1)',0,k,k));    % compute m of each cluster
    m{iter} = curM;
    last = checkLabel';
    [~,checkLabel] = max(bsxfun(@minus,curM'*X,dot(curM,curM,1)'/2),[],1); % assign samples to the nearest centers
    iter = iter + 1;
    label{iter} = checkLabel;
end
% Get last clusters centers
m{iter} = curM;
% If to remove empty clusters:
%for k=1:iter
%  [~,~,label{k}] = unique(label{k});
%end

Gif Creation

I also used @Amro's Matlab video tutorial for the gif creation.

Distinguishable Colors

I used this great FEX by Tim Holy for making the cluster colors easier to distinguish.

Resulting code

My resulting code is as follows. I had some issues because the number of clusters would change for each iteration which would cause scatter plot update to delete all cluster centers without giving any errors. Since I didn't noticed that, I was trying to workaround the scatter function with any obscure method that I could find the web (btw, I found a really nice scatter plot alternative here), but fortunately I got what was happening going back to this today. Here is the code I did for it, you may feel free to use it, adapt it, but please keep my reference if you use it.

function varargout=kmeans_test(data,nClusters,plotOpts,dimLabels,...
  bigXDim,bigYDim,gifName)
%
% [label,m,figH,handles]=kmeans_test(data,nClusters,plotOpts,...
%   dimLabels,bigXDim,bigYDim,gifName)
% Demonstrate kmeans algorithm iterative progress. Inputs are:
%
% -> data (rand(5,100)): the data to use.
%
% -> nClusters (7): number of clusters to use.
%
% -> plotOpts: struct holding the following fields:
%
%   o leftBase: the percentage distance from the left
%
%   o rightBase: the percentage distance from the right
%
%   o bottomBase: the percentage distance from the bottom
%
%   o topBase: the percentage distance from the top
%
%   o FontSize: FontSize for axes labels.
%
%   o widthUsableArea: Total width occupied by axes
%
%   o heigthUsableArea: Total heigth occupied by axes
%
% -> bigXDim (1): the big subplot x dimension
%
% -> bigYDim (2): the big subplot y dimension
%
% -> dimLabels: If you want to specify dimensions labels
%
% -> gifName: gif file name to save
%
% Outputs are:
% 
% -> label: Sample cluster center number for each iteration
%
% -> m: cluster center mean for each iteration
%
% -> figH: figure handle
%
% -> handles: axes handles
%

%
% - Creation Date: Fri, 13 Sep 2013 
% - Last Modified: Mon, 16 Sep 2013 
% - Author(s): 
%   - W.S.Freund <wsfreund_at_gmail_dot_com> 

%
% TODO List (?):
%
%  - Use input parser 
%  - Adapt it to be able to cluster any algorithm function.
%  - Use arrows indicating cluster centers movement before moving them.
%  - Drag and drop small axes to big axes.
%

% Pre-start
if nargin < 7
  gifName = 'kmeansClusterization.gif';
  if nargin < 6
    bigYDim = 2;
    if nargin < 5
      bigXDim = 1;
      if nargin < 4
        nDim = size(data,1);
        maxDigits = numel(num2str(nDim));
        dimLabels = mat2cell(sprintf(['Dim %0' num2str(maxDigits) 'd'],...
          1:nDim),1,zeros(1,nDim)+4+maxDigits);
        if nargin < 3
          plotOpts = struct('leftBase',.05,'rightBase',.02,...
            'bottomBase',.05,'topBase',.02,'FontSize',10,...
            'widthUsableArea',.87,'heigthUsableArea',.87);
          if nargin < 2
            nClusters = 7;
            if nargin < 1
              center1 = [1; 0; 0; 0; 0];
              center2 = [0; 1; 0; 0; 0];
              center3 = [0; 0; 1; 0; 0];
              center4 = [0; 0; 0; 1; 0];
              center5 = [0; 0; 0; 0; 1];
              center6 = [0; 0; 0; 0; 1.5];
              center7 = [0; 0; 0; 1.5; 1];
              data = [...
                      bsxfun(@plus,center1,.5*rand(5,20)) ...
                      bsxfun(@plus,center2,.5*rand(5,20)) ...
                      bsxfun(@plus,center3,.5*rand(5,20)) ...
                      bsxfun(@plus,center4,.5*rand(5,20)) ...
                      bsxfun(@plus,center5,.5*rand(5,20)) ...
                      bsxfun(@plus,center6,.2*rand(5,20)) ...
                      bsxfun(@plus,center7,.2*rand(5,20)) ...
                     ];
            end
          end
        end
      end
    end
  end
end

% NOTE of advice: It seems that Matlab does not test while on
% refreshdata if the dimension of the inputs are equivalent for the
% XData, YData and CData while using scatter. Because of this I wasted
% a lot of time trying to debug what was the problem, trying many
% workaround since my cluster centers would disappear for no reason.

% Draw axes:
nDim = size(data,1);

figH = figure;
set(figH,'Units', 'normalized', 'Position',...
  [0, 0, 1, 1],'Color','w','Name',...
  'k-means example','NumberTitle','Off',...
  'MenuBar','none','Toolbar','figure',...
  'Renderer','zbuffer');

% Create dintinguishable colors matrix:
colorMatrix = distinguishable_colors(nClusters);

% Create axes, deploy them on handles matrix more or less how they
% will be positioned:
[handles,horSpace,vertSpace] = ...
  createAxesGrid(5,5,plotOpts,dimLabels);

% Add main axes
bigSubSize = ceil(nDim/2);
bigSubVec(bigSubSize^2) = 0;
for k = 0:nDim-bigSubSize
  bigSubVec(k*bigSubSize+1:(k+1)*bigSubSize) = ...
    ... %(nDim-bigSubSize+k)*nDim+1:(nDim-bigSubSize+k)*nDim+(nDim-bigSubSize+1);
    bigSubSize+nDim*k:nDim*(k+1);
end

handles(bigSubSize,bigSubSize) = subplot(nDim,nDim,bigSubVec,...
  'FontSize',plotOpts.FontSize,'box','on'); 
bigSubplotH = handles(bigSubSize,bigSubSize);
horSpace(bigSubSize,bigSubSize) = bigSubSize;
vertSpace(bigSubSize,bigSubSize) = bigSubSize;
set(bigSubplotH,'NextPlot','add',...
  'FontSize',plotOpts.FontSize,'box','on',...
  'XAxisLocation','top','YAxisLocation','right');

% Squeeze axes through space to optimize space usage and improve
% visualization capability:
[leftPos,botPos,subplotWidth,subplotHeight]=setCustomPlotArea(...
  handles,plotOpts,horSpace,vertSpace);

pColorAxes = axes('Position',[leftPos(end) botPos(end) ...
  subplotWidth subplotHeight],'Parent',figH);
pcolor([1:nClusters+1;1:nClusters+1])
% image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
% check if the upcoming buggy behaviour would be fixed. I was not
% lucky, though...
colormap(pColorAxes,colorMatrix);
% Change XTick positions to its center:
set(pColorAxes,'XTick',.5:1:nClusters+.5);
set(pColorAxes,'YTick',[]);
% Change its label to cluster number:
set(pColorAxes,'XTickLabel',[nClusters 1:nClusters-1]); % FIXME At
% least on my matlab I have to use this buggy way to set XTickLabel.
% Am I doing something wrong? Since I dunno why this is caused, I just
% change the code so that it looks the way it should look, but this is
% quite strange...
xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);

% Now iterate throw data and get cluster information:
[label,m]=litekmeans(data,nClusters);

nIters = numel(m)-1;

scatterColors = colorMatrix(label{1},:);

annH = annotation('textbox',[leftPos(1),botPos(1) subplotWidth ...
  subplotHeight],'String',sprintf('Start Conditions'),'EdgeColor',...
  'none','FontSize',18);

% Creates dimData_%d variables for first iteration:
for curDim=1:nDim
  curDimVarName = genvarname(sprintf('dimData_%d',curDim));
  eval([curDimVarName,'= m{1}(curDim,:);']);
end

%   clusterColors will hold the colors for the total number of clusters
% on each iteration:
clusterColors = colorMatrix;

% Draw cluster information for first iteration:
for curColumn=1:nDim
  for curLine=curColumn+1:nDim
    % Big subplot data:
    if curColumn == bigXDim && curLine == bigYDim
      curAxes = handles(bigSubSize,bigSubSize);
      curScatter = scatter(curAxes,data(curColumn,:),...
        data(curLine,:),16,'filled');
      set(curScatter,'CDataSource','scatterColors');
      % Draw cluster centers 
      curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
      curLineVarName = genvarname(sprintf('dimData_%d',curLine));
      eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ... 
        curLineVarName ',100,colorMatrix,''^'',''filled'');']);
      set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
        curLineVarName,'CDataSource','clusterColors')
    end
    % Small subplots data:
    curAxes = handles(curLine,curColumn);
    % Draw data:
    curScatter = scatter(curAxes,data(curColumn,:),...
      data(curLine,:),16,'filled');
    set(curScatter,'CDataSource','scatterColors');
    % Draw cluster centers 
    curColumnVarName = genvarname(sprintf('dimData_%d',curColumn));
    curLineVarName = genvarname(sprintf('dimData_%d',curLine));
    eval(['curScatter=scatter(curAxes,' curColumnVarName ',' ... 
      curLineVarName ',100,colorMatrix,''^'',''filled'');']);
    set(curScatter,'XDataSource',curColumnVarName,'YDataSource',...
      curLineVarName,'CDataSource','clusterColors');
    if curLine==nDim
      xlabel(curAxes,dimLabels{curColumn});
      set(curAxes,'XTick',xlim(curAxes));
    end
    if curColumn==1
      ylabel(curAxes,dimLabels{curLine});
      set(curAxes,'YTick',ylim(curAxes));
    end 
  end
end

refreshdata(figH,'caller');

% Preallocate gif frame. From Amro's tutorial here:
% https://stackoverflow.com/a/11054155/1162884
f = getframe(figH);
[f,map] = rgb2ind(f.cdata, 256, 'nodither');
mov = repmat(f, [1 1 1 nIters+4]);

% Add one frame at start conditions:
curFrame = 1;
% Add three frames without movement at start conditions
f = getframe(figH);
mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');

for curIter = 1:nIters
  curFrame = curFrame+1;
  % Change label text
  set(annH,'String',sprintf('Iteration %d',curIter));
  % Update cluster point colors
  scatterColors = colorMatrix(label{curIter+1},:);
  % Update cluster centers:
  for curDim=1:nDim
    curDimVarName = genvarname(sprintf('dimData_%d',curDim));
    eval([curDimVarName,'= m{curIter+1}(curDim,:);']);
  end
  % Update cluster colors:
  nClusterIter = size(m{curIter+1},2);
  clusterColors = colorMatrix(1:nClusterIter,:);
  % Update graphics:
  refreshdata(figH,'caller');
  % Update cluster colors:
  if nClusterIter~=size(m{curIter},2) % If number of cluster
    % of current iteration differs from previous iteration (or start
    % conditions in case we are at first iteration) we redraw colors: 
    pcolor([1:nClusterIter+1;1:nClusterIter+1])
    % image(reshape(colorMatrix,[1 size(colorMatrix)])); % Used image to
    % check if the upcomming buggy behaviour would be fixed. I was not
    % lucky, though...
    colormap(pColorAxes,clusterColors);
    % Change XTick positions to its center:
    set(pColorAxes,'XTick',.5:1:nClusterIter+.5);
    set(pColorAxes,'YTick',[]);
    % Change its label to cluster number:
    set(pColorAxes,'XTickLabel',[nClusterIter 1:nClusterIter-1]); 
    xlabel(pColorAxes,'Clusters Colors','FontSize',plotOpts.FontSize);
  end
  f = getframe(figH);
  mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
end

set(annH,'String','Convergence Conditions');

for curFrame = nIters+1:nIters+3
  % Add three frames without movement at start conditions
  f = getframe(figH);
  mov(:,:,1,curFrame) = rgb2ind(f.cdata, map, 'nodither');
end

imwrite(mov, map, gifName, 'DelayTime',.5, 'LoopCount',inf)

varargout = cell(1,nargout);

if nargout > 0
  varargout{1} = label;
  if nargout > 1
    varargout{2} = m;
    if nargout > 2
      varargout{3} = figH;
      if nargout > 3
        varargout{4} = handles;
      end
    end
  end
end

end


function [leftPos,botPos,subplotWidth,subplotHeight] = ...
  setCustomPlotArea(handles,plotOpts,horSpace,vertSpace)
%
% -> handles: axes handles
%
% -> plotOpts: struct holding the following fields:
%
%   o leftBase: the percentage distance from the left
%
%   o rightBase: the percentage distance from the right
%
%   o bottomBase: the percentage distance from the bottom
%
%   o topBase: the percentage distance from the top
%
%   o widthUsableArea: Total width occupied by axes
%
%   o heigthUsableArea: Total heigth occupied by axes
%
% -> horSpace: the axes units size (integers only) that current axes
% should occupy in the horizontal (considering that other occupied
% axes handles are empty)
%
% -> vertSpace: the axes units size (integers only) that current axes
% should occupy in the vertical (considering that other occupied
% axes handles are empty)
%

nHorSubPlot =  size(handles,1);
nVertSubPlot = size(handles,2);

if nargin < 4
  horSpace(nHorSubPlot,nVertSubPlot) = 0;
  horSpace = horSpace+1;
  if nargin < 3
    vertSpace(nHorSubPlot,nVertSubPlot) = 0;
    vertSpace = vertSpace+1;
  end
end

subplotWidth = plotOpts.widthUsableArea/nHorSubPlot;
subplotHeight = plotOpts.heigthUsableArea/nVertSubPlot;

totalWidth = (1-plotOpts.rightBase) - plotOpts.leftBase;
totalHeight = (1-plotOpts.topBase) - plotOpts.bottomBase;

gapHeigthSpace = (totalHeight - ...
  plotOpts.heigthUsableArea)/(nVertSubPlot);
gapWidthSpace = (totalWidth - ...
  plotOpts.widthUsableArea)/(nHorSubPlot);

botPos(nVertSubPlot) = plotOpts.bottomBase + gapWidthSpace/2;
leftPos(1) = plotOpts.leftBase + gapHeigthSpace/2;

botPos(nVertSubPlot-1:-1:1) = botPos(nVertSubPlot) + (subplotHeight +...
  gapHeigthSpace)*(1:nVertSubPlot-1);
leftPos(2:nHorSubPlot) = leftPos(1) + (subplotWidth +...
  gapWidthSpace)*(1:nHorSubPlot-1);

for curLine=1:nHorSubPlot
  for curColumn=1:nVertSubPlot
    if handles(curLine,curColumn)
      set(handles(curLine,curColumn),'Position',[leftPos(curColumn)...
        botPos(curLine) horSpace(curLine,curColumn)*subplotWidth ...             
        vertSpace(curLine,curColumn)*subplotHeight]);                     
    end
  end                                                         
end                                                           

end


function [handles,horSpace,vertSpace] = ...
  createAxesGrid(nLines,nColumns,plotOpts,dimLabels)

handles = zeros(nLines,nColumns);

% Those hold the axes size units:
horSpace(nLines,nColumns) = 0;
vertSpace(nLines,nColumns) = 0;

for curColumn=1:nColumns
  for curLine=curColumn+1:nLines
    handles(curLine,curColumn) = subplot(nLines,...
      nColumns,curColumn+(curLine-1)*nColumns);
    horSpace(curLine,curColumn) = 1;
    vertSpace(curLine,curColumn) = 1;
    curAxes = handles(curLine,curColumn);
    if feature('UseHG2')
      colormap(handle(curAxes),colorMatrix);
    end
    set(curAxes,'NextPlot','add',...
      'FontSize',plotOpts.FontSize,'box','on'); 
    if curLine==nLines
      xlabel(curAxes,dimLabels{curColumn});
    else
      set(curAxes,'XTick',[]);
    end
    if curColumn==1
      ylabel(curAxes,dimLabels{curLine});
    else
      set(curAxes,'YTick',[]);
    end
  end
end
end

Example

Here is an example using 5 dimensions, using the code:

center1 = [1; 0; 0; 0; 0];
center2 = [0; 1; 0; 0; 0];
center3 = [0; 0; 1; 0; 0];
center4 = [0; 0; 0; 1; 0];
center5 = [0; 0; 0; 0; 1];
center6 = [0; 0; 0; 0; 1.5];
center7 = [0; 0; 0; 1.5; 1];
data = [...
        bsxfun(@plus,center1,.5*rand(5,20)) ...
        bsxfun(@plus,center2,.5*rand(5,20)) ...
        bsxfun(@plus,center3,.5*rand(5,20)) ...
        bsxfun(@plus,center4,.5*rand(5,20)) ...
        bsxfun(@plus,center5,.5*rand(5,20)) ...
        bsxfun(@plus,center6,.2*rand(5,20)) ...
        bsxfun(@plus,center7,.2*rand(5,20)) ...
       ];
[label,m,figH,handles]=kmeans_test(data,20);

enter image description here