How to display plant species biomass in a site by species matrix?


I earlier asked "How to display two columns as binary (presence/absence) matrix?". This question received two excellent answers. I would now like to take this a step further and add a third column to the original site by species columns which reflects the biomass of each species in each plot.

Column 1 (plot) holds the codes for ~ 200 plots, column 2 (species) holds the codes for ~ 1200 species, and column 3 (biomass) holds the dry weight. Each plot has > 1 species and each species can occur in > 1 plot. The total number of rows is ~ 2700.

> head(dissim)
    plot species biomass
1 a1f56r  jactom 20.2
2 a1f56r  zinunk 10.3
3 a1f56r  mikcor 0.4
4 a1f56r  rubcle 1.3
5 a1f56r  sphoos 12.4
6 a1f56r nepbis1 8.2

           plot species biomass
2707 og100m562r  selcup 4.7
2708 og100m562r  pip139 30.5
2709 og100m562r  stasum 0.1
2710 og100m562r  artani 3.4
2711 og100m562r  annunk 20.7
2712 og100m562r  rubunk 22.6

I would like to create a plot by species matrix that displays the biomass of each species in each plot (rather than a binary presence/absence matrix), something of the form:

    jactom  rubcle  chrodo  uncgla
a1f56r  1.3 0   10.3    0
a1f17r  0   22.3    0   4
a1m5r   3.2 0   3.7 9.7
a1m5r   1   0   0   20.1
a1m17r  5.4 6.9 0   1

Any advice on how to add this additional level of complexity would be very much appreciated.


There are 3 answers


The xtabs and tapply functions both return a two-way result that can be used as a matrix (xtabs gives a table object, tapply a plain matrix):

# Using MrFlick's example
> xtabs(~a+b,dd)
a   f g h i j
  a 0 1 0 2 3
  b 0 0 2 1 0
  c 0 3 0 0 1
  d 2 2 2 1 1
  e 1 1 2 4 1

# --- the tapply solution is a bit less elegant
> dd$one=1
> with(dd, tapply(one, list(a,b), sum))
   f  g  h  i  j
a NA  1 NA  2  3
b NA NA  2  1 NA
c NA  3 NA NA  1
d  2  2  2  1  1
e  1  1  2  4  1

# If you want to make the NA's become zeros then:

> tbl <- with(dd, tapply(one, list(a,b), sum))
> tbl[is.na(tbl)] <- 0
> tbl
  f g h i j
a 0 1 0 2 3
b 0 0 2 1 0
c 0 3 0 0 1
d 2 2 2 1 1
e 1 1 2 4 1
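
For the biomass question itself, xtabs also accepts a variable on the left-hand side of its formula, in which case it sums that variable within each cell instead of counting rows. A minimal sketch, assuming a data frame shaped like `dissim` from the question (the rows below are made up for illustration and only reuse a few of the question's codes):

```r
# Toy stand-in for the asker's data; the values are illustrative only
dissim <- data.frame(
  plot    = c("a1f56r", "a1f56r", "a1f17r", "a1f17r", "a1m5r"),
  species = c("jactom", "rubcle", "rubcle", "uncgla", "jactom"),
  biomass = c(20.2, 1.3, 22.3, 4.0, 3.2)
)

# Sum biomass for each plot/species pair; pairs that never occur are filled with 0
bm <- xtabs(biomass ~ plot + species, data = dissim)
bm

# Optional: a plain data frame with plots as rows and species as columns
bm_df <- as.data.frame.matrix(bm)
```

If a plot/species pair appears on more than one row, its biomass values are added together, which may or may not be what you want for repeated samples.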
MrFlick On

With sample data

dd <- data.frame(
    a=sample(letters[1:5], 30, replace=T),
    b=sample(letters[6:10], 30, replace=T)
)

If you know each occurrence appears only once, you can do

with(dd, table(a,b))

#    b
# a   f g h i j
#   a 0 1 0 2 3
#   b 0 0 2 1 0
#   c 0 3 0 0 1
#   d 2 2 2 1 1
#   e 1 1 2 4 1

If they are potentially duplicated and you only want to track presence/absence, you can do

with(unique(dd), table(a,b))
# or 
with(dd, (table(a,b)>0)+0)

#    b
# a   f g h i j
#   a 0 1 0 1 1
#   b 0 0 1 1 0
#   c 0 1 0 0 1
#   d 1 1 1 1 1
#   e 1 1 1 1 1
Tim On

You also asked about a solution for the case with three variables. Below are two solutions for that case.

First, let's set up the data:

dd <- data.frame(
  a=sample(letters[1:5], 30, replace=T),
  b=sample(letters[6:10], 30, replace=T),
  c=sample(letters[1:3], 30, replace=T)
)

If you have three discrete variables and only want to count the occurrences, here is a variant of the solution by @MrFlick that builds one a-by-b table per level of c:

by(dd, dd$c, function(x) with(x, table(a, b)))

And if the third variable is numeric (as biomass is) and you want its average value for each combination, you can use this solution:

reshape::cast(dd, a ~ b, value = 'c', fun = mean)
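
The same averaging can be sketched in base R with tapply, which avoids the extra package. This assumes the third column is numeric; the `biomass` column and the seed below are hypothetical example data, not part of the answer above:

```r
set.seed(1)  # arbitrary seed so the toy data is reproducible
dd <- data.frame(
  a = sample(letters[1:5], 30, replace = TRUE),
  b = sample(letters[6:10], 30, replace = TRUE),
  biomass = round(runif(30, 0, 30), 1)  # hypothetical numeric third variable
)

# Mean biomass per a/b combination; combinations with no rows come back as NA
avg <- with(dd, tapply(biomass, list(a, b), mean))

# Replace NA with 0 only if an absent combination should read as zero biomass
avg[is.na(avg)] <- 0
avg
```

Swapping `mean` for `sum` gives total biomass per combination instead of the average.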