I'm prepping our experimental data for trajectory analysis by partially following this guide here, and I have a list of 12 AnnData objects that I read in from loom files. 6 of them come from one sequencing run, whereas the other 6 come from another. I followed the aforementioned link's recommendation to generate spliced/unspliced count matrices using velocyto, which is how I got the loom files.
Anyway, that's all background information. I'm trying to merge all of these AnnData objects into one.
>>> loom_data
[AnnData object with n_obs × n_vars = 5000 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 5773 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 6807 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 5613 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 6052 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 3500 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 10510 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 9356 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 3246 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 1132 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 13595 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced', AnnData object with n_obs × n_vars = 9541 × 36601
obs: 'comp.ident'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced']
I'm having trouble understanding how to do this. All of the barcodes, which are loom_data[i].obs.index, are unique and contain a suffix for the sample they correspond to. Ultimately, I want to bring these layers into another AnnData object using scVelo.
The issue is calling sc.concat. It's the genes that have overlap; none of the barcodes are supposed to match across the 12 list elements. So I want to select the vars axis, which I think is 1, and I want the union of all of the elements in the other axis:
test = sc.concat(loom_data, axis = 1, join = 'outer')
But when I call the line above, I get a concatenation of all of the genes with the names made unique, even though I want to consolidate their counts:
>>> test
AnnData object with n_obs × n_vars = 80125 × 439212
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
I want the genes that aren't present in all samples to just have 0 counts. How would this be possible?
I just made the var names unique first and then concatenated along the obs axis (axis = 0) instead of vars.
The result has ~36k genes, which is about what I expected.