Number of nodes connected to specific nodes in a path


I have a Cypher query (below). It works, but I was wondering whether there's a more elegant way to write it.

Based on a given starting node, the query tries to:

  1. Find the following pattern/motif: (inputko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko).

  2. For each motif found, find the connected nodes with the label contigs for the following nodes in the pattern: [inputko, ko2, ko3].

  3. Return a summary of the 3 nodes and their connected contigs, i.e. the .ko name property of each of the 3 nodes and the number of :contigs nodes connected to each, for every (inputko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko) motif found.

    +--------------------------------------------------------------------------+
    | KO1         | KO1count | KO2         | KO2count | KO3         | KO3count |
    +--------------------------------------------------------------------------+
    | "ko:K00001" | 102      | "ko:K14029" | 512      | "ko:K03736" | 15       |
    | "ko:K00001" | 102      | "ko:K00128" | 792      | "ko:K12972" | 7        |
    | "ko:K00001" | 102      | "ko:K00128" | 396      | "ko:K01624" | 265      |
    | "ko:K00001" | 102      | "ko:K03735" | 448      | "ko:K00138" | 33       |
    | "ko:K00001" | 102      | "ko:K14029" | 512      | "ko:K15228" | 24       |
    +--------------------------------------------------------------------------+
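The three steps above can also be sketched as a single query that counts contigs per row instead of chaining MATCH/count stages. This is an untested sketch. It assumes Neo4j 3.1 or later (for pattern comprehensions) and a start node with the :ko label that can be looked up by its .ko property, rather than through the legacy koid index. Each `size([...])` is evaluated once per result row, so the counts do not depend on how many rows earlier clauses produced. `WITH DISTINCT` reports each (inputko, ko2, ko3) combination once, even when several :cpd routes connect them. The counts are relationship counts, so the sketch also assumes at most one relationship between a ko node and any given contig.

```cypher
// Sketch only: assumes Neo4j 3.1+ and the :ko / :cpd / :contigs labels
// and .ko property used in the question.
MATCH (inputko:ko {ko: 'ko:K00001'})-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
// collapse motifs that reach the same ko2/ko3 through different :cpd nodes
WITH DISTINCT inputko, ko2, ko3
// each pattern comprehension is evaluated once per row, so counts are not multiplied
RETURN inputko.ko                            AS KO1,
       size([(inputko)--(c1:contigs) | c1])  AS KO1count,
       ko2.ko                                AS KO2,
       size([(ko2)--(c2:contigs) | c2])      AS KO2count,
       ko3.ko                                AS KO3,
       size([(ko3)--(c3:contigs) | c3])      AS KO3count
LIMIT 5;
```

Drop `DISTINCT` if you want one row per motif instance rather than one per ko triple.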
    

I'm puzzled about the syntax for operating on each match. From the documentation, the FOREACH clause doesn't seem to be what I need. Any ideas?

The FOREACH clause is used to update data within a collection, whether components of a path, or result of aggregation.

Collections and paths are key concepts in Cypher. To use them for updating data, you can use the FOREACH construct. It allows you to do updating commands on elements in a collection — a path, or a collection created by aggregation.

    START 
        inputko=node:koid('ko:\"ko:K00001\"') 
    MATCH
        (inputko)--(c1:contigs)
    WITH
        count(c1) as KO1count, inputko
    MATCH
        (inputko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
    WITH
        inputko.ko as KO1,
        KO1count,
        ko2,
        ko3
    MATCH
        (ko2)--(c2:contigs)
    WITH
        KO1,
        KO1count,
        ko2.ko as KO2,
        count(c2) as KO2count,
        ko3
    MATCH
        (ko3)--(c3:contigs)
    RETURN 
        KO1,
        KO1count,
        KO2,
        KO2count,
        ko3.ko     AS KO3,
        count(c3)  AS KO3count
    LIMIT
        5;

I realised that I have to use count(distinct cX) instead of count(cX) to get an accurate count, but I don't know why.
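A likely explanation, offered as a hypothesis rather than a diagnosis of this particular graph, is row multiplication. Each MATCH produces one row per match, and the next MATCH extends every one of those rows. When the same ko2/ko3 pair is reached through more than one motif (for example via different :cpd nodes), it appears on several rows. Each of those rows is then expanded with the same contigs, so `count(c2)` and `count(c3)` count each contig once per row, while `count(DISTINCT c2)` counts it once. The sample output is consistent with this: ko:K00128 is reported with 792 contigs in one row and 396 in another, exactly a factor of two. The diagnostic query below lists the pairs that occur on more than one motif row. It reuses the question's labels and assumes the start node has the :ko label and a .ko property.

```cypher
// Diagnostic sketch: which ko2/ko3 pairs are produced by more than one motif?
MATCH (inputko:ko {ko: 'ko:K00001'})-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
// one row per pair, with the number of motif rows that produced it
WITH ko2, ko3, count(*) AS motifRows
WHERE motifRows > 1
RETURN ko2.ko AS KO2, ko3.ko AS KO3, motifRows
ORDER BY motifRows DESC
LIMIT 5;
```

Any pair listed here would have its contig counts multiplied by motifRows in the original query.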

Answer by Dave Bennett (best answer)

I am not sure how elegant this is, but I think it gives you some idea of how you could extend your query to n ko nodes in a path and still return the data per path, as you laid it out in your question. It should also demonstrate the power of combining WITH clauses and collections.

    // match the ko/cpd node paths starting with K00001
    match p=(ko1:ko {name:'K00001'})-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)

    // remove the cpd nodes from each path and name the collection row
    with collect([n in nodes(p) where n:ko | n]) as row

    // create a range for the number of rows and number of ko nodes per row
    with row
    , range(0, size(row)-1, 1) as idx
    , range(0, 2, 1) as idx2

    // iterate over each row and node in the order it was collected
    unwind idx as i
    unwind idx2 as j
    with i, j, row[i][j] as ko_n

    // find all of the contigs nodes attached to each ko node
    match (ko_n)--(:contigs)

    // count the contigs per ko node, grouping explicitly by i, j and the name
    with i, j, ko_n.name as name, count(*) as ct

    // group the ko node data together in a collection preserving the order and the count
    with i, [j, name, ct] as ko_set
    order by i, ko_set[0]

    // re-collect the ko node sets as ko rows
    with i, collect(ko_set) as ko_row
    order by i

    // return the original paths in the ko node order with the counts
    return reduce(ko_str = "", ko in ko_row |
      case
        when ko_str = "" then ko_str + ko[1] + ", " + toString(ko[2])
        else ko_str + ", " + ko[1] + ", " + toString(ko[2])
      end) as `KO-Contigs Counts`
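On Neo4j 3.1 or later, a list comprehension with a nested pattern comprehension gives a shorter route to a similar per-path result. It works for any number of ko nodes without the range/UNWIND bookkeeping. This is an untested sketch that reuses the .name property from the query above. It returns a list of [name, count] pairs per path instead of a comma-separated string. It also differs in one respect: a ko node with no contigs is reported with 0, whereas the MATCH above drops such nodes from the output.

```cypher
// Sketch: assumes Neo4j 3.1+ and the :ko / :cpd / :contigs labels and .name property above.
MATCH p = (ko1:ko {name: 'K00001'})-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
// keep only the :ko nodes of each path, in path order, paired with their contig counts
RETURN [n IN nodes(p) WHERE n:ko |
          [n.name, size([(n)--(c:contigs) | c])]] AS `KO-Contigs Counts`;
```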

The FOREACH clause in Cypher is strictly for mutating data. For instance, you could use it in a query that counts the contigs per ko node and stores that count on each ko node.

This is a bit convoluted, and you would never update the number of contigs on a ko node like this, but it illustrates the use of FOREACH in Cypher.

    match (ko:ko)-->(:contigs)
    with ko, count(*) as ct
    with collect(ko) as ko_nodes, collect(ct) as ko_counts
    with ko_nodes, ko_counts, range(0, size(ko_nodes)-1, 1) as idx
    foreach (i in idx |
        set (ko_nodes[i]).num_contigs = ko_counts[i])
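FOREACH can also iterate over the collected rows directly, which avoids keeping two parallel lists aligned by index. In the sketch below, each node is paired with its count in a map. The key names koNode and ct are illustrative choices, not anything Neo4j requires. Like the version above, this is meant to demonstrate FOREACH, not to be the recommended way of storing the counts.

```cypher
// Sketch: same update as above, iterating over maps instead of index ranges.
MATCH (ko:ko)-->(c:contigs)
WITH ko, count(c) AS ct
// one map per ko node, holding the node and its contig count
WITH collect({koNode: ko, ct: ct}) AS rows
FOREACH (r IN rows | SET (r.koNode).num_contigs = r.ct)
```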

A simpler way to perform the same update on each ko node would be something like this:

    match (ko:ko)-->(:contigs)
    with ko, count(*) as ct
    set ko.num_contigs = ct
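One refinement may be worth considering: the simple version only touches ko nodes that have at least one :contigs neighbour. Starting from all :ko nodes and using OPTIONAL MATCH also writes 0 to nodes without contigs, because `count(c)` skips the nulls that OPTIONAL MATCH produces for them. A sketch, using the num_contigs property name from above:

```cypher
// Sketch: store a contig count on every :ko node, including 0 for nodes with no contigs.
MATCH (ko:ko)
OPTIONAL MATCH (ko)-->(c:contigs)
// count(c) ignores the null c produced when there is no match
WITH ko, count(c) AS ct
SET ko.num_contigs = ct
```

If a ko node can have several relationships to the same contig, `count(DISTINCT c)` would count each contig once.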

If you were to store the number of contigs on each ko node, you could then run a query like this to return the number of contigs for each ko node in each path:

    // match all the paths starting with K00001
    match p=(ko1:ko {name:'K00001'})-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
    // build a csv line per path
    return reduce(ko_str = "", ko in nodes(p) | ko_str +
        // use just the ko nodes in the path
        // and exclude the cpd nodes
        case
            when ko:ko then ko.name + ", " + toString(ko.num_contigs) + ", "
            else ""
        end
    ) as `KO-Contigs Counts`
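With the count stored on each node, the question's original table layout needs no aggregation at all, so repeated motif rows have nothing to inflate. This sketch assumes num_contigs has already been populated by one of the updates above. It also uses the .name property from this answer, whereas the question's data keys ko nodes by .ko.

```cypher
// Sketch: one column pair per ko node, read from the stored num_contigs property.
MATCH (ko1:ko {name: 'K00001'})-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
RETURN ko1.name AS KO1, ko1.num_contigs AS KO1count,
       ko2.name AS KO2, ko2.num_contigs AS KO2count,
       ko3.name AS KO3, ko3.num_contigs AS KO3count
LIMIT 5;
```

Keep in mind that a stored count goes stale when contigs relationships change, so the update has to be rerun after such changes.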