XQuery - Doing math on elements within a sequence and aggregating results

406 views Asked by At

I'm trying to execute a XQuery sum function in a multiplication of two XML elements, but it has been difficult to avoid the iteration in a sequence of elements. For example, consider this case:

sample data:

<Orders>
     <Order>
        <OrderKey>1</OrderKey>
        <LineItem>
          <LineNumber>1</LineNumber>
          <Quantity>41</Quantity>
          <ExtendedPrice>70848.0000</ExtendedPrice>
          <Discount>0.0913</Discount>
          <Tax>0.0663</Tax>
          <ReturnFlag>A</ReturnFlag>
          <LineStatus>F</LineStatus>
          <ShipDate>1994-09-03</ShipDate>
        </LineItem>
        <LineItem>
          <LineNumber>2</LineNumber>
          <Quantity>44</Quantity>
          <ExtendedPrice>64372.0000</ExtendedPrice>
          <Discount>0.0674</Discount>
          <Tax>0.0145</Tax>
          <ReturnFlag>A</ReturnFlag>
          <LineStatus>F</LineStatus>
          <ShipDate>1994-08-18</ShipDate>
        </LineItem>
        <LineItem>
          <LineNumber>3</LineNumber>
          <Quantity>41</Quantity>
          <ExtendedPrice>64247.0000</ExtendedPrice>
          <Discount>0.0396</Discount>
          <Tax>0.0466</Tax>
          <ReturnFlag>R</ReturnFlag>
          <LineStatus>F</LineStatus>
          <ShipDate>1994-10-21</ShipDate>
        </LineItem>
      </Order>
      . . . 
</Orders>

XQuery:

<results> {
   for $lineitem in collection("tpch")/Orders/Order/LineItem
   let $returnflag := $lineitem/ReturnFlag 
   let $linestatus := $lineitem/LineStatus
   where $lineitem/ShipDate <= "1998-08-31"
   group by $returnflag, $linestatus
   order by $returnflag, $linestatus
   return
   <record>
      <l_returnflag>{$returnflag}</l_returnflag>
      <l_linestatus>{$linestatus}</l_linestatus>
      <sum_qty>{sum($lineitem/Quantity)}</sum_qty>
      <sum_base_price>{sum($lineitem/ExtendedPrice)}</sum_base_price>
      <sum_disc_price>{sum($lineitem/ExtendedPrice*(1-$lineitem/Discount))}</sum_disc_price>
      <sum_charge>{sum($lineitem/ExtendedPrice*(1-$lineitem/Discount)*(1+$lineitem/Tax))}</sum_charge>
      <avg_qt>{avg($lineitem/Quantity)}</avg_qt>
      <avg_price>{avg($lineitem/ExtendedPrice)}</avg_price>
      <avg_disc>{avg($lineitem/Discount)}</avg_disc>
      <count_order>{count($lineitem)}</count_order>
   </record>
} </results>

BaseX output:

[XPTY0004] Item expected, sequence found: (element ExtendedPrice {...}, ......

When I removed "sum_disc_price ..." and "sum_charge ..." result lines, the query runs normally.

My question: Considering this dataset model, how can I to write this query without change its semantics?

1

There are 1 answers

9
Charles Duffy On BEST ANSWER

The error is occurring because you're using operations (like multiplication) that take exactly two arguments, and passing a sequence on one side or both. To illustrate the meaning of the error -- you get the exact same thing running:

(1,2,3) * 2

Since your goal is to multiply values together within each individual LineItem inside the $lineitem sequence (which is a sequence due to the group by operator) before aggregating the results, you could use fn:for-each-pair. Implemented with an anonymous function, one might do the following:

      <sum_disc_price>{sum(fn:for-each-pair(
        $lineitem/ExtendedPrice, $lineitem/Discount,
         function ($price, $discount) as xs:double {
           xs:double($price) * (1 - xs:double($discount))
         }(?,?)))}</sum_disc_price>