Change XML structure


Hi, I need to do some text manipulation on this piece of XML. Deleting some tags is no problem. Before that, I need to rename the car's id to car_id and move it inside each trip element.

E.g. with the XMLStarlet toolkit?

xmlstarlet somevariable

Original

  <car>
    <id>155028827</id>
    <trip>
      <id>1</id>
      <date>1.1.1970</date>
    </trip>
    <trip>
      <id>2</id>
      <date>1.1.1970</date>
    </trip>
  </car>

Expected result

<trip>
  <car_id>155028827</car_id>
  <id>1</id>
  <date>1.1.1970</date>
</trip>
<trip>
  <car_id>155028827</car_id>
  <id>2</id>
  <date>1.1.1970</date>
</trip>

Wintermute

I'd say

xmlstarlet ed -i '/car/trip/descendant::node()[1]' -t elem -n car_id -u '/car/trip/car_id' -x 'ancestor::car/id/text()' filename.xml | xmlstarlet sel -t -c '/car/trip'

This falls into two parts:

xmlstarlet ed \
   -i '/car/trip/descendant::node()[1]' -t elem -n car_id \
   -u '/car/trip/car_id' -x 'ancestor::car/id/text()' \
   filename.xml

and

xmlstarlet sel -t -c '/car/trip'

The first is an xmlstarlet ed command, which means that XML goes in, is edited, and edited XML goes out. The edits are

   -i '/car/trip/descendant::node()[1]' -t elem -n car_id

which inserts an empty car_id element before the first descendant of every /car/trip node, and

   -u '/car/trip/car_id' -x 'ancestor::car/id/text()'

which sets the value of all /car/trip/car_id nodes to the text inside the id subnode of their car ancestor node. The ancestor step has to name car explicitly: ancestor::node()/id would also match the id of the enclosing trip. This alone produces

<?xml version="1.0"?>
<car>
  <id>155028827</id>
  <trip>
    <car_id>155028827</car_id>
    <id>1</id>
    <date>1.1.1970</date>
  </trip>
  <trip>
    <car_id>155028827</car_id>
    <id>2</id>
    <date>1.1.1970</date>
  </trip>
</car>

which is then piped through

xmlstarlet sel -t -c '/car/trip'

This selects (and prints) the /car/trip nodes of this XML data, producing

<trip>
    <car_id>155028827</car_id>
    <id>1</id>
    <date>1.1.1970</date>
  </trip><trip>
    <car_id>155028827</car_id>
    <id>2</id>
    <date>1.1.1970</date>
  </trip>

You could, if the formatting annoys you, use

xmlstarlet sel -t -c '/car/trip | /car/text()'

to preserve the whitespace between the tags (and get more readably formatted output); with this change, the output is

  <trip>
    <car_id>155028827</car_id>
    <id>1</id>
    <date>1.1.1970</date>
  </trip>
  <trip>
    <car_id>155028827</car_id>
    <id>2</id>
    <date>1.1.1970</date>
  </trip>

...with two more blank lines at the top; they're the newlines before and after the /car/id node. Unfortunately, the output is a bare sequence of trip elements without a single root element, so it is not a well-formed XML document and we can't just pipe it through an XML pretty-printer (which is what I'd really like to do). My suspicion is that this will be embedded in further XML (so that it can be properly parsed); if formatting is important, my suggestion is to embed it first and pipe the whole XML through a pretty-printer afterwards.
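
A minimal sketch of that last suggestion, under a few assumptions that are not from the question: the wrapper element is called trips, the sample input is written to a file named sample_car.xml, and xmlstarlet fo (the formatter subcommand) does the pretty-printing. It is one possible way to do it, not the only one.

```shell
# Sample input matching the question; sample_car.xml is a made-up file name.
cat > sample_car.xml <<'EOF'
<car>
  <id>155028827</id>
  <trip>
    <id>1</id>
    <date>1.1.1970</date>
  </trip>
  <trip>
    <id>2</id>
    <date>1.1.1970</date>
  </trip>
</car>
EOF

# Wrap the extracted <trip> elements in one root element so the result is
# well-formed, then re-indent the whole thing with "xmlstarlet fo".
# "trips" is a hypothetical wrapper name, not something from the question.
formatted=$(
  {
    echo '<trips>'
    xmlstarlet ed \
      -i '/car/trip/descendant::node()[1]' -t elem -n car_id \
      -u '/car/trip/car_id' -x 'ancestor::car/id/text()' \
      sample_car.xml |
      xmlstarlet sel -t -c '/car/trip'
    echo '</trips>'
  } | xmlstarlet fo
)

printf '%s\n' "$formatted"
```

Because the fragments now sit under a single root, any XML-aware tool can parse the result; in a real pipeline the wrapper would be whatever element the trips are meant to be embedded in.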