How do I get all the attributes of an XML element using Go?


I am trying to parse XML content, including all the attributes of each XML element, like this:

package main

import (
  "bytes"
  "encoding/xml"
  "fmt"
)

type Node struct {
  XMLName xml.Name
  Attributes []xml.Attr `xml:",attr"`
  BodyElements string `xml:",innerxml"`
  Nodes   []Node `xml:",any"`
}

var xmldata = []byte("<div><div data-id=\"images/6C7161080\" data-imagesize=\"medium\" data-alignment=\"none\"></div></div>")

func walk(nodes []Node, f func(Node) bool) {
  for _, n := range nodes {
    if f(n) {
      walk(n.Nodes, f)
    }
  }
}


func main() {

  buf := bytes.NewBuffer(xmldata)
  dec := xml.NewDecoder(buf)

  var n Node
  err := dec.Decode(&n)
  if err != nil {
    panic(err)
  }

  walk([]Node{n}, func(n Node) bool {
    if n.XMLName.Local == "p" {
      fmt.Println(string(n.BodyElements))
    } else if n.XMLName.Local == "div" {
      fmt.Println(string(n.BodyElements))
      fmt.Println(len(n.Attributes))
    }
    return true
  })
}

But the value of len(n.Attributes) is always 0. What can I do to get all the attributes of a given element? NOTE: The attribute names are not constant, since the element can sometimes be a "div" tag, an "img" tag, or something else. So I can't name each attribute in a struct tag, like this:

DataId string `xml:"data-id,attr"`

There is 1 answer

Volker (best answer)

The fundamental problem is that unmarshalling XML to your struct Node doesn't work. Your BodyElements captures the whole content of your root node and nothing is unmarshaled to your Nodes. (Btw: Adding a simple fmt.Printf would have revealed this.)

Why do you try to write your own XML unmarshalling/parsing code? You will fail. Just use the Decoder and its Token method to parse your XML by hand, one token after another, populating your tree manually. And if your XML is actually HTML, you might want to parse it with an HTML parser such as package golang.org/x/net/html.