XmlNodeType.EndElement or XmlReader.IsEmptyElement


I was going through some learning material for the 505 and came across an example that turned out to be a bit wrong, because it doesn't account for all the ways a valid XML file can be written. After some research I realized that, when reading an XML file, the default implementation of XmlReader (and maybe the other ones, like XmlTextReader) interprets an element's nodes differently depending on how the element is closed.

Let's say we have the following fragments from an XML file:

First fragment:
<element></element>

and

Second fragment:
<element/>

The first fragment needs two XmlReader.Read calls to read it, whereas the second needs only one. After the first XmlReader.Read call the reader is positioned on the <element> node, whose type is XmlNodeType.Element. A second XmlReader.Read moves it to the </element> node, which has a type of XmlNodeType.EndElement. This is all good, as we know when we are positioned on an element's beginning node and when we are on its end node.

What about the second fragment? Can we tell when we are positioned on the end node of an element? Not really, because the whole element is only one node. The beginning is the end too.
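To see the difference for yourself, here is a minimal sketch that lists every node the reader reports for each fragment. The module name NodeDump and the helper DescribeNodes are my own names for illustration; only XmlReader.Create, Read, NodeType, Name and IsEmptyElement come from the framework.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Public Module NodeDump
    ' Returns one description per node the reader reports for the given XML text.
    Public Function DescribeNodes(xml As String) As List(Of String)
        Dim lines As New List(Of String)
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            While reader.Read()
                lines.Add(String.Format("{0} <{1}> IsEmptyElement={2}",
                                        reader.NodeType, reader.Name, reader.IsEmptyElement))
            End While
        End Using
        Return lines
    End Function

    Public Sub Main()
        ' First fragment: an Element node followed by an EndElement node.
        For Each line As String In DescribeNodes("<element></element>")
            Console.WriteLine(line)
        Next
        ' Second fragment: a single Element node with IsEmptyElement = True.
        For Each line As String In DescribeNodes("<element/>")
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

The first fragment produces two lines (Element, then EndElement) and the second produces only one.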

The problem may not seem obvious as described here, but the example I mentioned at the beginning of the post was reading an XML file into a TreeView. To know when a set of child nodes ends and it's time to go back up one level before adding the next node, it tested the current XML node's type against XmlNodeType.EndElement and, on a match, moved the parent one level up.

Case XmlNodeType.EndElement
    parentNode = parentNode.Parent

Guess what? For XML files that use the short closing form of a tag, the Case branch above never executed for those elements, because the reader never reported an EndElement for them, so parentNode kept nesting deeper and deeper.

So in this situation you will not get a node of type EndElement, only an Element node whose IsEmptyElement property is set to True. That tells you that once you're done with this element, you shouldn't wait for an EndElement node before making a decision.
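Here is a rough sketch of how the tree-building loop could take IsEmptyElement into account. It is not the code from the learning material: the Node class is a hypothetical stand-in for a TreeNode so the sample runs as a console program, and BuildTree is a name I made up.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

' Hypothetical stand-in for a TreeView node, so no UI is needed.
Public Class Node
    Public Name As String
    Public Parent As Node
    Public Children As New List(Of Node)
End Class

Public Module TreeBuilder
    Public Function BuildTree(xml As String) As Node
        Dim root As New Node With {.Name = "(root)"}
        Dim parentNode As Node = root
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        Dim child As New Node With {.Name = reader.Name, .Parent = parentNode}
                        parentNode.Children.Add(child)
                        ' Descend only if a matching EndElement will follow.
                        If Not reader.IsEmptyElement Then
                            parentNode = child
                        End If
                    Case XmlNodeType.EndElement
                        ' Children are finished; go back up one level.
                        parentNode = parentNode.Parent
                End Select
            End While
        End Using
        Return root
    End Function

    Public Sub Main()
        Dim root As Node = BuildTree("<a><b/><c></c><d/></a>")
        ' <b/>, <c></c> and <d/> should all end up as direct children of <a>.
        Console.WriteLine(root.Children(0).Children.Count) ' prints 3
    End Sub
End Module
```

Without the IsEmptyElement check, <b/> would become the parent of everything after it, which is exactly the ever-deeper nesting described above.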

If your short-form element has attributes, save the IsEmptyElement property to a Boolean right after you read the node. Once you move the reader on to read those attributes, its current node is no longer the element, and the value you needed is lost.
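A small sketch of that, assuming the attributes are walked with MoveToNextAttribute (the original example may have read them differently). AttributeDemo and ReadFlags are names I invented for the sample.

```vb
Imports System
Imports System.IO
Imports System.Xml

Public Module AttributeDemo
    ' Returns IsEmptyElement as seen on the element, then on its first
    ' attribute, then again after moving back to the element.
    Public Function ReadFlags(xml As String) As Boolean()
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            reader.Read() ' the input has no XML declaration, so this lands on the element
            Dim savedIsEmpty As Boolean = reader.IsEmptyElement ' save it first

            reader.MoveToNextAttribute() ' the current node is now the attribute
            Dim onAttribute As Boolean = reader.IsEmptyElement

            reader.MoveToElement() ' step back to the element that owns the attribute
            Dim backOnElement As Boolean = reader.IsEmptyElement

            Return New Boolean() {savedIsEmpty, onAttribute, backOnElement}
        End Using
    End Function

    Public Sub Main()
        For Each flag As Boolean In ReadFlags("<element id=""1""/>")
            Console.WriteLine(flag)
        Next
    End Sub
End Module
```

Calling MoveToElement before checking the property is an alternative to the saved Boolean, but saving the value up front keeps the Select Case branch simpler.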

Even though the XML element in the first fragment is empty too, its IsEmptyElement stays False, which is not very intuitive. It also leads to code that breaks the symmetry of a Select Case statement. Still, I can perfectly understand the decision MS made, as it makes sense to interpret the second fragment's element as a single node.

Hope this helps someone, even though the MSDN documentation does describe this behaviour (though not in so many words).

Cheers
