XmlNodeType.EndElement or XmlReader.IsEmptyElement

I was going through some learning material for the 505 and came across an example that turned out to be slightly wrong, because it doesn’t account for all the ways a valid XML file can be written. After some research I realized that, when reading an XML file, the default implementation of XmlReader (and maybe the other ones, like XmlTextReader) interprets nodes differently depending on how an element is closed.

Let’s say we have the following fragments from an XML file:

First fragment:
<element></element>

Second fragment:
<element/>

The first fragment needs two XmlReader.Read calls to read it, whereas the second needs only one. After the first XmlReader.Read call the reader is positioned on the <element> node, whose type is XmlNodeType.Element. A second XmlReader.Read call moves to the </element> node, which has a type of XmlNodeType.EndElement. This is all good, as we know when we are positioned on an element’s beginning node and when we are at its end node.

What about the second fragment? Can we tell when we are positioned on the end node of an element? Not really, because the whole element is only one node. The beginning is the end too.
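To see the difference for yourself, here is a minimal console sketch that prints every node the reader reports. It assumes the two fragments are <element></element> and <element/>; the module and method names are made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports System.Xml

Module NodeTypeDemo
    ' Prints the type of every node the reader reports for the given XML text.
    Sub DumpNodes(xml As String)
        Console.WriteLine(xml)
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            While reader.Read()
                Console.WriteLine("  {0} (IsEmptyElement = {1})", reader.NodeType, reader.IsEmptyElement)
            End While
        End Using
    End Sub

    Sub Main()
        DumpNodes("<element></element>") ' long form: Element, then EndElement
        DumpNodes("<element/>")          ' short form: a single Element node
    End Sub
End Module
```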

The problem may not seem too obvious as described here, but the example I mentioned at the beginning of the post was trying to read the XML file into a TreeView. The trick for knowing when a set of child nodes ends, and it is time to go back up one level before adding the next node, was to test the current XML node’s type against XmlNodeType.EndElement and, if it matches, to move the parent one level up.

Case XmlNodeType.EndElement
    parentNode = parentNode.Parent

Guess what. For XML files using the short closing form of a tag, the above Case branch never got executed, as we never encountered an EndElement, so the parentNode continued to nest deeper and deeper.

So, apparently, in this situation you will not get an EndElement node but an Element node whose IsEmptyElement property is set to True. This way you can tell that, when you’re done with this element, you shouldn’t wait for an EndElement node before making a decision.

If your short-form element contains attributes, save the IsEmptyElement property to a Boolean right after you read the node. Once you start reading the attributes, or call Read again, the reader’s current node changes and the value is lost.
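Putting the pieces together, the sketch below shows one way the TreeView logic could be written. It is only an illustration: Node is a hypothetical stand-in for TreeNode so that it runs as a console program, and the sample XML string is invented.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml

Module EmptyElementDemo
    ' Hypothetical stand-in for TreeNode.
    Class Node
        Public Name As String
        Public Parent As Node
        Public Children As New List(Of Node)
    End Class

    Function BuildTree(xml As String) As Node
        Dim root As New Node With {.Name = "(root)"}
        Dim parentNode As Node = root
        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            While reader.Read()
                Select Case reader.NodeType
                    Case XmlNodeType.Element
                        ' Save the flag before touching the attributes.
                        Dim emptyElement As Boolean = reader.IsEmptyElement
                        Dim child As New Node With {.Name = reader.Name, .Parent = parentNode}
                        parentNode.Children.Add(child)
                        While reader.MoveToNextAttribute()
                            child.Name &= " " & reader.Name & "=" & reader.Value
                        End While
                        ' Descend only if a matching EndElement will follow.
                        If Not emptyElement Then parentNode = child
                    Case XmlNodeType.EndElement
                        parentNode = parentNode.Parent
                End Select
            End While
        End Using
        Return root
    End Function

    Sub PrintTree(current As Node, indent As Integer)
        Console.WriteLine(New String(" "c, indent) & current.Name)
        For Each child As Node In current.Children
            PrintTree(child, indent + 2)
        Next
    End Sub

    Sub Main()
        PrintTree(BuildTree("<a><b id=""1""/><c></c><d/></a>"), 0)
    End Sub
End Module
```

With this check in place, b, c and d should all end up as siblings under a, whichever closing form they use.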

Even though the XML element of the first fragment is empty too, its IsEmptyElement stays False, which is not very intuitive. Plus, this leads to code that breaks the symmetry of a Select Case statement. However, I can perfectly understand the decision MS made, as it makes sense to interpret the second fragment’s element as only one node.

Hope this helps someone, even though the MSDN documentation does describe this behaviour (not in so many words).




Reading an XML file into a DataSet (ADO.NET)

When you import an XML file into a data set (DataSet.ReadXml), the data set will contain tables corresponding to all the element sections in the XML file that contain other elements.

Besides those, the data set contains a bunch of relations that let you query the child rows of a given row in a table.
Let’s consider the following example of an XML file:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<parentelement name="PE1">
    <childelement name="CE10"/>
    <childelement name="CE11"/>
</parentelement>
<parentelement name="PE2">
    <childelement name="CE20"/>
    <childelement name="CE21"/>
</parentelement>

When the above file is read into a data set:
Dim testDataSet As New DataSet()
testDataSet.ReadXml(xmlFilePath) ' xmlFilePath is a placeholder for the path to the file above

the data set that will be created will contain two tables (DataTable):
parentelement; fields: parentelement_id and name
childelement; fields: childelement_id and name

and a relation:
parentelement_childelement
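If you want to check what ReadXml actually infers, a sketch like the following lists the tables, columns, column mappings and relations, and then walks the child rows through the relation. It makes two assumptions: the XML is loaded from a string instead of a file, and the two parentelement blocks are wrapped in a <root> element (a name I made up) so the document has a single root.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module ReadXmlDemo
    Sub Main()
        ' The <root> wrapper is an assumption, not part of the sample file.
        Dim xml As String =
            "<root>" &
            "<parentelement name=""PE1""><childelement name=""CE10""/><childelement name=""CE11""/></parentelement>" &
            "<parentelement name=""PE2""><childelement name=""CE20""/><childelement name=""CE21""/></parentelement>" &
            "</root>"

        Dim testDataSet As New DataSet()
        testDataSet.ReadXml(New StringReader(xml))

        ' List what the import created.
        For Each table As DataTable In testDataSet.Tables
            Console.WriteLine("Table: " & table.TableName)
            For Each column As DataColumn In table.Columns
                Console.WriteLine("  {0} ({1})", column.ColumnName, column.ColumnMapping)
            Next
        Next
        For Each relation As DataRelation In testDataSet.Relations
            Console.WriteLine("Relation: " & relation.RelationName)
        Next

        ' Use the first relation to fetch the child rows of each parent row.
        If testDataSet.Relations.Count > 0 Then
            Dim rel As DataRelation = testDataSet.Relations(0)
            For Each parentRow As DataRow In rel.ParentTable.Rows
                Console.WriteLine(CStr(parentRow("name")))
                For Each childRow As DataRow In parentRow.GetChildRows(rel)
                    Console.WriteLine("  " & CStr(childRow("name")))
                Next
            Next
        End If
    End Sub
End Module
```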

The import process automatically assigns a MappingType to each of the DataColumn columns.
According to the documentation, the purpose of the mapping types is to tell the DataSet or DataTable WriteXml method how to create the XML file.

– If the mapping type for a column is MappingType.Attribute, the values for that column will be written to the XML as attributes, exactly as in our test XML file.
Of course, for our sample file the imported DataColumn columns will have a mapping of MappingType.Attribute by default.

– If we change the mapping of each column to, let’s say, MappingType.Element, then when written back to disk the file will look like this:
<?xml version="1.0" standalone="yes"?>

If we import this file back into the data set, we end up with the same two tables, parentelement and childelement, and a parentelement_childelement relation.
Even though we now have other elements (name, parentelement_id, childelement_id), the ReadXml method will not create corresponding tables for them; I assume it is smart enough to treat an element without child elements as a value.

The DataColumns holding the two id fields, parentelement_id and childelement_id, are automatically assigned MappingType.Hidden.
The purpose of this is to keep the values of these fields out of the XML file when the WriteXml method is invoked. As you can see above, once we changed all the column mappings to MappingType.Element (MappingType.Attribute, or anything other than MappingType.Hidden, would do the same), the values of those fields are written to the XML file, which might not be the desired result.
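One way to avoid that, sketched below, is to change the mapping only for the columns that are not hidden. As before, the XML is loaded from a string and the <root> wrapper is my own assumption; the sketch just prints whatever WriteXml produces.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module MappingDemo
    Sub Main()
        ' The <root> wrapper is an assumption, not part of the sample file.
        Dim xml As String =
            "<root>" &
            "<parentelement name=""PE1""><childelement name=""CE10""/><childelement name=""CE11""/></parentelement>" &
            "<parentelement name=""PE2""><childelement name=""CE20""/><childelement name=""CE21""/></parentelement>" &
            "</root>"

        Dim testDataSet As New DataSet()
        testDataSet.ReadXml(New StringReader(xml))

        ' Leave the hidden key columns alone; switch everything else to elements.
        For Each table As DataTable In testDataSet.Tables
            For Each column As DataColumn In table.Columns
                If column.ColumnMapping <> MappingType.Hidden Then
                    column.ColumnMapping = MappingType.Element
                End If
            Next
        Next

        ' Write to a string so the result can be inspected on the console.
        Dim writer As New StringWriter()
        testDataSet.WriteXml(writer)
        Console.WriteLine(writer.ToString())
    End Sub
End Module
```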

One side effect of a column having a mapping of MappingType.Hidden is that, when we display its DataTable in a DataGridView, that column is not displayed.