Liquid Technologies - XML Glossary

    Escaping XML Data

    Adding the special characters <, >, ', " and & to XML data can cause the parser to misunderstand the resulting data. The solution is to escape these characters so that the parser interprets them correctly as data and does not confuse them with markup.

    XML provides the following five built-in replacements:

    Char    Escape String
    <       &lt;
    >       &gt;
    "       &quot;
    '       &apos;
    &       &amp;

    These can be used within attribute values and element text. They are not recognized inside CDATA blocks, comments or processing instructions.

    It is good practice to always escape these characters when they appear in XML data, although this is not always required.

    Element and attribute names can NOT contain the characters < > " ' &, escaped or otherwise.
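
    The replacements listed above map directly onto a small helper. The following is only a sketch, assuming Python and its standard xml.sax.saxutils module; escape_xml and QUOTE_ENTITIES are hypothetical names chosen for illustration. The library's escape() handles &, < and > by itself, so the two quote characters are supplied as extra entities.

```python
from xml.sax.saxutils import escape

# Extra replacements for the two quote characters. escape() already
# handles &, < and >, and replaces & first so nothing is escaped twice.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(text: str) -> str:
    """Escape all five special characters (hypothetical helper)."""
    return escape(text, QUOTE_ENTITIES)


print(escape_xml('if (a < b && c > d) say "it\'s true"'))
# if (a &lt; b &amp;&amp; c &gt; d) say &quot;it&apos;s true&quot;
```

    Escaping all five characters everywhere is more than the minimum, but it means one helper is safe for both element text and attribute values.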

    Attribute Data

    When attribute data is enclosed in double quotes " then any double quote " characters within the data must be escaped.
    When attribute data is enclosed in single quotes ' then any single quote ' characters within the data must be escaped.
    The ampersand & character must be escaped.
    The less than < character must be escaped.
    The greater than > character does not have to be escaped, but it's good practice to do it.

    Data                      In XML                                                   Notes
    He said "OK"              attributeName="He said &quot;OK&quot;"                   The double quotes in the data must be escaped.
    He said "OK"              attributeName='He said "OK"'                             The double quotes do not need escaping as they are contained within a single quoted attribute.
    He said "OK"              attributeName='He said &quot;OK&quot;'                   However, there is no harm in always escaping them.
    She said "You're right"   attributeName="She said &quot;You're right&quot;"        This is the minimum escaping required.
    She said "You're right"   attributeName='She said "You&apos;re right"'             This is the minimum escaping required.
    She said "You're right"   attributeName="She said &quot;You&apos;re right&quot;"   Typically all the data would be escaped though.
    Smith&Sons                attributeName="Smith&amp;Sons"                           The & must always be escaped within attribute data.
    a>b                       attributeName="a>b"                                      The > does not have to be escaped.
    a>b                       attributeName="a&gt;b"                                   It is good practice to escape > characters.
    a<b                       attributeName="a&lt;b"                                   The < character MUST be escaped.
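
    One way to see these attribute rules in action is quoteattr() from Python's standard xml.sax.saxutils module, used here purely as an illustration. It always escapes &, < and >, then picks whichever quote style needs the least escaping. The sample strings come from the table above; the variable names are illustrative.

```python
from xml.sax.saxutils import quoteattr

samples = [
    'He said "OK"',
    'She said "You\'re right"',
    "Smith&Sons",
    "a<b",
]

for data in samples:
    # quoteattr() returns the value already wrapped in its quotes.
    print("attributeName=" + quoteattr(data))

# attributeName='He said "OK"'
# attributeName="She said &quot;You're right&quot;"
# attributeName="Smith&amp;Sons"
# attributeName="a&lt;b"
```

    Note that the first two results are the minimum-escaping forms from the table, not the escape-everything forms.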


    Element Data

    The '<' character must be escaped within element text data so that it is not mistaken for the opening angle bracket of the next element.
    The '&' character must always be escaped.
    The other replacements (even the closing angle bracket '>') are optional, but it's good practice to always escape them. The one exception is the sequence ]]>, in which the '>' must be escaped when it appears in element text.

    Data                      In XML                                                         Notes
    if (age < 5)              <MyElement>if (age &lt; 5)</MyElement>                         The < char must always be escaped.
    if (age > 5)              <MyElement>if (age > 5)</MyElement>                            The > char does not have to be escaped.
    if (age > 5)              <MyElement>if (age &gt; 5)</MyElement>                         However, it is good practice to escape > chars.
    if (age > 3 && age < 8)   <MyElement>if (age &gt; 3 &amp;&amp; age &lt; 8)</MyElement>
    She said "You're right"   <MyElement>She said "You're right"</MyElement>                 The ' and " chars don't need escaping within an element.
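
    In practice an XML library usually applies these element rules for you. The sketch below assumes Python's standard xml.etree.ElementTree module: the raw string is assigned to the element, escaping happens on serialisation, and parsing reverses it. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

original = "if (age > 3 && age < 8)"

elem = ET.Element("MyElement")
elem.text = original  # assign the raw, unescaped string

# Serialising escapes the text; parsing expands it again.
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)
# <MyElement>if (age &gt; 3 &amp;&amp; age &lt; 8)</MyElement>

round_tripped = ET.fromstring(xml_text).text
print(round_tripped == original)  # True
```

    Escaping the string by hand before assigning it would escape it twice, so the data should be passed to the library unescaped.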

    CDATA

    Data within a CDATA block cannot be escaped. When the XML document is parsed, entity and character references inside a CDATA block are not expanded, so all chars within the block are simply treated as character data.

    Because no escaping is possible within CDATA, the terminating ]]> cannot be escaped, and it is therefore not possible to nest CDATA blocks.

    Data                      In XML                                                       Notes
    if (age < 5)              <MyElement><![CDATA[if (age < 5)]]></MyElement>
    if (age > 3 && age < 8)   <MyElement><![CDATA[if (age > 3 && age < 8)]]></MyElement>
    ]]>                       ERROR                                                        It is not possible to escape the end sequence of the CDATA block, so the string ]]> cannot be stored within it.
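
    Although ]]> cannot be stored inside a single CDATA block, a commonly used workaround (not part of the rules above, and shown here only as a sketch) is to split the data across two adjacent blocks so that ]] ends the first and > begins the second. The helper name to_cdata is hypothetical, and Python's standard xml.etree.ElementTree is assumed for checking the result.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two blocks."""
    # "]]" closes one block and ">" opens the next, so the end
    # sequence never appears whole inside a block.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


for data in ["if (age > 3 && age < 8)", "a ]]> b"]:
    xml_text = "<MyElement>" + to_cdata(data) + "</MyElement>"
    parsed = ET.fromstring(xml_text).text
    print(xml_text, parsed == data)
```

    A parser reports adjacent CDATA blocks as one run of character data, which is why the parsed text compares equal to the input.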

    Comments

    Data within a comment block cannot be escaped. When the XML document is parsed, entity and character references inside a comment are not expanded, so all chars within the comment are taken literally.

    Because no escaping is possible within a comment, the terminating --> cannot be escaped, and it is therefore not possible to nest comment blocks.

    The sequence -- may not appear within a comment, and there is no way to escape it.

    Data                            In XML                                   Notes
    Some Comment                    <!-- Some Comment -->
    The chars --> end a comment     <!-- The chars --> end a comment -->     Invalid. The --> in the data cannot be escaped, so the comment ends at the first -->.
    The chars -- are also illegal   <!-- The chars -- are also illegal -->   Invalid. The character sequence -- is not allowed in a comment.
    if (age > 3 && age < 8)         <!-- if (age > 3 && age < 8) -->         Valid. The data requires no escaping.
    <CommentedOutElm>               <!-- <CommentedOutElm>                   Valid. The data requires no escaping.
       data                            data
    </CommentedOutElm>              </CommentedOutElm> -->
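
    Since comment text cannot be escaped, the only option is to check it before writing it out. The sketch below is a minimal check in Python; can_be_comment is a hypothetical helper name. It also rejects text ending in a single -, because that character would join the closing --> to form the forbidden -- sequence (a consequence of the XML 1.0 comment grammar rather than something stated above).

```python
def can_be_comment(text: str) -> bool:
    """Return True if text can be wrapped in <!-- and --> unchanged."""
    # "--" is forbidden anywhere in a comment, and a trailing "-"
    # would combine with the closing "-->" to produce "--".
    return "--" not in text and not text.endswith("-")


samples = [
    "Some Comment",
    "The chars --> end a comment",
    "The chars -- are also illegal",
    "if (age > 3 && age < 8)",
]

for data in samples:
    print(can_be_comment(data), data)
```

    Only the first and last samples pass; the other two contain -- and have to be reworded before they can be placed in a comment.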

    Character References

    Character references allow the character code to be specified within the data instead of the literal character. This can be useful if you cannot type the character (e.g. ©) or if the XML document encoding does not support the character directly.

    Character references can be used interchangeably with the escape strings listed above.

    Char    Escape String    Character Reference
    <       &lt;             &#60;
    >       &gt;             &#62;
    "       &quot;           &#34;
    '       &apos;           &#39;
    &       &amp;            &#38;
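
    As a rough illustration of both uses, the sketch below (assuming Python and its standard library) first writes a character that the target encoding cannot represent as a character reference, then parses a fragment that mixes character references with an escape string. The hexadecimal form &#xA9; is part of XML but is not listed in the table above; the sample text and variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Encoding to ASCII with "xmlcharrefreplace" turns unsupported
# characters into decimal character references. It does NOT escape
# < or &, so those still need separate handling.
text = "© 2024 Smith"
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'&#169; 2024 Smith'

# Decimal reference, hexadecimal reference, and the two ways of
# writing "<" all expand to plain characters when parsed.
doc = "<MyElement>&#169; &#xA9; &#60; &lt;</MyElement>"
parsed = ET.fromstring(doc).text
print(parsed)  # © © < <
```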
