Adding special characters ('<', '>', "'", '"', '&') to XML data can cause the parser to misinterpret the resulting document. The solution is to escape these characters so that the parser treats them as data and does not mistake them for markup.
The following table lists all the built-in replacements:
Char | Escape String |
< | &lt; |
> | &gt; |
" | &quot; |
' | &apos; |
& | &amp; |
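As a minimal sketch of applying these replacements in code, the Python standard library function `xml.sax.saxutils.escape` handles `&`, `<` and `>` by default and accepts a mapping for additional characters. The helper name `escape_all` and the `QUOTES` mapping are illustrative assumptions, not part of the original text.

```python
from xml.sax.saxutils import escape

# escape() replaces &, < and > by default; the two quote characters
# are supplied through the optional entities mapping.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_all(data: str) -> str:
    """Replace all five special characters with their built-in escape strings."""
    return escape(data, QUOTES)


print(escape_all("Tom & Jerry's <\"show\">"))
# prints: Tom &amp; Jerry&apos;s &lt;&quot;show&quot;&gt;
```

The ampersand is replaced first, so the `&` introduced by the other escape strings is not escaped a second time.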
These can be used within XML attribute values and element text. Processing instructions are passed to the application as written, so the escape strings are not expanded there.
It is good practice to always escape these characters when they appear in XML data, although this is not always required.
When attribute data is enclosed in double quotes ("), any double quote characters within the data must be escaped.
When attribute data is enclosed in single quotes ('), any single quote characters within the data must be escaped.
The ampersand (&) character must always be escaped within attribute data.
The less-than (<) character must also be escaped within attribute data. The greater-than (>) character does not have to be escaped, but it is good practice to do so.
Data | In XML | Notes |
He said "OK" | attributeName="He said &quot;OK&quot;" | The double quotes in the data must be escaped. |
He said "OK" | attributeName='He said "OK"' | The double quotes do not need escaping as they are contained within a single quoted attribute. |
He said "OK" | attributeName='He said &quot;OK&quot;' | However there is no harm in always escaping them. |
She said "You're right" | attributeName="She said &quot;You're right&quot;" | This is the minimum escaping required |
She said "You're right" | attributeName='She said "You&apos;re right"' | This is the minimum escaping required |
She said "You're right" | attributeName="She said &quot;You&apos;re right&quot;" | Typically all the data would be escaped though. |
Smith&Sons | attributeName="Smith&amp;Sons" | The & must always be escaped within attribute data. |
a>b | attributeName="a>b" | The > does not have to be escaped |
a>b | attributeName="a&gt;b" | It is good practice to escape > characters. |
a<b | attributeName="a&lt;b" | The < character MUST be escaped |
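The attribute rules in the table can be tried out with `xml.sax.saxutils.quoteattr` from the Python standard library, which escapes the data and picks a quote style that needs the least quote escaping. This is only a sketch: the `attribute` helper and the `attributeName` default are hypothetical names chosen to mirror the table.

```python
from xml.sax.saxutils import quoteattr


def attribute(data: str, name: str = "attributeName") -> str:
    """Format data as a quoted and escaped XML attribute."""
    # quoteattr() escapes &, < and >, then chooses single or double
    # quotes depending on which quote characters the data contains.
    return f"{name}={quoteattr(data)}"


print(attribute('He said "OK"'))             # attributeName='He said "OK"'
print(attribute('She said "You\'re right"'))  # attributeName="She said &quot;You're right&quot;"
print(attribute("Smith&Sons"))               # attributeName="Smith&amp;Sons"
print(attribute("a<b"))                      # attributeName="a&lt;b"
```

Note that `quoteattr` always escapes `>` as well, which matches the "good practice" rows rather than the minimum required.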
The '<' character must be escaped within element text so that it is not mistaken for the opening angle bracket of the next element.
The '&' character must always be escaped.
The other replacements (even the closing angle bracket '>') are optional, but it is good practice to always escape them. The one exception is the sequence ]]>, in which the '>' must be escaped when it appears in element text.
Data | In XML | Notes |
if (age < 5) | <MyElement>if (age &lt; 5)</MyElement> | The < char must always be escaped |
if (age > 5) | <MyElement>if (age > 5)</MyElement> | The > char does not have to be escaped |
if (age > 5) | <MyElement>if (age &gt; 5)</MyElement> | However, it is good practice to escape > chars |
if (age > 3 && age < 8) | <MyElement>if (age &gt; 3 &amp;&amp; age &lt; 8)</MyElement> | |
She said "You're right" | <MyElement>She said "You're right"</MyElement> | The ' and " chars don't need escaping within an element |
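In practice an XML library usually performs this escaping when it serializes element text. The following sketch uses Python's `xml.etree.ElementTree` to show the behaviour; the `element_xml` helper is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET


def element_xml(text: str, tag: str = "MyElement") -> str:
    """Serialize an element whose text is the given unescaped data."""
    elem = ET.Element(tag)
    elem.text = text  # assign the raw data; escaping happens on output
    return ET.tostring(elem, encoding="unicode")


xml = element_xml("if (age > 3 && age < 8)")
print(xml)
# prints: <MyElement>if (age &gt; 3 &amp;&amp; age &lt; 8)</MyElement>

# Parsing the result recovers the original data.
print(ET.fromstring(xml).text)
# prints: if (age > 3 && age < 8)
```

The serializer leaves `'` and `"` untouched in element text, since they need no escaping there.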
Data within a CDATA block cannot be escaped. When the XML document is parsed, character references and escape strings inside a CDATA block are not expanded, so every character within the block is treated as literal character data.
Because no escaping is possible within CDATA, the terminating ]]> cannot be escaped, and therefore CDATA blocks cannot be nested.
Data | In XML | Notes |
if (age < 5) | <MyElement><![CDATA[if (age < 5)]]></MyElement> | |
if (age > 3 && age < 8) | <MyElement><![CDATA[if (age > 3 && age < 8)]]></MyElement> | |
]]> | ERROR | It is not possible to escape the end sequence of the CDATA block, so the string ]]> can not be stored within it. |
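Although a single CDATA block cannot contain ]]>, a commonly used workaround is to split the data across two adjacent blocks so that the sequence never appears whole inside either one. The sketch below assumes this splitting approach; `to_cdata` is a hypothetical helper, not something the text above defines.

```python
import xml.etree.ElementTree as ET


def to_cdata(data: str) -> str:
    """Wrap data in CDATA, splitting any ']]>' across two blocks."""
    # "]]" ends up at the end of one block and ">" at the start of the next.
    return "<![CDATA[" + data.replace("]]>", "]]]]><![CDATA[>") + "]]>"


xml = "<MyElement>" + to_cdata("a ]]> b") + "</MyElement>"
print(xml)
# prints: <MyElement><![CDATA[a ]]]]><![CDATA[> b]]></MyElement>

# The parser joins the adjacent blocks back into the original data.
print(ET.fromstring(xml).text)
# prints: a ]]> b
```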
Data within a comment cannot be escaped. When the XML document is parsed, character references and escape strings inside a comment are not expanded, so every character within the comment is treated as literal character data.
Because no escaping is possible within a comment, the terminating --> cannot be escaped, and therefore comments cannot be nested.
The sequence -- may not appear within a comment, and there is no provision for escaping it.
Data | In XML | Notes |
Some Comment | <!-- Some Comment --> | |
The chars --> end a comment | <!-- The chars --> end a comment --> | This is Invalid. The --> in the comment can not be escaped, and contains the sequence -- which is illegal in a comment. |
The chars -- are also illegal | <!-- The chars -- are also illegal --> | This is Invalid. The character sequence -- is not allowed in a comment. |
if (age > 3 && age < 8) | <!-- if (age > 3 && age < 8) --> | Valid. The data requires no escaping |
<CommentedOutElm> data </CommentedOutElm> | <!-- <CommentedOutElm> data </CommentedOutElm> --> | Valid. The data requires no escaping |
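Since comment text cannot be escaped, code that generates comments can only check the text and reject what cannot be stored. The following is a minimal sketch of such a check; `make_comment` is a hypothetical helper. It also rejects text ending in a single `-`, because that would produce `--->`, which contains the forbidden `--` sequence.

```python
def make_comment(text: str) -> str:
    """Wrap text in an XML comment, rejecting text that cannot be stored."""
    if "--" in text:
        raise ValueError("comment text must not contain '--'")
    if text.endswith("-"):
        raise ValueError("comment text must not end with '-'")
    return f"<!--{text}-->"


print(make_comment(" if (age > 3 && age < 8) "))
# prints: <!-- if (age > 3 && age < 8) -->

try:
    make_comment(" The chars -- are also illegal ")
except ValueError as err:
    print("rejected:", err)
```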
Character references allow the character code to be specified within the data instead of the literal character. This can be useful if you cannot type the character (e.g. ©) or if the XML document encoding does not support the character directly.
Character references can be used interchangeably with the escape strings listed above.
Char | Escape String | Character Reference |
< | &lt; | &#60; |
> | &gt; | &#62; |
" | &quot; | &#34; |
' | &apos; | &#39; |
& | &amp; | &#38; |
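To illustrate that the two forms are interchangeable, this sketch parses the same data written once with escape strings and once with character references, using Python's `xml.etree.ElementTree`. The hexadecimal form `&#xA9;` for © is an addition not mentioned above; it is the standard alternative to the decimal `&#169;`.

```python
import xml.etree.ElementTree as ET

# The same data, once with escape strings and once with character references.
named = ET.fromstring("<MyElement>a &lt; b &amp; c &#169;</MyElement>")
numeric = ET.fromstring("<MyElement>a &#60; b &#38; c &#xA9;</MyElement>")

print(named.text)
# prints: a < b & c ©
print(named.text == numeric.text)
# prints: True
```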