XML Schema Validation with Python, MSXML and PyWin32
Python ships with XML libraries, but none of them supports XML Schema validation. Several options are available to fill this gap. The solution demonstrated in this Python Tip is a basic implementation of XML Schema validation using MSXML and PyWin32.
Requirements
To try the code and examples presented in this article, you need Python installed on your system and some familiarity with the language. In addition, you will need the following software:
- MSXML 4.0 or, preferably, MSXML 6.0 (MSXML 3.0 does not support XML Schema).
- Python for Windows Extensions (PyWin32)
- Optional: py_tips_msxml_val_pywin32_1.1.zip (5KB), the source code and files used to illustrate this article, available under the MIT License.
Notes:
- ActivePython is a Python distribution for Windows that ships with PyWin32, so you don’t need to install PyWin32 separately if you installed Python from ActiveState.
- For additional resources, see the Resources section at the end of this post.
Sandbox
The code in this section demonstrates the basic idea of validating an XML document with PyWin32 and MSXML. It uses the following XML Schema and XML files:
- books.xsd (XML Schema)
- books.xml (Valid XML)
- books_error.xml (Invalid XML)
win32com is a package from the Python for Windows Extensions that wraps the Windows COM APIs. The win32com.client.Dispatch() function creates COM objects from their ProgIDs. The two ProgIDs used here are Msxml2.DOMDocument.6.0 (DOM document) and Msxml2.XMLSchemaCache.6.0 (schema collection). You may have to use a different version suffix if you have only MSXML 4.0 installed. After creating the DOM document COM object, set its async property to false (dom.async = 0) so that the load method runs synchronously. Create a schema collection and add the XML Schema along with its namespace (pass an empty string if the schema has no namespace). Assign the schema collection to the DOM document instance and load the document. The load method returns True if validation succeeds:
ActivePython 2.5.1.1 (ActiveState Software Inc.) based on
Python 2.5.1 (r251:54863, May 1 2007, 17:47:05) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import win32com.client
>>> dom = win32com.client.Dispatch('Msxml2.DOMDocument.6.0')
>>> dom.async = 0
>>> schemas = win32com.client.Dispatch('Msxml2.XMLSchemaCache.6.0')
>>> schemas.add('http://www.burgaud.com/XMLSchema', 'books.xsd')
>>> dom.schemas = schemas
>>> dom.load('books.xml')
True
>>>
...
If validation fails, the load method returns False. In addition, the parseError property of the DOM document describes the error:
...
>>> dom.load('books_error.xml')
False
>>> dom.parseError.errorCode
-1072897687
>>> dom.parseError.reason
u"'020188954' violates pattern constraint of '[0-9]{10}'.\r\nThe attribute 'isbn' with value '020188954' failed to parse.\r\n"
...
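The interactive session above can be folded into a small function. What follows is only a sketch under a few assumptions: the function name validate, its parameters, and the order in which MSXML versions are tried are hypothetical and are not taken from the companion files. The dispatch parameter exists so that the logic can be exercised without COM; by default it falls back to win32com.client.Dispatch. Note that async is a reserved word in Python 3.7 and later, so the sketch sets the property with setattr() instead of writing dom.async = 0.

```python
def validate(xml_path, xsd_path, namespace="", dispatch=None,
             versions=("6.0", "4.0")):
    """Return (is_valid, reason) for xml_path checked against xsd_path.

    Hypothetical helper: the name, signature and version order are
    assumptions made for this sketch.
    """
    if dispatch is None:
        # Imported lazily so this module can be loaded without PyWin32.
        from win32com.client import Dispatch as dispatch

    # Try the preferred MSXML version first, then fall back.
    for version in versions:
        try:
            dom = dispatch("Msxml2.DOMDocument." + version)
            schemas = dispatch("Msxml2.XMLSchemaCache." + version)
            break
        except Exception:
            # An unknown ProgID makes Dispatch raise; try the next version.
            continue
    else:
        raise RuntimeError(
            "none of the MSXML versions %r could be created" % (versions,))

    # 'async' is a keyword in Python 3.7+, hence setattr().
    setattr(dom, "async", False)
    schemas.add(namespace, xsd_path)
    dom.schemas = schemas

    if dom.load(xml_path):
        return True, ""
    return False, dom.parseError.reason
```

With the sample files from this article, a call would look like validate('books.xml', 'books.xsd', 'http://www.burgaud.com/XMLSchema').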
Complete Example
A more detailed example is implemented in the file msxml_schema_val.py.
The namespace is discovered from the XML Schema file using the XPath support in MSXML. In real-world use this approach may be questionable; the namespace could instead be passed to the script as a parameter, for example.
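For illustration, here is one way the namespace could be read from the schema file. This sketch does not reproduce the MSXML XPath query used in msxml_schema_val.py; it swaps in the standard library's xml.etree.ElementTree and simply reads the targetNamespace attribute of the schema's root element. The function name schema_target_namespace is hypothetical.

```python
import xml.etree.ElementTree as ET


def schema_target_namespace(xsd_source):
    """Return the targetNamespace of an XML Schema, or "" if none is declared.

    xsd_source may be a file name or an open file object.
    """
    root = ET.parse(xsd_source).getroot()
    # An XML Schema declares its namespace on the root xs:schema element.
    return root.get("targetNamespace", "")
```

The empty-string default matches what the schema collection's add method expects when the schema has no namespace.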
The script takes two parameters: the XML file and the XML Schema file. Here is an example run with the valid XML file:
C:\prompt>python msxml_schema_val.py books.xml books.xsd
MSXML version 6: OK
Namespace : http://www.burgaud.com/XMLSchema
Schema : books.xsd
Valid XML : books.xml
And an example with the invalid XML file:
C:\prompt>python msxml_schema_val.py books_error.xml books.xsd
MSXML version 6: OK
books_error.xml: Validation Error
- Error Code : -1072897687
- Reason : '123' violates pattern constraint of '[0-9]{10}'.
The attribute 'isbn' with value '123' failed to parse.
- Character : 534
- Line : 14
- Column : 20
- Source : <Book isbn="123">
^
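A report along these lines can be assembled from the parseError object. The sketch below is not the code from msxml_schema_val.py: the function name format_parse_error is hypothetical, and the mapping of the labels above to the MSXML parseError properties (filepos for Character, line for Line, linepos for Column, srcText for Source) is an assumption based on the output shown. It accepts any object exposing those attributes, so it can be tried without COM.

```python
def format_parse_error(xml_path, err):
    """Build a multi-line report from an MSXML parseError-like object."""
    # The reason text contains embedded \r\n sequences; collapse them.
    reason = " ".join(err.reason.split())
    report = [
        "%s: Validation Error" % xml_path,
        "- Error Code : %d" % err.errorCode,
        "- Reason     : %s" % reason,
        "- Character  : %d" % err.filepos,
        "- Line       : %d" % err.line,
        "- Column     : %d" % err.linepos,
        "- Source     :",
        err.srcText.rstrip(),
        # Assumption: linepos is 1-based, so the caret is indented by
        # linepos - 1 columns to point at the offending character.
        " " * max(err.linepos - 1, 0) + "^",
    ]
    return "\n".join(report)
```

Printing the source line on its own row keeps the caret aligned with the column reported by MSXML, provided the line contains no tab characters.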
Resources
- The Official Website of the Python Programming Language is the primary recommended resource for anything Python.
- ActivePython is a free Python distribution made available by ActiveState.
- The excellent PyWin32 by Mark Hammond is available on SourceForge.
- Resources specific to MSXML:
  - MSXML SDK on MSDN
  - Download MSXML 6.0 SP1 (recommended)
  - Download MSXML 4.0 SP2
Download
- Companion Files Version 1.1: py_tips_msxml_val_pywin32_1.1.zip (5KB)
- Companion Files Version 1.0: py_tips_mxsml_val_pywin32_1.zip (4KB)
History
- 01/26/2008: Updated source code and companion files (version 1.1)
- 01/19/2008: Initial publication of this blog post.