I get the following error, while trying to validate XML using a schema:
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15
The issue reproduces with lxml >= 6.0.0, and only on Linux (tested on Ubuntu 20 and 22).
lxml version 6.0.2 works well on Windows systems (10 and 11).
Below is a simplified example of my use case.
main.xml
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <title>Main XML</title>
    <elements>
        <element name="main element" foo="main foo">This text is from main.xml</element>
        <xi:include href="include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
    </elements>
</root>
include.xml
<?xml version="1.0" encoding="UTF-8"?>
<elements>
    <element name="element1" foo="foo1">Text 1: This content is included from another file.</element>
    <element name="element2" foo="foo2">Text 2: This content is included from another file.</element>
    <element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
transform.xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Identity transform: copy everything by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <!-- Match only <element> with name="element2" and override its foo and name attributes -->
    <xsl:template match="element[@name='element2']">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:attribute name="foo">spam</xsl:attribute>
            <xsl:attribute name="name">message99</xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
schema.xsd
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2009/01/xml.xsd"/>
    <xs:element name="root">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="title" type="xs:string"/>
                <xs:element name="elements">
                    <xs:complexType>
                        <xs:sequence minOccurs="1" maxOccurs="unbounded">
                            <xs:element name="element" minOccurs="1" maxOccurs="unbounded">
                                <xs:complexType mixed="true">
                                    <xs:attribute name="name" type="xs:string" use="required"/>
                                    <xs:attribute name="foo" type="xs:string" use="required"/>
                                    <xs:attributeGroup ref="xml:specialAttrs"/>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
Line 15 in schema.xsd (the xs:attributeGroup reference) is needed when include.xml is not in the same directory as main.xml and is referenced via a relative path.
E.g. <xi:include href="../include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
In this case, the included elements will have an extra attribute added (xml:base):
<element name="element1" foo="foo1" xml:base="../include.xml">Text 1: This content is included from another file.</element>
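To illustrate why the schema has to allow attributes from the xml: namespace, here is a minimal sketch that inspects an element shaped like the one above. The helper name split_attributes and the shortened text content are hypothetical, and the fallback to the standard library's ElementTree is an assumption made only so the snippet runs where lxml is not installed. The xml:specialAttrs group in the W3C xml.xsd covers xml:base, xml:lang, xml:space and xml:id.

```python
try:
    from lxml import etree  # the library used in this question
except ImportError:
    # Assumption: the stdlib API is compatible for the calls used below.
    import xml.etree.ElementTree as etree

XML_NS = "http://www.w3.org/XML/1998/namespace"

# An element as it looks after XInclude has added xml:base (text shortened).
included = etree.fromstring(
    '<element name="element1" foo="foo1" xml:base="../include.xml">Text 1</element>'
)


def split_attributes(element):
    """Separate attributes in the reserved xml: namespace from ordinary ones."""
    prefix = "{" + XML_NS + "}"
    xml_attrs = {}
    plain_attrs = {}
    for key, value in element.attrib.items():
        if key.startswith(prefix):
            # Namespaced attributes are reported in Clark notation: {uri}local
            xml_attrs[key[len(prefix):]] = value
        else:
            plain_attrs[key] = value
    return xml_attrs, plain_attrs


xml_attrs, plain_attrs = split_attributes(included)
print(xml_attrs)    # {'base': '../include.xml'}
print(plain_attrs)  # {'name': 'element1', 'foo': 'foo1'}
```

Without the attribute group (or an equivalent declaration for xml:base), the schema above only declares name and foo, so the extra xml:base attribute would be rejected during validation.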
xml_parse.py
#!/usr/bin/env python3

import os
import lxml
from lxml import etree

print("Using lxml version {0}".format(lxml.__version__), end="\n\n")

tree = etree.parse("main.xml")
tree.xinclude()

# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())
print(etree.tostring(tree, pretty_print=True).decode())
schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)
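For debugging, the schema-loading step can also be wrapped so that every entry in the parser's error log is printed, not just the first message. This is only a sketch: the helper load_schema and the in-memory BROKEN_XSD are hypothetical, and the XSD deliberately omits the xs:import so that it fails without any network access. It is not the schema from this question.

```python
import io

from lxml import etree

# Hypothetical schema: references xml:specialAttrs without importing xml.xsd,
# so compiling it is expected to fail locally.
BROKEN_XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="element">
    <xs:complexType mixed="true">
      <xs:attributeGroup ref="xml:specialAttrs"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def load_schema(xsd_bytes):
    """Return (schema, messages); schema is None if the XSD cannot be compiled."""
    try:
        return etree.XMLSchema(etree.parse(io.BytesIO(xsd_bytes))), []
    except etree.XMLSchemaParseError as exc:
        # exc.error_log holds all entries logged while compiling the schema.
        messages = [
            "{0}:{1}: {2}".format(entry.filename, entry.line, entry.message)
            for entry in exc.error_log
        ]
        return None, messages


schema, messages = load_schema(BROKEN_XSD)
for message in messages:
    print(message)
```

With the real schema.xsd, the same log might show whether loading the imported http://www.w3.org/2009/01/xml.xsd failed before the attribute group lookup did, but that is an assumption to verify, not something the output above confirms.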
Below is the example output from my Ubuntu 20 machine:
bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>
Traceback (most recent call last):
  File "/opt/xml_parse.py", line 20, in <module>
    schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
  File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15
bogey@machine:/opt/xml_schema$ pip install lxml==5.4.0
Defaulting to user installation because normal site-packages is not writeable
Collecting lxml==5.4.0
Downloading lxml-5.4.0-cp310-cp310-manylinux_2_28_x86_64.whl (5.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 12.2 MB/s eta 0:00:00
Installing collected packages: lxml
Attempting uninstall: lxml
Found existing installation: lxml 6.0.2
Uninstalling lxml-6.0.2:
Successfully uninstalled lxml-6.0.2
Successfully installed lxml-5.4.0
bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 5.4.0
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>
XML is valid.
Output on a Windows machine:
(venv310_win) PS C:\xml_schema> python .\xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>
XML is valid.
What's the deal? Any ideas would be appreciated. Thanks.
EDIT:
Windows
Python : sys.version_info(major=3, minor=11, micro=8, releaselevel='final', serial=0)
etree : (6, 0, 2, 0)
libxml used : (2, 11, 9)
libxml compiled : (2, 11, 9)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)
Linux
Python : sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
etree : (6, 0, 0, 0)
libxml used : (2, 14, 4)
libxml compiled : (2, 14, 4)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
import sys
from lxml import etree
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
<xs:attributeGroup ref="xml:specialAttrs"/>
it runs and still produces the same output as with 5.4.0
<xi:include href="../include.xml"