I get the following error while trying to validate XML against a schema:

lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15

The issue reproduces with lxml >= 6.0.0, and only on Linux (tested on Ubuntu 20 and 22).

lxml version 6.0.2 works well on Windows systems (10 and 11).

Below is a simplified example of my use case.

main.xml

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <title>Main XML</title>
    <elements>
        <element name="main element" foo="main foo">This text is from main.xml</element>
        <xi:include href="include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
    </elements>
</root>

include.xml

<?xml version="1.0" encoding="UTF-8"?>
<elements>
    <element name="element1" foo="foo1">Text 1: This content is included from another file.</element>
    <element name="element2" foo="foo2">Text 2: This content is included from another file.</element>
    <element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>

transform.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Identity transform: copy everything by default -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Match only <element> with name="element2" and override its foo and name attributes -->
    <xsl:template match="element[@name='element2']">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:attribute name="foo">spam</xsl:attribute>
            <xsl:attribute name="name">message99</xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

schema.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2009/01/xml.xsd"/>
    <xs:element name="root">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="title" type="xs:string"/>
                <xs:element name="elements">
                    <xs:complexType>
                        <xs:sequence minOccurs="1" maxOccurs="unbounded">
                            <xs:element name="element" minOccurs="1" maxOccurs="unbounded">
                                <xs:complexType mixed="true">
                                    <xs:attribute name="name" type="xs:string" use="required"/>
                                    <xs:attribute name="foo" type="xs:string" use="required"/>
                                    <xs:attributeGroup ref="xml:specialAttrs"/>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

</xs:schema>

Line 15 in schema.xsd is needed for the case when include.xml is not in the same directory as main.xml and it's referenced via a relative path.

E.g. <xi:include href="../include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>

In this case, the included elements will have an extra attribute added (xml:base): <element name="element1" foo="foo1" xml:base="../include.xml">Text 1: This content is included from another file.</element>

xml_parse.py

#!/usr/bin/env python3

import os
import lxml
from lxml import etree

print("Using lxml version {0}".format(lxml.__version__), end="\n\n")

tree = etree.parse("main.xml")
tree.xinclude()

# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())

print(etree.tostring(tree, pretty_print=True).decode())

schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)

Below is the example output from my Ubuntu 20 machine:

bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>

Traceback (most recent call last):
File "/opt/xml_parse.py", line 20, in <module>
schema = etree.XMLSchema(etree.parse("schema.xsd")) # Load and parse the schema
File "src/lxml/xmlschema.pxi", line 90, in lxml.etree.XMLSchema.__init__
lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15

bogey@machine:/opt/xml_schema$ pip install lxml==5.4.0
Defaulting to user installation because normal site-packages is not writeable
Collecting lxml==5.4.0
Downloading lxml-5.4.0-cp310-cp310-manylinux_2_28_x86_64.whl (5.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 12.2 MB/s eta 0:00:00
Installing collected packages: lxml
Attempting uninstall: lxml
Found existing installation: lxml 6.0.2
Uninstalling lxml-6.0.2:
Successfully uninstalled lxml-6.0.2
Successfully installed lxml-5.4.0

bogey@machine:/opt/xml_schema$ python3 xml_parse.py
Using lxml version 5.4.0
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>

XML is valid.

Output on Windows machine:

(venv310_win) PS C:\xml_schema> python .\xml_parse.py
Using lxml version 6.0.2
Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
<title>Main XML</title>
<elements>
<element name="main element" foo="main foo">This text is from main.xml</element>
<element name="element1" foo="foo1">Text 1: This content is included from another file.</element><element name="message99" foo="spam">Text 2: This content is included from another file.</element><element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>
</root>

XML is valid.

What's the deal? Any ideas would be appreciated. Thanks.

EDIT: Windows

Python : sys.version_info(major=3, minor=11, micro=8, releaselevel='final', serial=0)
etree : (6, 0, 2, 0)
libxml used : (2, 11, 9)
libxml compiled : (2, 11, 9)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)

Linux

Python : sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
etree : (6, 0, 0, 0)
libxml used : (2, 14, 4)
libxml compiled : (2, 14, 4)
libxslt used : (1, 1, 43)
libxslt compiled : (1, 1, 43)
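
The two reports differ mainly in the libxml2 version (2.11.9 on Windows, 2.14.4 on Linux). A comment further down notes that libxml2 2.13.0 removed HTTP support, which would explain why only the Linux build fails to load the remote xml.xsd. A minimal sketch of that comparison follows; the helper name `fetches_http` and the `HTTP_REMOVED_IN` constant are made up for illustration, and the 2.13.0 cutoff is taken from that comment rather than verified here.

```python
from typing import Tuple

# Assumption: 2.13.0 is the cutoff, as stated in the libxml2 release
# announcement linked in the comments.
HTTP_REMOVED_IN = (2, 13, 0)


def fetches_http(libxml_version: Tuple[int, ...]) -> bool:
    """Return True if this libxml2 version predates the removal of HTTP support."""
    return tuple(libxml_version[:3]) < HTTP_REMOVED_IN


# Version tuples copied from the two reports above.
print(fetches_http((2, 11, 9)))  # Windows build -> True
print(fetches_http((2, 14, 4)))  # Linux build   -> False
```

In a real script the tuple would come from `etree.LIBXML_VERSION`, the same attribute used to produce the reports above.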

  • Maybe there's a difference in the underlying libxml2 library. Check it with:
    import sys
    from lxml import etree
    print("%-20s: %s" % ('Python', sys.version_info))
    print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
    print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
    print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
    print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
    print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
    Commented Sep 26 at 15:19
  • libxml2 is 2.11.9 on Windows, while on Linux I have 2.14.4. libxslt is 1.1.39 on Windows, while on Linux it's 1.1.43. Commented Sep 26 at 15:31
  • I get the same error on macOS. If I remove line 15 from schema.xsd, i.e. <xs:attributeGroup ref="xml:specialAttrs"/>, it runs and still produces the same output as with 5.4.0. Commented Sep 26 at 15:50
  • I need that line for the case where include.xml is in a different directory and it's referenced via relative path, e.g. <xi:include href="../include.xml" Commented Sep 26 at 16:02
  • I opened a bug with lxml as well (bugs.launchpad.net/lxml/+bug/2125776), but also created this post for faster answers, in case there's some quick fix :D Commented Sep 26 at 16:04

1 Answer

The right way

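Recent libxml2 versions, for security reasons, rely on XML catalogs to resolve external resources instead of fetching them over the network (libxml2 2.13.0 removed HTTP support, as noted in the comments below). A custom catalog that maps the remote schema to a local copy can be written as follows.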

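catalog.xml: each catalog entry is keyed on the schemaLocation value used in the xs:import and maps it to a local copy of the xsd file, which must be downloaded first. The import used here is:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>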

wget "http://www.w3.org/2001/xml.xsd"

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
                      "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
  <system systemId="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
  <uri name="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
</catalog>
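
If a schema has several remote imports, the catalog can also be generated rather than written by hand. The following is only a sketch using the standard library: `remote_imports` and `build_catalog` are hypothetical helper names, and the generated file contains just `system` and `uri` entries, which is an assumption about what is sufficient rather than something confirmed above.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def remote_imports(xsd_text: str) -> list:
    """Collect http(s) schemaLocation values from xs:import and xs:include."""
    root = ET.fromstring(xsd_text)
    found = []
    for tag in ("import", "include"):
        for node in root.iter(XS + tag):
            location = node.get("schemaLocation", "")
            if location.startswith(("http://", "https://")):
                found.append(location)
    return found


def build_catalog(mapping: dict) -> str:
    """Serialize an OASIS catalog mapping each remote URL to a local file."""
    ET.register_namespace("", CATALOG_NS)
    catalog = ET.Element("{%s}catalog" % CATALOG_NS)
    for remote, local in mapping.items():
        ET.SubElement(catalog, "{%s}system" % CATALOG_NS,
                      {"systemId": remote, "uri": local})
        ET.SubElement(catalog, "{%s}uri" % CATALOG_NS,
                      {"name": remote, "uri": local})
    return ET.tostring(catalog, encoding="unicode")


# Example: one remote import, mapped to a downloaded xml.xsd next to the catalog.
sample_xsd = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
    '<xs:import namespace="http://www.w3.org/XML/1998/namespace" '
    'schemaLocation="http://www.w3.org/2001/xml.xsd"/>'
    '</xs:schema>'
)
mapping = {url: "xml.xsd" for url in remote_imports(sample_xsd)}
print(build_catalog(mapping))
```

The keys of the mapping have to match the schemaLocation strings in the schema character for character, otherwise the catalog lookup will not find them.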

The custom catalog.xml can be used with lxml as follows

import os
import lxml
from lxml import etree

# Path to your XML Catalog file
catalog_file = "catalog.xml"
os.environ["XML_CATALOG_FILES"] = catalog_file

print("Using lxml version {0}".format(lxml.__version__), end="\n\n")

schema_tree = etree.parse("schema.xsd")
schema = etree.XMLSchema(etree=schema_tree)

tree = etree.parse("main.xml")
tree.xinclude()

# Apply transformations
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    xslt = etree.parse("transform.xslt")
    transform = etree.XSLT(xslt)
    result = transform(tree)
    tree._setroot(result.getroot())

print(etree.tostring(tree, pretty_print=True).decode())

if schema.validate(tree): # Validate
    print("XML is valid.")
else:
    print("XML is invalid!")
    for error in schema.error_log:
        print(error.message)
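
The script above passes a relative catalog path, so it only works when run from the directory that contains catalog.xml. One possible way around that is to build the environment value from absolute file: URIs. This is a sketch under assumptions: `catalog_env_value` is a made-up helper, and it relies on XML_CATALOG_FILES being read as a whitespace-separated list that accepts file: URIs, which is my understanding of libxml2's behavior rather than something shown above.

```python
import os
from pathlib import Path


def catalog_env_value(*catalog_paths: str) -> str:
    """Join catalog paths into a value for XML_CATALOG_FILES.

    Each path is made absolute and turned into a file: URI, so spaces in
    directory names are percent-encoded and do not split the list.
    """
    return " ".join(Path(p).resolve().as_uri() for p in catalog_paths)


# Set this before the schema is parsed, as in the script above.
os.environ["XML_CATALOG_FILES"] = catalog_env_value("catalog.xml")
print(os.environ["XML_CATALOG_FILES"])
```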

Testing the catalog with xmllint

XML_CATALOG_FILES='catalog.xml' /home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml 
main.xml validates

Running the script

python3.12 parse-so.py 
Using lxml version 6.0.0

Applying transformation from transform.xslt
<root xmlns:xi="http://www.w3.org/2001/XInclude">
[REDACTED]

XML is valid.

Alternative: edit xsd

Another answer suggests removing schemaLocation from the xsd, but that does not fix the problem. Downloading a copy of xml.xsd and referencing it in schema.xsd does the trick:

wget "http://www.w3.org/2001/xml.xsd"

Then change the import in schema.xsd to:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>
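
If the schema should not be edited by hand (for example because it is regenerated), the same substitution can be applied to the schema text before parsing. The sketch below is a plain text replacement, not an XML-aware rewrite, and `localize_schema_location` is a hypothetical helper name.

```python
import re


def localize_schema_location(xsd_text: str, remote_url: str, local_path: str) -> str:
    """Replace one remote schemaLocation value with a local path."""
    pattern = re.compile(
        r'(schemaLocation\s*=\s*)(["\'])' + re.escape(remote_url) + r'\2'
    )
    # A function replacement avoids backslash handling issues in local_path.
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + local_path + m.group(2), xsd_text
    )


import_line = (
    '<xs:import namespace="http://www.w3.org/XML/1998/namespace" '
    'schemaLocation="http://www.w3.org/2009/01/xml.xsd"/>'
)
print(localize_schema_location(
    import_line, "http://www.w3.org/2009/01/xml.xsd", "xml.xsd"))
```

The rewritten text can then be parsed from memory, keeping in mind that a relative schemaLocation is resolved against the base URL of the schema document, so the local xml.xsd has to be reachable from there.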

Note:
The latest xmllint tool from the libxml2 Linux package fails with the same error, so the behavior comes from libxml2 rather than from lxml itself.

/home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml
I/O warning : failed to load "https://www.w3.org/2005/08/xml.xsd": No such file or directory
schema.xsd:3: element import: Schemas parser warning : Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'https://www.w3.org/2005/08/xml.xsd'. Skipping the import.
schema.xsd:15: element attributeGroup: Schemas parser error : Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition.
WXS schema schema.xsd failed to compile

It works when referencing a local xsd file

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>

/home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml 
main.xml validates

5 Comments

It works, indeed, if I use a local xml.xsd file.
You can accept the answer if that fixes the issue :-). I still think it's an lxml bug
Yeah, I applied this solution for now. Will wait to see if that bug I opened will get any attention. Might be a bug, might also be a bugfix for this, lol: bugs.launchpad.net/lxml/+bug/1234114 (although I tried to explicitly set no_network=False).
I agree, this looks like an lxml bug. Problems with loading the schema for the XML namespace often arise when people try to load a non-standard version from a non-standard location, but in this case (a) you're referencing something that's defined in the standard version, and (b) you're referencing it at a location where the standard version is found. So if anyone is loading a non-standard version from a non-standard location then it's lxml itself.
Sorry, forgot to update, libxml 2.13.0 removed HTTP support: discourse.gnome.org/t/libxml2-2-13-0-released/21529 My bug: gitlab.gnome.org/GNOME/libxml2/-/issues/990
