Question:
I need to parse an xsd file.
xsd:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by Fujitsu Interstage XWand B0233 -->
<xsd:schema targetNamespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" elementFormDefault="qualified" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ep_ins_not_med_y_39="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" xmlns:FR_4_008_01a_08="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_08" xmlns:FR_4_008_01a_07="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_07" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:FR_2_004_01c_01="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_2_004_01c_01">
<xsd:import namespace="http://www.xbrl.org/2003/instance" schemaLocation="http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd"/>
<xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_02_08_39" schemaLocation="../tab/FR_4_008_02_08_39/FR_4_008_02_08_39.xsd"/>
<xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_03_08_39" schemaLocation="../tab/FR_4_008_03_08_39/FR_4_008_03_08_39.xsd"/>
</xsd:schema>
The file was taken from the website of the Central Bank of the Russian Federation (I had to shorten the content because the full text did not fit).
When I parse it with the lxml module, I can read the attributes of this first file, but I can't follow the xsd:import references into the imported xsd files.
How can I do that?
Answer:
I'm not very familiar with lxml, but with xml.etree from the standard library you can easily get the locations of the imported xsd files:
import xml.etree.ElementTree as ET
my_xml="""<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by Fujitsu Interstage XWand B0233 -->
<xsd:schema targetNamespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" elementFormDefault="qualified" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ep_ins_not_med_y_39="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" xmlns:FR_4_008_01a_08="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_08" xmlns:FR_4_008_01a_07="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_07" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:FR_2_004_01c_01="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_2_004_01c_01">
<xsd:import namespace="http://www.xbrl.org/2003/instance" schemaLocation="http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd"/>
<xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_02_08_39" schemaLocation="../tab/FR_4_008_02_08_39/FR_4_008_02_08_39.xsd"/>
<xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_03_08_39" schemaLocation="../tab/FR_4_008_03_08_39/FR_4_008_03_08_39.xsd"/>
</xsd:schema>"""
# Parse the schema text and print every schemaLocation attribute found in it
s = ET.fromstring(my_xml)
for k in s.iter():
    if 'schemaLocation' in k.attrib:
        print(k.attrib['schemaLocation'])
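Printing the locations is only the first half of the task. To actually get into the imported files, each relative schemaLocation has to be resolved against the address of the file that contains it, and then that file is parsed the same way. Below is a minimal sketch of that idea, not a finished solution. The names import_locations, walk_schemas and load are hypothetical helpers I made up for illustration, and I am assuming you know the URL or path the first xsd was loaded from (it is not shown in the question).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def import_locations(xsd_text, base_url):
    """Return absolute locations of every xsd:import / xsd:include in a schema."""
    root = ET.fromstring(xsd_text)
    locations = []
    for tag in ("import", "include"):
        for node in root.iter("{%s}%s" % (XSD_NS, tag)):
            location = node.get("schemaLocation")
            if location:
                # Relative paths such as ../tab/... are resolved against
                # the address of the schema that contains them.
                locations.append(urljoin(base_url, location))
    return locations


def walk_schemas(start_url, load, seen=None):
    """Visit start_url and every schema reachable from it, depth first.

    load is a caller-supplied function (an assumption of this sketch)
    that takes a URL and returns the text of the schema stored there.
    Returns a dict mapping each visited URL to its schema text.
    """
    seen = {} if seen is None else seen
    if start_url in seen:
        # Already visited; this also protects against circular imports.
        return seen
    text = load(start_url)
    seen[start_url] = text
    for url in import_locations(text, start_url):
        walk_schemas(url, load, seen)
    return seen
```

For real use, load could be a small wrapper that downloads the URL or reads it from a local copy of the taxonomy. Keeping it as a parameter means the traversal itself does not depend on how the files are fetched.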