Python – parsing an XSD file

Question:

I need to parse an XSD file.

xsd:

<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by Fujitsu Interstage XWand B0233 -->
<xsd:schema targetNamespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" elementFormDefault="qualified" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ep_ins_not_med_y_39="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" xmlns:FR_4_008_01a_08="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_08" xmlns:FR_4_008_01a_07="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_07" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:FR_2_004_01c_01="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_2_004_01c_01">
  <xsd:import namespace="http://www.xbrl.org/2003/instance" schemaLocation="http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd"/>
  <xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_02_08_39" schemaLocation="../tab/FR_4_008_02_08_39/FR_4_008_02_08_39.xsd"/>
  <xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_03_08_39" schemaLocation="../tab/FR_4_008_03_08_39/FR_4_008_03_08_39.xsd"/>
  </xsd:schema>

The file was taken from the website of the Central Bank of the Russian Federation (I had to shorten the content, because the full text did not fit).

When I parse it with the lxml module, I can read the attributes of the first file, but I can't follow the references into the imported XSD files.

How can I do that?

Answer:

I'm not familiar with lxml, but with xml.etree from the standard library you can easily get the schemaLocation of each imported XSD:

import xml.etree.ElementTree as ET
my_xml="""<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by Fujitsu Interstage XWand B0233 -->
<xsd:schema targetNamespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" elementFormDefault="qualified" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ep_ins_not_med_y_39="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/ep/ep_ins_not_med_y_39" xmlns:FR_4_008_01a_08="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_08" xmlns:FR_4_008_01a_07="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_01a_07" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:FR_2_004_01c_01="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_2_004_01c_01">
  <xsd:import namespace="http://www.xbrl.org/2003/instance" schemaLocation="http://www.xbrl.org/2003/xbrl-instance-2003-12-31.xsd"/>
  <xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_02_08_39" schemaLocation="../tab/FR_4_008_02_08_39/FR_4_008_02_08_39.xsd"/>
  <xsd:import namespace="http://www.cbr.ru/xbrl/bfo/rep/2017-07-31/tab/FR_4_008_03_08_39" schemaLocation="../tab/FR_4_008_03_08_39/FR_4_008_03_08_39.xsd"/>
  </xsd:schema>"""


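# Parse the schema text; fromstring returns the root <xsd:schema> element.
s = ET.fromstring(my_xml)
# Walk every element and print the schemaLocation of each xsd:import.
for k in s.iter():
    if 'schemaLocation' in k.attrib:
        print(k.attrib['schemaLocation'])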
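
The loop above only lists the locations. To actually go into the imported files, each schemaLocation has to be resolved against the URL (or path) of the schema that contains it, and the result parsed in turn. Below is a minimal sketch of that idea, not a definitive implementation. The names imported_locations and walk_schemas are hypothetical, and so is the fetch callable: it is assumed to take a URL and return the schema text, so you can plug in whatever loading mechanism fits (HTTP download, local file read, and so on).

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Elements that reference another schema file through schemaLocation.
REFERENCE_TAGS = {f"{{{XSD_NS}}}import", f"{{{XSD_NS}}}include"}


def imported_locations(xsd_text, base_url):
    """Return the absolute location of every schema referenced by xsd_text.

    base_url is the location of the schema itself; relative values such as
    "../tab/x/x.xsd" are resolved against it.
    """
    root = ET.fromstring(xsd_text)
    locations = []
    for node in root.iter():
        if node.tag in REFERENCE_TAGS:
            location = node.get("schemaLocation")
            if location:
                locations.append(urljoin(base_url, location))
    return locations


def walk_schemas(start_url, fetch, seen=None):
    """Visit start_url and, recursively, every schema it references.

    fetch is a hypothetical callable (url -> schema text) supplied by the
    caller. The seen set prevents infinite loops when schemas import each
    other. Returns the set of all visited locations.
    """
    seen = set() if seen is None else seen
    if start_url in seen:
        return seen
    seen.add(start_url)
    for url in imported_locations(fetch(start_url), start_url):
        walk_schemas(url, fetch, seen)
    return seen
```

For schemas served over HTTP, fetch could be something like lambda url: urllib.request.urlopen(url).read(); for a local copy of the taxonomy you would map each URL to a file path instead. Inside walk_schemas, the place where fetch(start_url) is called is also where you would do your own processing of each schema.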