python – BadZipFile: File name in directory '' and header '' differ

Question:

Windows + python 3.5.1

I received an archive, 1.zip, that contains a single file, Прайслист_Москва.xlsm (Pricelist_Moscow.xlsm). The task is to extract the contents of the archive. I run this in IDLE:

from zipfile import ZipFile
zf=ZipFile(r'C:\1.zip')
zf.extractall()

I get the error:

Traceback (most recent call last):
  File "C:\Python\question.py", line 3, in <module>
    zf.extractall()
  File "C:\Python\lib\zipfile.py", line 1347, in extractall
    self.extract(zipinfo, path, pwd)
  File "C:\Python\lib\zipfile.py", line 1335, in extract
    return self._extract_member(member, path, pwd)
  File "C:\Python\lib\zipfile.py", line 1397, in _extract_member
    with self.open(member, pwd=pwd) as source, \
  File "C:\Python\lib\zipfile.py", line 1289, in open
    % (zinfo.orig_filename, fname))
zipfile.BadZipFile: File name in directory 'Åαá⌐ß½¿ßΓ_î«ß¬óá.xlsm' and header b'\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm' differ.

If I extract the file manually and pack it again, everything works fine.

The same code run from the console:

zipfile.BadZipFile: File name in directory '\xc5\u03b1\xe1\u2310\xdf\xbd\xbf\xdf\u0393_\xee\xab\xdf\xac\xf3\xe1.xlsm' and header b'\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm' differ.
>>> zf.infolist()
[<ZipInfo filename='\xc5\u03b1\xe1\u2310\xdf\xbd\xbf\xdf\u0393_\xee\xab\xdf\xac\xf3\xe1.xlsm' compress_type=deflate external_attr=0x20 file_size=3838624 compress_size=3821155>]
>>> c = ''.join(zf.namelist())
>>> c
'\xc5\u03b1\xe1\u2310\xdf\xbd\xbf\xdf\u0393_\xee\xab\xdf\xac\xf3\xe1.xlsm'
>>> zf.extract(c, 'C:\\')
zipfile.BadZipFile: File name in directory '\xc5\u03b1\xe1\u2310\xdf\xbd\xbf\xdf\u0393_\xee\xab\xdf\xac\xf3\xe1.xlsm' and header b'\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm' differ.

zf.extract('\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm','C:\\')

Traceback (most recent call last):
  File "C:\Python\question.py", line 3, in <module>
    zf.extract('\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm','C:\\')
  File "C:\Python\lib\zipfile.py", line 1330, in extract
    member = self.getinfo(member)
  File "C:\Python\lib\zipfile.py", line 1199, in getinfo
    'There is no item named %r in the archive' % name)
KeyError: "There is no item named '\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm' in the archive"

Answer:

According to the specification, the zip format supports only the cp437 and utf-8 encodings for file names, and the zipfile module follows this specification.

zipfile.BadZipFile: File name in directory 'Åαá⌐ß½¿ßΓ_î«ß¬óá.xlsm' and header b'\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm' differ.

>>> 'Åαá⌐ß½¿ßΓ_î«ß¬óá.xlsm'.encode('cp437').decode('cp866')
'Прайслист_Москва.xlsm'
>>> b'\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm'.decode('cp1251')
'Прайслист_Москва.xlsm'

That is, in one place (the central directory) the name was written to the archive in cp866, the OEM code page that cmd.exe uses on Russian Windows. In the other place (the local file header) the name is in cp1251, the ANSI code page that the byte (*A) interfaces use on Russian Windows. Neither name is marked as utf-8, so zipfile reads both as cp437, and the two results do not match.
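The same check can be written as a small script. This is only a sketch: fix_oem_name is a hypothetical helper (not part of zipfile), and cp866 as the OEM code page is an assumption that holds for Russian Windows but not for other locales.

```python
def fix_oem_name(name, oem_encoding="cp866"):
    """Undo zipfile's cp437 decoding of a name that was really written
    in an OEM code page (cp866 is assumed here, as on Russian Windows)."""
    try:
        return name.encode("cp437").decode(oem_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name is not cp437 mojibake (for example, it was stored
        # as utf-8), so return it unchanged.
        return name


# The two names exactly as they appear in the error message.
directory_name = 'Åαá⌐ß½¿ßΓ_î«ß¬óá.xlsm'
header_bytes = b'\xcf\xf0\xe0\xe9\xf1\xeb\xe8\xf1\xf2_\xcc\xee\xf1\xea\xe2\xe0.xlsm'

# Both decode to the same name, but only with two different code pages.
names_agree = fix_oem_name(directory_name) == header_bytes.decode("cp1251")
print(names_agree)
```

This only recovers the readable name for display or renaming. It does not make zf.extract() succeed on the broken archive, because zipfile still compares the two stored names itself.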

To fix the problem, the program that creates the zip archive must use the utf-8 encoding for file names (as 7z with the -mcu switch and the zipfile module do, although the zipfile documentation does not mention utf-8 support). For example, the most straightforward code should work as is:

#!/usr/bin/env python3
from zipfile import ZipFile, ZIP_DEFLATED

with ZipFile('pricelist.zip', "w", ZIP_DEFLATED) as archive:
    archive.write('Прайслист_Москва.xlsm')
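To confirm that zipfile really stores such a name as utf-8, one can inspect the general purpose flags after writing. The sketch below works in memory with placeholder contents instead of the real .xlsm file; UTF8_FLAG is a name introduced here for bit 11 (0x800) of the flags, which the zip specification uses to mark utf-8 names.

```python
import io
from zipfile import ZipFile, ZIP_DEFLATED

UTF8_FLAG = 0x800  # general purpose bit 11: file name is stored as utf-8

name = 'Прайслист_Москва.xlsm'
buffer = io.BytesIO()  # in-memory stand-in for pricelist.zip

# Write one member with a non-ASCII name and placeholder contents.
with ZipFile(buffer, "w", ZIP_DEFLATED) as archive:
    archive.writestr(name, b"placeholder contents")

# Read the archive back and check the flag, the name, and the data.
with ZipFile(buffer) as archive:
    info = archive.infolist()[0]
    stored_as_utf8 = bool(info.flag_bits & UTF8_FLAG)
    round_tripped_name = info.filename
    data = archive.read(info)

print(stored_as_utf8, round_tripped_name == name)
```

If the flag is set and the name survives the round trip, the directory entry and the header carry the same utf-8 name, so the mismatch from the question cannot occur for archives created this way.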