python – How to fix this encoding error in Pandas

Question:

I'm having a problem when Python reads an xlsx file with pandas. The error occurs when running des_pt = (f_pt.head()[pt][0]).encode('utf-8').strip() with the pt variable. It looks like an encoding problem, because some of the characters are in UTF-8.

import pandas as pd

create_result = open('resultado.json', 'w')
i = 0

file_name_pt = pd.ExcelFile('pt.xlsx', encoding='utf-8')
file_name_en = pd.ExcelFile('en.xlsx')

f_pt = pd.read_excel(file_name_pt, sheet_name='Sheet1')
title_pt = f_pt.columns[1:]

f_en = pd.read_excel(file_name_en, sheet_name='Sheet1')
title_en = f_en.columns[1:]

create_result.write('{\n"resultados": [\n')
while i <= 25:
    for pt,en in zip(title_pt, title_en):
        print pt
        pt = pt.encode('utf-8').strip()
        en = en.encode('utf-8').strip()
        print pt

        des_pt = (f_pt.head()[pt][0]).encode('utf-8').strip()
        des_en = (f_en.head()[en][0]).encode('utf-8').strip()

        print des_pt     
        create_result.write('{\n"id":%s,\n"nome":"%s",\n"name":"%s",\n"descricao":"%s",\n"description":"%s",\n"combinacoes":[]},\n'%(i, pt, en, '', des_en))
        i+=1
create_result.write(']\n}')
create_result.close()
print 'Done'

The error message

Traceback (most recent call last):
  File "/Users/atila/Desktop/PyAutomate/firjan_result_generator/firjangenerator.py", line 23, in <module>
    des_pt = (f_pt.head()[pt][0]).encode('utf-8').strip()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/generic.py", line 2486, in _get_item_cache
    values = self._data.get(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3066, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis'

Answer:

Could you provide the format and some sample content of the spreadsheets you are trying to read?

Based on the description in your question, the only thing I can contribute is the following:

The error KeyError: 'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis' happens because the code is looking up a key in a key-value structure (here, the columns of the DataFrame) and cannot find it. The missing key is the string 'T\xc3\xa9cnico em Energias Renov\xc3\xa1veis'.
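One likely cause, going only by the code in the question, is that pt is encoded to a UTF-8 byte string before it is used to index the DataFrame, while the column labels that pandas read from the spreadsheet are still text (unicode) strings. The sketch below illustrates that kind of mismatch. It is not the original script: a plain dict stands in for the DataFrame's column index, and the stored value is a made-up placeholder.

```python
# Hypothetical stand-in for the DataFrame's columns: a dict whose key is a
# text (unicode) label, as it would look after reading the spreadsheet.
# The value is a placeholder, not data from the real file.
columns = {u'T\xe9cnico em Energias Renov\xe1veis': u'example description'}

label = u'T\xe9cnico em Energias Renov\xe1veis'

# Encoding turns the text label into a different object: a UTF-8 byte string.
encoded = label.encode('utf-8')


def lookup(key):
    """Return the value stored under key, or None if the key is missing."""
    try:
        return columns[key]
    except KeyError:
        return None


found_with_text = lookup(label)     # the text label matches the stored key
found_with_bytes = lookup(encoded)  # the byte string does not match any key

print(found_with_text is not None)
print(found_with_bytes is not None)
```

If this is what is happening in your script, looking up the column with the label as it comes from f_pt.columns, and encoding only at the point where the text is written to the file, should avoid the KeyError. Seeing the real files would still be needed to confirm this.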

To help you further, please share the content (at least the first few rows) of the files you are reading.
