python – Merge two DataFrame tables, aggregate data using pandas

Question:

Wrote the code:

array = pd.DataFrame(np.object, index=[], columns=[])
array = array.append({'Project': b.name, 'Summary': card.name, \
                  'Key': card.id, 'Assignee': m, 'Points': s}, \
                  ignore_index=True)
array1 = pd.read_excel('ProjectCostUCP.xlsx', 'Лист1')
result = array[['Project', 'Assignee', 'Points']].groupby(['Project', 
              'Assignee']).sum(axis=1)
print(result.columns.tolist()) 
result = result.merge(array1, on=['Project'])
result['Value'] = result.Points * result.Price
writer = pd.ExcelWriter('third.xlsx')
array.to_excel(writer, "June")
result.to_excel(writer, "June_agregation")
writer.save()

It is necessary to unite two tables – result and array1 by column name – 'Project'. But, as the line print(result.columns.tolist()) shows, there is only one column in the result table – 'Points'. Although before the groupby() operation there was also a 'Project' column. groupby() put 'Project' in some kind of MultiIndex from which I can't pull data. Please help me how to fix the code so that it works?
The end result of the code should be a table:

Project    Assignee Price   Points  Value
srnd-demo   Serhii  5,5     23      126,5

and after executing the line

result = array[['Project', 'Assignee','Points']].groupby(['Project', 
          'Assignee']).sum(axis=1) 

resulting in a table:

                     Points
Project    Assignee 
srnd-demo   Serhii   23

Points – turned out to be the name of Columns, and Project Assignee got into the MultiIndex type, which is not possible to work with (namely, pull data from it), the array1 table looks like:

Project    Price
srnd-demo  5.5

where Project Price is the name of the columns. You need to join the result and array1 tables by the name columns Project, then multiply Price by Points and place the value in the Value column. Usage

result.reset_index().merge(array1)

resulted in the creation of a table:

   Project     Assignee  Price_x  Points  Price_y
0  srnd-demo   Serhii      5.5    23.0      5.5

How will achieve, that all the same Price in the table was one?

Answer:

Use the .reset_index() method to turn all index/multi-index columns into normal columns:

In [245]: result
Out[245]:
                    Points
Project   Assignee
srnd-demo Serhii        23

In [246]: array1
Out[246]:
     Project  Price
0  srnd-demo    5.5

In [247]: result.reset_index().merge(array1)
Out[247]:
     Project Assignee  Points  Price
0  srnd-demo   Serhii      23    5.5

UPDATE:

If the DataFrame's contains columns with the same names that do not participate in the union (that is, they were not specified as the on , left_on , right_on ), then as a result these columns will have suffixes (default: '_x' – for the left DF and '_y' for right).

Example:

In [259]: result
Out[259]:
                    Points  Price
Project   Assignee
srnd-demo Serhii        23    5.5

In [260]: array1
Out[260]:
     Project  Price
0  srnd-demo    5.5

In [257]: result.reset_index().merge(array1, on='Project')
Out[257]:
     Project Assignee  Points  Price_x  Price_y
0  srnd-demo   Serhii      23      5.5      5.5

get rid of duplicate columns in one of the dfs:

In [258]: result.reset_index().merge(array1.drop(['Price'], axis=1), on='Project')
Out[258]:
     Project Assignee  Points  Price
0  srnd-demo   Serhii      23    5.5
Scroll to Top