Question:
Wrote the code:
array = pd.DataFrame(np.object, index=[], columns=[])
array = array.append({'Project': b.name, 'Summary': card.name, \
'Key': card.id, 'Assignee': m, 'Points': s}, \
ignore_index=True)
array1 = pd.read_excel('ProjectCostUCP.xlsx', 'Лист1')
result = array[['Project', 'Assignee', 'Points']].groupby(['Project',
'Assignee']).sum(axis=1)
print(result.columns.tolist())
result = result.merge(array1, on=['Project'])
result['Value'] = result.Points * result.Price
writer = pd.ExcelWriter('third.xlsx')
array.to_excel(writer, "June")
result.to_excel(writer, "June_agregation")
writer.save()
It is necessary to unite two tables – result and array1 by column name – 'Project'. But, as the line print(result.columns.tolist()) shows, there is only one column in the result table – 'Points'. Although before the groupby() operation there was also a 'Project' column. groupby() put 'Project' in some kind of MultiIndex from which I can't pull data. Please help me how to fix the code so that it works?
The end result of the code should be a table:
Project Assignee Price Points Value
srnd-demo Serhii 5,5 23 126,5
and after executing the line
result = array[['Project', 'Assignee','Points']].groupby(['Project',
'Assignee']).sum(axis=1)
resulting in a table:
Points
Project Assignee
srnd-demo Serhii 23
Points – turned out to be the name of Columns, and Project Assignee got into the MultiIndex type, which is not possible to work with (namely, pull data from it), the array1 table looks like:
Project Price
srnd-demo 5.5
where Project Price is the name of the columns. You need to join the result and array1 tables by the name columns Project, then multiply Price by Points and place the value in the Value column. Usage
result.reset_index().merge(array1)
resulted in the creation of a table:
Project Assignee Price_x Points Price_y
0 srnd-demo Serhii 5.5 23.0 5.5
How will achieve, that all the same Price in the table was one?
Answer:
Use the .reset_index()
method to turn all index/multi-index columns into normal columns:
In [245]: result
Out[245]:
Points
Project Assignee
srnd-demo Serhii 23
In [246]: array1
Out[246]:
Project Price
0 srnd-demo 5.5
In [247]: result.reset_index().merge(array1)
Out[247]:
Project Assignee Points Price
0 srnd-demo Serhii 23 5.5
UPDATE:
If the DataFrame's contains columns with the same names that do not participate in the union (that is, they were not specified as the on
, left_on
, right_on
), then as a result these columns will have suffixes (default: '_x'
– for the left DF and '_y'
for right).
Example:
In [259]: result
Out[259]:
Points Price
Project Assignee
srnd-demo Serhii 23 5.5
In [260]: array1
Out[260]:
Project Price
0 srnd-demo 5.5
In [257]: result.reset_index().merge(array1, on='Project')
Out[257]:
Project Assignee Points Price_x Price_y
0 srnd-demo Serhii 23 5.5 5.5
get rid of duplicate columns in one of the dfs:
In [258]: result.reset_index().merge(array1.drop(['Price'], axis=1), on='Project')
Out[258]:
Project Assignee Points Price
0 srnd-demo Serhii 23 5.5