python – Find value that repeats itself the most

Question:

I'm trying to analyze some shoe sales data, but I'm having trouble writing a function that finds the shoe size each customer bought most often in the previous year.

I have a table with this data:

Cód. Cliente  CPF          Nome                              Sexo       Tamanho
5879099       37513584800  LOJA                              MASCULINO  35
5879099       37513584800  LOJA                              MASCULINO  23
5879099       37513584800  LOJA                              MASCULINO  17
5879099       37513584800  LOJA                              MASCULINO  37
5879099       37513584800  LOJA                              MASCULINO  17
3353800       2613618809   DULIO JOSE DE SOUSA DAMICO        MASCULINO  35
3353800       2613618809   DULIO JOSE DE SOUSA DAMICO        MASCULINO  39
3112300       29953652805  ROSANA DA SILVA FAGUNDES          FEMININO   34
6116202       39285701884  ANA CAROLINA DE FARIAS FRANCISCO  FEMININO   31

The real table is much larger; these are just a few example rows.

What I need to know is which size (Tamanho) appears most often for each customer's CPF.

In other words, which size did each customer buy the most?

I couldn't find a way to do this. Can anyone shed some light?

Thanks,

Answer:

Yuri, you could use a pivot table (pivot_table) in pandas.

It would be something like this:

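import pandas as pd

df = pd.read_excel("YOUR FILE")  # placeholder: path to your spreadsheet

# Count how many rows (purchases) each CPF has for each size.
# "Nome" is only the column being counted; "Tamanho" cannot be used in
# both 'index' and 'values', and any column without missing values works.
table = pd.pivot_table(df, index=["CPF", "Tamanho"],
                       values="Nome",
                       aggfunc="count")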

I used 'read_excel' just as an example; in your case, just load the DataFrame with your own data.

The 'index' parameter sets the columns used as the pivot table's row keys, that is, the category columns you want to group by,

and in 'aggfunc' (aggregation function) I'm using a count, so each CPF and size pair shows how many times it appears.
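
The count alone does not yet say which size wins for each CPF, so here is a minimal sketch of one way to take that last step. It uses groupby instead of pivot_table, rebuilds a small DataFrame from the rows in the question, and the column name "n" is just a hypothetical name for the count. When two sizes tie, I'm assuming you want the smallest one; change the sort order if you prefer something else.

```python
import pandas as pd

# Sample rows copied from the question (only the two columns needed here).
df = pd.DataFrame({
    "CPF": [37513584800, 37513584800, 37513584800, 37513584800, 37513584800,
            2613618809, 2613618809, 29953652805, 39285701884],
    "Tamanho": [35, 23, 17, 37, 17, 35, 39, 34, 31],
})

# Count purchases per (CPF, Tamanho) pair; "n" is a hypothetical column name.
counts = df.groupby(["CPF", "Tamanho"]).size().reset_index(name="n")

# Sort so that, within each CPF, the highest count comes first.
# Ties are broken by the smallest size (an assumption, adjust as needed).
ranked = counts.sort_values(
    ["CPF", "n", "Tamanho"], ascending=[True, False, True]
)

# Keep only the first (most frequent) row of each CPF.
most_bought = ranked.drop_duplicates("CPF").set_index("CPF")["Tamanho"]
print(most_bought)
```

With the sample rows, the CPF 37513584800 bought size 17 twice, so that is the size returned for it; the other customers have at most one purchase per size, so the tie rule decides.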

This link has useful content about pivot tables that may help you further.
