python – groupby pandas table grouping

Question:

I have a table in this format

User   Material   Nota
1       youtube    5
2       facebook   4
3       facebook   3
4       facebook   5
6       youtube    2
7       orkut      3
8       twitter    4

and I would like to group it so that the final table only shows sites rated by more than 1 different user

Material   User    Nota
youtube     1      5
            6      2
facebook    2      4 
            3      3
            4      5

my code and my logic

import pandas as pd

df = pd.read_csv("tabela1.csv",delimiter=",")


tabela2= pd.DataFrame({'count' : df.groupby(["Material","User","Nota"]).size()})


del tabela2['count']

tabela2.to_csv('tabela_2.csv')
tabela2 = pd.read_csv("tabela_2.csv",delimiter=",")

but then it lists the ones that were evaluated by 1 user, I would like to know if there is something in groupby that lists only the different ones to group?

What I thought of doing the two for's, I feel 1 fixed in the Material column and in this case counting how many times each material appears after deleting the materials that appear less than twice, to treat materials with less than 1 evaluation, but I believe that for a very large base this will be very costly in time

Answer:

You can use value_counts to find out which of the Material column values ​​have more than one entry, and then filter your DataFrame by those values:

import pandas as pd
import io

dados = io.StringIO("""
User,Material,Nota
1,youtube,5
2,facebook,4
3,facebook,3
4,facebook,5
6,youtube,2
7,orkut,3
8,twitter,4""")

df = pd.read_csv(dados, index_col='User')

# Criamos uma série com a contagem de 
# ocorrências pra cada valor da coluna Material
counts = df['Material'].value_counts()
print(counts)  # facebook 3, youtube 2, orkut 1...

# Filtramos para que a série só inclua os 
# materiais com mais de uma contagem
counts = counts[counts > 1]
print(counts.index)  # Index(['facebook', 'youtube'], dtype='object')

# Filtramos o dataframe original para que 
# contenha somente entradas que estejam na série
df = df[df['Material'].isin(counts.index)]
print(df)

# Se quisermos, podemos ordená-los também, como
# no exemplo dado na questão
df = df.sort_values('Material')
print(df)

Final score:

User  Material     Nota
2     facebook     4
3     facebook     3
4     facebook     5
1      youtube     5
6      youtube     2
Scroll to Top
AllEscort