python – Help getting index value in time series

Question:

I have the following dataframe as an example:

"""
Datetime   | Product1 | Product2 ... ProductN
01-01-2019 |    0    |   0    ...    0
01-02-2019 |    0    |   5    ...    0
01-03-2019 |    0    |   6    ...    0
01-04-2019 |    0    |   5    ...    0
01-05-2019 |    0    |   5    ...    0
01-06-2019 |    0    |   5    ...    0
01-07-2019 |    12   |   4    ...    0
01-08-2019 |    0    |   7    ...    0
01-09-2019 |    16   |   0    ...    0
01-10-2019 |    15   |   2    ...    0
01-11-2019 |    18   |   6    ...    0
01-12-2019 |    15   |   7    ...    0

For each product, I need to get the value in the "Datetime" column for the first row whose value is different from 0 (that is, the first sale), and generate a new dataframe with this date and its difference in months versus the current date, as shown below (considering we are in January 2020):

Product  | first sell  | dif
Product1 | 01-07-2019  | 6
Product2 | 01-02-2019  | 11
.        |    .        |  .
.        |    .        |  .
.        |    .        |  .
ProductN | Not Launched| Not Launched

Thanks a lot

Answer:

Input data

I'll create an example dataframe to work on:

import pandas as pd

data={
    '2019-01-01':[0,0,0],
    '2019-02-01':[0,5,0],
    '2019-03-01':[0,6,0],
    '2019-04-01':[0,5,0],
    '2019-05-01':[0,5,0],
    '2019-06-01':[0,5,0],
    '2019-07-01':[12,4,0],
    '2019-08-01':[0,7,0],
    '2019-09-01':[16,0,0],
    '2019-10-01':[15,2,0],
    '2019-11-01':[18,6,0],
    '2019-12-01':[15,7,0],
}
df = pd.DataFrame.from_dict(data, orient="index", 
        columns=["Product{}".format(n) for n in range(1, 4)])
df.index = pd.to_datetime(df.index)

Notice that I've made the date column the index and converted it to datetime type. This is the resulting dataframe:

            Product1  Product2  Product3
2019-01-01         0         0         0
2019-02-01         0         5         0
2019-03-01         0         6         0
2019-04-01         0         5         0
2019-05-01         0         5         0
2019-06-01         0         5         0
2019-07-01        12         4         0
2019-08-01         0         7         0
2019-09-01        16         0         0
2019-10-01        15         2         0
2019-11-01        18         6         0
2019-12-01        15         7         0

Get the date of the first sale

I select all the cells with a value other than zero (the rest become NaN), calculate their cumulative sum, and keep the minimum of those cumulative sums, which corresponds to the cell in which a number appears for the first time. This assumes sales are never negative, so the cumulative sum never decreases. idxmin returns the index label of that cell, which is the date of the first sale. Since the result is of type datetime, which also carries a time component that doesn't interest me, I keep only the .date part:

>>> df[df!=0].cumsum().idxmin().dt.date
Product1   2019-07-01
Product2   2019-02-01
Product3          NaT
dtype: object
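
To see why the minimum of the cumulative sum marks the first sale, here is a minimal sketch of the intermediate steps on a single column. The series `s` and its values are made up for illustration, and `Series.where` stands in for the `df[df!=0]` masking so that the zeros become NaN just as they do on a DataFrame:

```python
import pandas as pd

# Hypothetical single-column series, made up for illustration.
s = pd.Series(
    [0, 0, 12, 0, 16],
    index=pd.to_datetime(
        ["2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01", "2019-09-01"]
    ),
    name="Product1",
)

masked = s.where(s != 0)       # zeros become NaN, like df[df != 0] on a DataFrame
running = masked.cumsum()      # NaN, NaN, 12.0, NaN, 28.0 (NaN rows stay NaN)
first_sale = running.idxmin()  # index label of the smallest running total

print(first_sale.date())       # 2019-07-01
```

Because every remaining value is positive, the running total only grows, so its smallest value sits on the first non-zero row.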

The Series of first-sale dates can be converted to a DataFrame, renaming the column to "first sell":

result = pd.DataFrame(df[df!=0].cumsum().idxmin().dt.date).rename({0: "first sell"}, axis=1)
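
As a possible alternative to the cumulative-sum trick, the first sale can also be located by asking for the first True in a boolean mask. This is only a sketch, not the method used above: the small dataframe is invented for the example, and the names `sold` and `first_sale` are mine. It may be worth considering if sales can be negative (returns, for instance), or if your pandas version warns or raises when `idxmin` meets a column that is entirely NaN:

```python
import pandas as pd

# Hypothetical reduced dataframe, made up for illustration.
df = pd.DataFrame(
    {
        "Product1": [0, 0, 12, 0],
        "Product2": [0, 5, 6, 5],
        "Product3": [0, 0, 0, 0],
    },
    index=pd.to_datetime(["2019-05-01", "2019-06-01", "2019-07-01", "2019-08-01"]),
)

sold = df.ne(0)  # True wherever a sale happened

# idxmax returns the label of the first True in each column. A column with
# no True at all would report the first row, so mask it out with any().
first_sale = sold.idxmax().where(sold.any())

print(first_sale)
```

Products that never sold come out as NaT, so the rest of the answer (the month difference and the final `fillna`) should apply in the same way.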

Get the number of months

To this dataframe I add a column with the number of months elapsed. I approximate this number by taking the days elapsed between the current date and the date in the "first sell" column and integer-dividing them by 30. Finally, I replace the NaT ("not a time") values with the string "Not Launched":

import datetime

result["dif"] = (datetime.date.today() - result["first sell"]).dt.days//30
result = result.fillna("Not Launched")

This is the result:

            first sell           dif
Product1    2019-07-01           6
Product2    2019-02-01           11
Product3  Not Launched  Not Launched
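
Dividing days by 30 is an approximation that can drift by one month over long periods. If an exact count of calendar months is needed, a minimal sketch is to compare years and months directly. The series below and the fixed `today` value are assumptions chosen to reproduce the January 2020 scenario from the question; in real use `today` would be `pd.Timestamp.today()`:

```python
import pandas as pd

# Hypothetical first-sale dates, kept as datetime64 (not converted to .date).
first_sale = pd.Series(
    pd.to_datetime(["2019-07-01", "2019-02-01", None]),
    index=["Product1", "Product2", "Product3"],
)

# Fixed "current date" so the example is reproducible (assumption).
today = pd.Timestamp("2020-01-15")

# Whole calendar months between the two dates, ignoring the day of the month.
months = (today.year - first_sale.dt.year) * 12 + (today.month - first_sale.dt.month)

print(months)
```

Here Product1 gives 6 and Product2 gives 11, while Product3 stays NaN and can still be replaced with "Not Launched" afterwards. Note that this counts month boundaries crossed, so a sale on January 31 and a current date of February 1 would count as one month.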