# python – Probabilistic Considerations for Calculating Shannon Entropy in Network Traffic

## Question:

Probabilistic Considerations for Calculating Shannon Entropy in Network Traffic

I have a dump file (CAP format) of a network traffic capture taken with Debian's tcpdump. Up to a certain point in time the traffic is attack-free; then a series of TCP SYN flooding attacks begins. My goal is to calculate the entropy of each traffic segment (with and without attacks) and compare them.

I'm using the following Python code:

```python
import collections

import numpy as np

# Source IPs observed in the capture (sample data)
sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Empirical probability of each distinct IP
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
prob = counts / counts.sum()

# Shannon entropy in bits
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
```
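To go from one sample to the whole capture, the same computation can be wrapped in a function and applied per time window. A minimal pure-Python sketch (the window contents below are made-up illustrative data; actually extracting source IPs from the CAP file would need a pcap reader such as scapy or dpkt, which is not shown here):

```python
import collections
import math

def shannon_entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = collections.Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical per-window usage: one list of source IPs per time slice.
windows = [
    ["131.084.001.031", "131.084.001.031", "131.284.001.031"],  # normal traffic
    ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],           # flood: many distinct sources
]
for i, w in enumerate(windows):
    print(i, shannon_entropy(w))
```

A SYN flood with spoofed source addresses tends to spread probability mass over many distinct IPs, which pushes the per-window entropy up relative to the attack-free baseline.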

When doing the calculation in this way, some questions arise:

1. I am considering a discrete probability distribution with an equiprobable sample space. Is this reasonable, and how can I justify it? I don't know what the underlying distribution is…

2. How can I validate the experiment? I'm thinking of a hypothesis test with the following null hypothesis: "The entropy value allows detecting the attack." Is that coherent? What would be a good hypothesis test for this case (the sample space is around 40 in size)?
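One possible validation design (a sketch under stated assumptions, not necessarily the right test for this data): compute one entropy value per time window for the attack-free and attack segments, then run a two-sample permutation test whose null hypothesis is that both sets of entropy values come from the same distribution. The entropy values below are made-up illustrative numbers, not measurements:

```python
import random

def permutation_test(baseline, attack, n_perm=10000, seed=0):
    """Approximate permutation test on the difference of mean entropy.

    baseline, attack: lists of per-window entropy values.
    Returns an approximate p-value for the observed absolute difference
    of means under the null hypothesis that both samples were drawn
    from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(sum(attack) / len(attack) - sum(baseline) / len(baseline))
    pooled = baseline + attack
    n = len(baseline)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the windows
        diff = abs(sum(pooled[n:]) / len(attack) - sum(pooled[:n]) / n)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical per-window entropies for the two segments:
baseline = [1.2, 1.3, 1.1, 1.25, 1.15, 1.3, 1.2, 1.1]
attack   = [2.8, 2.9, 3.0, 2.7, 2.85, 2.95, 2.75, 2.9]
print(permutation_test(baseline, attack))
```

A permutation test makes no assumption about the shape of the entropy distribution, which is convenient here since that distribution is unknown; with only a handful of windows per segment, its power will be limited, so the more windows the better.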