## Question:

Probabilistic Considerations for Calculating Shannon Entropy in Network Traffic

I have a dump file (CAP format) of a network traffic capture taken with tcpdump on Debian. Up to a certain time it is attack-free traffic; then a series of TCP SYN flooding attacks begins. My goal is to calculate the entropy of the traffic in each period (with and without attacks) and compare them.

I'm using Python code:

```python
import collections

import numpy as np

sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]

# Empirical distribution: relative frequency of each distinct IP.
C = collections.Counter(sample_ips)
counts = np.array(list(C.values()), dtype=float)
prob = counts / counts.sum()

# Shannon entropy in bits.
shannon_entropy = (-prob * np.log2(prob)).sum()
print(shannon_entropy)
```
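Since the goal is to compare entropy with and without attacks, one natural extension of the snippet above is to compute entropy per time window and watch how it changes. A minimal sketch, assuming the source IPs have already been extracted from the capture into a list (the window size of 100 packets and the toy traffic below are illustrative assumptions, not part of the original setup):

```python
import collections

import numpy as np


def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = np.array(list(collections.Counter(values).values()), dtype=float)
    prob = counts / counts.sum()
    return float(-(prob * np.log2(prob)).sum())


def windowed_entropy(ips, window=100):
    """Entropy of source IPs in consecutive, non-overlapping windows."""
    return [shannon_entropy(ips[i:i + window]) for i in range(0, len(ips), window)]


# Toy traffic: a diverse "normal" phase followed by a SYN-flood-like phase
# dominated by a single source address.
normal = [f"10.0.0.{n % 50}" for n in range(200)]
attack = ["10.0.0.1"] * 180 + [f"10.0.0.{n % 50}" for n in range(20)]
print(windowed_entropy(normal + attack))
```

During the flood the source-IP distribution collapses onto few values, so the per-window entropy drops sharply relative to the attack-free windows.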

When doing the calculation in this way, some questions arise:

1. Am I effectively assuming a discrete probability distribution with an equiprobable sample space? Is that reasonable, and how can I justify it? I don't know what the true distribution is.

2. How can I validate the experiment? I'm thinking of a hypothesis test with the null hypothesis "the entropy value allows detecting the attack." Is that coherent? What would be a suitable test for this case (the sample space is around 40 in size)?
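On point 1, note that the code computes entropy from the *empirical* distribution (observed frequencies), so no equiprobability assumption is actually being made; the equiprobable case is only the upper bound, log2(k) for k distinct values. One hedged way to make this explicit, and to compare windows with different numbers of distinct IPs, is to normalize by that bound (a sketch, not a standard library function):

```python
import collections

import numpy as np


def normalized_entropy(values):
    """Empirical Shannon entropy divided by its maximum, log2(k).

    Returns a value in [0, 1]: 1 means the k observed values are
    equiprobable; values near 0 mean a single value dominates.
    """
    counts = np.array(list(collections.Counter(values).values()), dtype=float)
    prob = counts / counts.sum()
    H = -(prob * np.log2(prob)).sum()
    k = len(counts)
    return float(H / np.log2(k)) if k > 1 else 0.0


sample_ips = [
    "131.084.001.031",
    "131.084.001.031",
    "131.284.001.031",
    "131.284.001.031",
    "131.284.001.000",
]
print(normalized_entropy(sample_ips))  # < 1: the three IPs are not equiprobable
```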

## Answer:

1) If you can reach the same conclusion from the probability distributions of samples taken over different time intervals and on different days, then yes, the approach is defensible.

2) An experiment must first be designed: describe every step of it on paper, skipping nothing, so that someone who doubts your results can redo it. Then carry it out, repeatedly and with different data; everyone should reach the same conclusion.
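On the hypothesis-test part of the question: the null hypothesis should be the no-effect statement ("entropy does not differ between attack and attack-free windows"), not the claim you hope to demonstrate. With samples of around 40 per condition, a nonparametric approach is reasonable; a minimal permutation-test sketch, assuming you have per-window entropy values for both phases (the numbers below are illustrative, not from the capture):

```python
import numpy as np


def permutation_test(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between
    samples `a` and `b`. Returns the estimated p-value."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = abs(pooled[:len(a)].mean() - pooled[len(a):].mean())
        if diff >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


# Hypothetical per-window entropies for attack-free vs attack traffic.
clean_windows = [5.2, 5.4, 5.1, 5.3, 5.5, 5.2, 5.4, 5.3]
attack_windows = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8, 1.1, 1.0]
print(permutation_test(clean_windows, attack_windows))  # small p -> reject H0
```

A small p-value lets you reject the null of "no entropy difference"; it does not by itself prove the detector works on unseen traffic, which is why repeating the experiment on different days matters.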

You can read more about scientific methodology; it will help you.