Question:
I would like to understand how the 'hash function' in cryptography (which encrypts passwords for example) is related to the 'hash' key-value in programming (also known as a 'dictionary' in Python for example).
Answer:
A hash function, in general, is a function that takes data of arbitrary size and maps it to a value of fixed size, usually displayed as a number or a hexadecimal string.
As you noticed, hash functions are used in different contexts within computing. Each context requires the hash function to obey (or not obey) certain properties.
Among these properties are determinism, a defined output range, uniformity, invertibility and collision handling.
- Determinism
A hash function must always generate the same value for the same input. In this respect, a hash function closely matches the mathematical definition of a function.
Python only partly obeys this property. Since version 3.3, Python salts the hashes of str and bytes objects with a random seed chosen when the interpreter starts (unless the PYTHONHASHSEED environment variable fixes it). Within one execution the hash is still deterministic, but it changes between executions. This matters if you want to work with persistence (writing to disk), because the values saved for some data in one execution will differ from the values generated in a new execution.
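As a rough illustration in Java (the language of the string example further down), the result of String.hashCode is fixed by its documented formula, so it is the same in every run, while the default Object.hashCode is only promised to be stable within a single execution. The class and method names here are made up for this sketch.

```java
class DeterminismDemo {
    // Hash of a string's contents; the formula is fixed by the String.hashCode documentation.
    static int contentHash(String s) {
        return s.hashCode();
    }

    public static void main(String[] args) {
        // Same input, same output, in this run and in any later run.
        System.out.println(contentHash("abc")); // 96354
        System.out.println(contentHash("abc") == contentHash(new String("abc"))); // true

        // The default Object.hashCode may differ between executions,
        // so it is not a good value to write to disk.
        Object o = new Object();
        System.out.println(o.hashCode() == o.hashCode()); // true (within this run)
    }
}
```

A value like 96354 could safely be persisted; the identity hash of a plain Object could not.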
- Range
Some applications require the hash function to generate values within a fixed numeric range. An example of this type of application is the SHA-1 hash algorithm, which always generates a value of 160 bits.
Others require the range to be dynamic. The Python dictionary, which reduces the value generated by the hash function to an index into an array, grows that array as new key-value pairs are inserted, so the range of valid indices changes over time.
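Both cases can be sketched in a few lines of Java. The first method shows that a SHA-1 digest has the same length for any input; the second shows a simplified way to reduce a hash value to a table index that depends on the current table size. RangeDemo and indexFor are hypothetical names, and real hash tables (Python's dict, Java's HashMap) compute the index differently in detail.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class RangeDemo {
    // Fixed range: the SHA-1 digest of the UTF-8 bytes of the text.
    static byte[] sha1(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Dynamic range: map a hash value to a slot of a table with `capacity` slots.
    static int indexFor(int hash, int capacity) {
        // floorMod keeps the result in [0, capacity) even for negative hashes.
        return Math.floorMod(hash, capacity);
    }

    public static void main(String[] args) {
        System.out.println(sha1("").length * 8);                      // 160
        System.out.println(sha1("a much longer input").length * 8);   // 160

        int h = "hash".hashCode();
        System.out.println(indexFor(h, 8));   // 6
        System.out.println(indexFor(h, 16));  // 14 (same hash, larger table)
    }
}
```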
- Uniformity
Hash functions with a defined range should ensure that each value in the range is equally likely to be generated. The reason is that two different inputs may generate the same value (a collision), and a uniform distribution keeps collisions as rare as possible. Collisions are costly to handle. Depending on the application, they may not need to be handled at all.
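A small Java sketch of a collision: the strings "Aa" and "BB" are different but have the same String.hashCode. A HashMap still keeps them apart because it also compares keys with equals, which is the extra work that collisions cost. CollisionDemo and collide are names invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // True when two different strings produce the same hash value.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        System.out.println(collide("Aa", "BB")); // true: both hash to 2112

        // The map handles the collision by comparing the keys themselves.
        Map<String, Integer> m = new HashMap<>();
        m.put("Aa", 1);
        m.put("BB", 2);
        System.out.println(m.get("Aa") + " " + m.get("BB")); // 1 2
    }
}
```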
- Invertibility
Cryptographic applications require that it be computationally infeasible to recover the input (or any other input that produces the same value) from the value generated by a hash function.
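One consequence, sketched below in Java, is that a stored hash is checked by hashing the candidate again and comparing, never by inverting the hash. This is only an illustration of the one-way idea using plain SHA-256; it is not a recommendation for storing passwords, which normally calls for a salted, deliberately slow function. OneWayDemo and matches are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class OneWayDemo {
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Checks a candidate by hashing it again; the stored digest is never inverted.
    static boolean matches(byte[] storedDigest, String candidate) {
        return MessageDigest.isEqual(storedDigest, sha256(candidate));
    }

    public static void main(String[] args) {
        byte[] stored = sha256("correct horse");
        System.out.println(matches(stored, "correct horse")); // true
        System.out.println(matches(stored, "wrong guess"));   // false
    }
}
```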
The implementation of a hash function varies greatly depending on the problem it is supposed to solve.
A simple example is the hashCode method that Java implements for strings (useful for maps/dictionaries), shown here in simplified form as a method of the String class:
    public int hashCode() {
        int hash = 0;
        // Polynomial hash: multiply the running value by 31, then add the next character.
        for (int i = 0; i < length(); i++)
            hash = (hash * 31) + charAt(i);
        return hash;
    }
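The value 31 was chosen because it is cheap to compute with low-level operations (31 * h equals (h << 5) - h, a shift and a subtraction) and because it is an odd prime. An odd multiplier does not throw information away when the multiplication overflows, whereas an even one would shift bits out. The extra benefit of the multiplier being prime is less clear-cut and is largely traditional, although prime multipliers are often reported to produce fewer collisions in practice.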
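The shift trick can be checked directly. In this Java sketch (ShiftDemo and its method names are invented for the example), multiplying by 31 and computing a left shift by 5 minus the original value give the same int, including when the arithmetic overflows.

```java
class ShiftDemo {
    static int timesThirtyOne(int h) {
        return 31 * h;
    }

    // 32 * h - h, written with a shift instead of a multiplication.
    static int shiftAndSubtract(int h) {
        return (h << 5) - h;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 96354, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            // Prints true for every sample, overflowing cases included.
            System.out.println(h + ": " + (timesThirtyOne(h) == shiftAndSubtract(h)));
        }
    }
}
```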
You can also take a look at the Rabin-Karp algorithm to see hashing applied to finding patterns in text.
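For reference, here is a minimal Java sketch of the Rabin-Karp idea: hash the pattern once, keep a rolling hash of the current window of the text, and only compare characters when the two hashes agree. The constants BASE and MOD and the class name are assumptions chosen for this example, not values prescribed by the algorithm.

```java
class RabinKarpDemo {
    private static final int BASE = 256;       // assumed multiplier per character
    private static final int MOD = 1_000_003;  // assumed prime modulus

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length(), m = pattern.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        long high = 1; // BASE^(m-1) mod MOD, the weight of the window's first character
        for (int i = 1; i < m; i++) high = (high * BASE) % MOD;

        long ph = 0, th = 0; // hash of the pattern and of the current text window
        for (int i = 0; i < m; i++) {
            ph = (ph * BASE + pattern.charAt(i)) % MOD;
            th = (th * BASE + text.charAt(i)) % MOD;
        }

        for (int i = 0; ; i++) {
            // Equal hashes may still be a collision, so confirm with a real comparison.
            if (ph == th && text.regionMatches(i, pattern, 0, m)) return i;
            if (i + m >= n) return -1;
            // Slide the window: remove text[i], then append text[i + m].
            th = (th - text.charAt(i) * high % MOD + MOD) % MOD;
            th = (th * BASE + text.charAt(i + m)) % MOD;
        }
    }

    public static void main(String[] args) {
        System.out.println(indexOf("the quick brown fox", "brown")); // 10
        System.out.println(indexOf("the quick brown fox", "cat"));   // -1
    }
}
```

The rolling update is what makes this attractive: each window's hash is derived from the previous one in constant time instead of rehashing all m characters.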
In your question you talk about the hash function used in cryptography to "encrypt passwords". However, note that encryption is a different process from hashing. With hashing, the objective is to take some data and generate a fixed-size value for it; the same value can be generated by different data, so the original data cannot in general be recovered from it. With encryption, you transform the data to make it unreadable for anyone who does not know the key. That is, encryption is always guaranteed to be reversible (for those who know the key or password).
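To make the contrast concrete, the Java sketch below encrypts a message with AES and then decrypts it with the same key, recovering the original text. A hash function has no corresponding "decrypt" step. The class name, the 128-bit key size and the choice of AES in GCM mode are assumptions made for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

class EncryptionDemo {
    // Encrypts the text, decrypts it with the same key, and returns the result.
    static String roundTrip(String plaintext) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            SecretKey key = kg.generateKey();

            byte[] iv = new byte[12]; // fresh random nonce for this message
            new SecureRandom().nextBytes(iv);
            GCMParameterSpec spec = new GCMParameterSpec(128, iv);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, key, spec);
            byte[] ciphertext = enc.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

            // Whoever holds the key can reverse the transformation.
            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, key, spec);
            return new String(dec.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("secret message")); // secret message
    }
}
```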