terminology – Hash, Object or Dictionary?

Question:

I've been studying python and ruby ​​and I've seen that several other languages ​​like these two make use of different terminology to determine a Json 'object'.

var objeto = { 'name' : 'Darth Vader' } .

The point is that I've seen ruby ​​called hash (Which reminds me of encryption) , objeto javascript and dicionário python.

  • Does the terminology change?
  • Why don't they cover a universal terminology since the structure is pretty much the same?

Note: I gave the example in json, but also understand how primitive type of each language in the context of the question.

Answer:

I can't answer why the terminology isn't universal (can anyone?). But I can try to explain where each term used comes from. I don't have a preferred term for this data structure, and it varies depending on what language I'm using, but I'll call it "dictionary" here.

In JavaScript (I'll use JavaScript in the answer because JSON stands for "JavaScript Object Notation"), it's called an object because the language designers wanted to implement some sort of object-oriented, and the simplest way to do that in a dynamic language like JavaScript is using a dictionary, which relates the name of the object's members (its functions and attributes) to its values. Objects in JavaScript are restricted to using strings or numbers as a key, but can have anything as a value. By the way, the terms "key" and "value" are universal, no matter the language.

Python, another dynamic language, followed a similar idea to JavaScript to implement object orientation: its objects use a dictionary to keep the relationship between the names and values ​​of their attributes, but it did it in an explicit way (which is more organized , in my opinion). Instead of every dictionary being an object in Python, there are dictionaries ( dict and notation { 'name' : 'Darth Vader' } ), and any dynamic class defined in Python will contain a special member called __dict__ , which is the dictionary that stores the dynamic members of objects of that class. So I can access a member of an object in two different ways:

class Character:
    def __init__(self):
        self.name = 'Darth Vader'

dv = Character()
print(dv.name) # Imprime "Darth Vader" na tela
print(dv.__dict__['name']) # Também imprime "Darth Vader" na tela

I've explained this about Python so that it's clear why it's called an "object" in JavaScript. But if JavaScript merges the object and dictionary concepts into one thing, in Python they're explicitly separate, and there are both non-dictionary-dependent objects (look at the __slots__ attribute if you're interested) and independent dictionaries of other objects, and you can be used explicitly:

>>> d = {'banana': 'amarela'}
>>> d['maçã'] = 'vermelha'
>>> d
{'maçã': 'vermelha', 'banana': 'amarela'}

In Python, the key is not restricted to strings and numbers, any hashable object can be a key. And then we get into the next term: hash . A hashable object is one that has a defined hash function: a function that takes the object, and returns an integer value corresponding to that object. Hashes are not necessarily tied to cryptography, as you said. MD5, SHA-256, etc. are cryptographic hash functions. They are very slow to use in dictionaries, and are designed to be difficult to reverse (given a hash value, it is difficult to figure out what value generated that hash). For dictionaries, much simpler hashes are often used, which are not necessarily difficult to reverse. It involves some bit-by-bit operations and some multiplication with some constant prime number and that's it. The important thing is to be fast, and the result is very random, with little chance that different values ​​will result in the same hashes.

Another very common name for dictionary is "hash table". Strictly speaking, hash tables are dictionaries, but not every dictionary is a hash table, although hash tables are the most common type of dictionary. Python, JavaScript, and (apparently from what you said) Ruby dictionaries are implemented as hash tables. This term makes explicit the role of the hash function in dictionary construction.

A dictionary needs to provide a quick mechanism for getting a value from a key, and in the hash table this is done as follows: everything is stored in a vector of size M > N , where N is the number of elements stored. To store a k key value, just do:

vetor[hash(k) % M] = valor

Remember that k can be a string, but hash(k) is an integer. The % operation (integer division remainder) guarantees that the value will be between 0 and M - 1 (you can add 1 if you prefer to think of vectors going from 1 to M ). The bigger M and the better (more "random") the hash function, the less chance that two of the same keys will end up in the same position in the vector. But that happens, it's called a "hash collision". When this happens, more than one value is stored in the same position (usually each element of the array is a linked list that contains all the elements with that hash). In these cases, a search for key element k requires that this list be traversed in search of the right element. Therefore, it is said that the average complexity of a hash table is O(1) (constant time, just a few arithmetic operations are enough to find an element, in the average case where there is no collision), and O(n) in the worst case of the improbable situation where all elements of the hash table fall into the same position in the array, situation where the hash table degenerates into a linked list.

Another common term for a dictionary is "map" because, like a dictionary, it conveys the idea of ​​allowing you to find something quickly. This term comes from C++, which has the structure "std::map" (not to be confused with map from Python and Haskell, which is something else), and unlike other dictionaries, it guarantees that the elements will be stored in an orderly way, and so it is generally implemented as some sort of binary tree (the most common implementation is with a red-black tree ), which in practice is not as fast a dictionary as a hash table (has O(log n) complexity), but it has the advantage that the elements are sorted by key. In 2011 the structure "std::unordered_map" was included in the C++ language, which does not guarantee the order of the elements, and therefore can be (and is) implemented as a hash table.

I hope I have clarified the origin of all these terms.

Scroll to Top