scala – How can you program without variables? (Functional programming question)

Question:

While I'm brushing up on Python and C, I still haven't gotten around to Haskell. This question has been stuck in my head for a long time) Maybe you could answer it briefly?

Answer:

Cycles

To implement looping algorithms, as you quite rightly noticed, recursion is used in functional programming. For instance:

map f [] = []                   -- an empty list maps to an empty list
map f (x:xs) = f x : map f xs   -- transform the head, recurse over the tail

The map function walks through the entire list and transforms each element using the f function. Pretty simple.

In practice, however, recursion is rarely used directly. Typically, functional programs express cyclic operations through pre-existing library primitives such as the map function above. Something like this:

numbers = [1,2,3,4,5]
strings = show <$> numbers

Here, the <$> operator is the standard Haskell alias for fmap (the generalization of map), so the expression show <$> numbers reads "apply the show function to each item in the numbers list".

For more complex cases, there are more general analogs such as fold, for, mapM, etc. The simplest ones are implemented using recursion, and the more complex ones are built on top of the simplest ones.
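As a hedged sketch, here is how one of these primitives – a right fold – can itself be defined by recursion and then reused so that concrete "loops" become one-liners (this mirrors, in essence, the standard library definition):

import Prelude hiding (foldr)   -- avoid clashing with the standard foldr

-- foldr collapses a list from the right with a combining function
foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

-- once the primitive exists, a concrete "loop" is a one-liner
total = foldr (+) 0 [1,2,3,4,5]   -- 15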

Mutation

A "mutation" in functional programming (and in programming in general) is a change in the contents of a memory cell. In the Haskell language, mutation is in principle possible (see below), but in practice they try to avoid it, resorting to it only in the most extreme cases. Instead of directly modifying data, it is common in functional programming to create new structures from old ones. For instance:

replaceFirst (x:xs) newX = newX : xs

The replaceFirst function "replaces" the first element of the list with a new one:

xs = [1,2,3,4]
ys = replaceFirst xs 42
-- Now ys = [42,2,3,4]

(the attentive reader will notice that replaceFirst does not handle an empty list, but this is not relevant to the current discussion)

Logically, the operation replaceFirst xs 42 can be understood as "replace the first element of the list xs with 42 ", but in practice this is not quite the case. In fact, the old xs list is alive and well, and the ys list is a brand new list. Now we have a choice: we can throw out the xs list as unnecessary so that the garbage collector can remove it later, or we can leave it "in reserve" for the future.

This approach opens up very wide opportunities. For example, it costs us nothing to keep the history of changes – whether just for bookkeeping, in order to "go back in time", or for something else. An even more interesting consequence is parallelization. If we are sure that the old data will never change, there is no need to set up semaphores and other synchronization machinery – you can simply run the computations in parallel, and that's it! – the compiler guarantees that they will never interfere with each other by modifying shared data. Data synchronization is a cornerstone of parallel programming and a major source of bugs. If in doubt, search for "defensive copy".
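For illustration (a tiny sketch reusing replaceFirst from above), keeping the full history of changes really is just keeping the old bindings around:

xs = [1,2,3,4]
ys = replaceFirst xs 42   -- [42,2,3,4]
zs = replaceFirst ys 7    -- [7,2,3,4]

history = [xs, ys, zs]    -- every version is still intact and can be inspected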

Q: Do you mean to say that with every change a whole new data structure has to be allocated?! No amount of memory would be enough!

Not at all. To prevent this, functional programming uses special data structures called "persistent data structures". These are data structures in which each new version builds on, and shares structure with, the previous one.

The simplest example is a singly linked list. Imagine a list in which each element contains a link (pointer) to the next one, and the last one contains a link to the "end". Something like this:

(1) --> (2) --> (3) --> (4) --> (end)

In the example with replaceFirst (see above), this list was called xs :

     (1) --> (2) --> (3) --> (4) --> (end)
xs _/

When I called replaceFirst, it took the tail of the list (starting at 2) and attached a new head, 42. I named this "new" list ys:

 ys _
      \
      (42)_
            \
    (1) --> (2) --> (3) --> (4) --> (end)
xs _/

This shows that in order to "replace" the first element of the list, it is not at all necessary to allocate a whole new block of memory. The only new cell is the new first element itself, and that's it. All the other elements stay in place. Moreover, if I now decide that I no longer need the list xs, the garbage collector will only have to remove the number 1, since all the other elements are still in use.

Also note that this approach is only possible when you know for sure that the list elements will never be changed. Only under that condition can I treat xs and ys as two different lists, even though they partly occupy the same memory.

Of course, a singly linked list is the simplest data structure. In more complex cases, more complex structures are used – mainly trees of all types and sizes. And of course, there are situations in which this approach is still less efficient than a contiguous in-memory array. However, in 99% of cases the difference in performance is negligible and is no match for the gains in reliability and ease of development.
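As a hedged sketch (a toy, unbalanced binary search tree, just to show the sharing), "updating" such a tree rebuilds only the nodes along the path to the change; every other subtree is shared with the old version:

data Tree a = Leaf | Node (Tree a) a (Tree a)

-- insert returns a *new* tree: new nodes appear only along the search path,
-- while the untouched subtrees are shared with the original tree
insert :: Ord a => a -> Tree a -> Tree a
insert x Leaf = Node Leaf x Leaf
insert x t@(Node l v r)
    | x < v     = Node (insert x l) v r   -- rebuild the left spine, share r
    | x > v     = Node l v (insert x r)   -- rebuild the right spine, share l
    | otherwise = t                       -- value already there, reuse the whole tree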

For the remaining 1% of cases, however, even Haskell lets you work with contiguous arrays – it just does so in a much safer way (see below).

Imperative programs

"Imperative" are programs expressed as a sequence of steps. This is in contrast to the "declarative", which is expressed in the form of relationships between parts – functions and data. All the examples I have given above are "declarative". They are all written in the style " xs is ys , where the first element is replaced by 42 " and so on. At first glance, it may seem that there is no difference in principle, but this is not so. The main difference is that an imperative program has an order of actions, while a declarative program does not. I have indicated how the value of xs is related to ys , but I did not indicate in what order I would like this relationship to be calculated. The compiler will figure it out for me.

But this is not always convenient. Although "in principle" any program can be expressed either way, sometimes, from a purely human point of view, it is more natural to think of a program as a functional relationship, and sometimes as a sequence of steps.

For example, take a game. Imagine that I have a certain data type Game and a set of operations on it:

type Game = ...
lookForMonsters :: Game -> (Game, Monster)
shoot :: Monster -> Game -> (Game, Damage)

Note that each function, in addition to its "main" result (Monster or Damage), also returns a "new", changed state of the game. This "new" state must be passed to the next function, which imposes a certain order of execution on these functions, something like this:

game0 = createGame
(game1, monster) = lookForMonsters game0
(game2, damage) = shoot monster game1
message = "Inflicted " ++ show damage ++ " points of damage

Since the value game1 is needed to call shoot, but is itself the result of lookForMonsters, it follows that lookForMonsters has to be called first and shoot after it. This is what defines the order of the calls.

This style, however, is quite tedious and invites typos. Also, notice the pattern: all the lines have a very similar shape. The functional programmer lives by patterns: they find them and mercilessly generalize them to make the code simpler and easier to understand. In this case, let's wrap the pattern in a pair of functions, which I'll call bind and return:

-- bind chains two game operations, threading the game state
-- from the result of the first into the argument of the second
bind x f = \game ->
    let (game1, y) = x game
    in f y game1

-- return wraps a plain value into an operation that leaves the game untouched
return x = \game -> (game, x)

Don't go into too much detail if you don't understand it. It is enough to know that the bind function "binds together" two operations on the game, passing the game state from the result of the first operation into the argument of the second; and the return function creates an operation that leaves the game state unchanged and simply returns a specific value. With these functions, I can write my program like this:

program = 
  bind lookForMonsters 
       (\monster ->
            bind (shoot monster) 
                 (\damage ->
                     return ("Inflicted " ++ show damage ++ " points of damage)
                 )
       )

(finalGameState, message) = program game0

You can make it a little prettier by playing with newlines and indents:

program = 
  bind lookForMonsters (\monster ->
  bind (shoot monster) (\damage ->
  return ("Inflicted " ++ show damage ++ " points of damage)))

Or even a little more beautiful if you add the >>= operator, which will be a synonym for bind :

x >>= f = bind x f

program = 
  lookForMonsters >>= \monster ->
  shoot monster >>= \damage ->
  return ("Inflicted " ++ show damage ++ " points of damage)

This execution-ordering scheme is called a "monad", and it is so widely used in functional languages that most compilers provide special syntactic sugar for it. In Haskell, this program can be written like this:

program = do
    monster <- lookForMonsters
    damage <- shoot monster
    return ("Inflicted " ++ show damage ++ " points of damage)

The word do and the left arrows <- are converted by the compiler into sequential calls to bind.

After all these transformations, look at what we've got: the program looks like a "normal" imperative program in some language like C or Python, and it looks as if we "change" the state of the game at each step. But in reality the program consists entirely of "pure" functions, with no mutation and no side effects. Our program takes the initial state of the game as input, produces the final state as output, and depends on nothing else. Thus, we were able to write the program as a sequence of steps while retaining all the advantages of immutability.

Generalization

Take another look at the program I wrote above in a "tricky" form:

program = do
    monster <- lookForMonsters
    damage <- shoot monster
    return ("Inflicted " ++ show damage ++ " points of damage)

Please note that the Game type itself is not mentioned anywhere in this program – that is, the program itself is not even aware that it works with a game. All knowledge about the game is "hidden" in the bind function and in the primitive operations lookForMonsters and shoot. By the way, these operations need not be primitive either – they can themselves be written in the same style as my program, on top of even more primitive operations. But that's just an aside.

And the next thought is this: if the program itself does not depend on how the idea of "order of operations" is implemented (that idea is exactly what the bind function implements), then we can change this implementation without touching the program itself. The implementation can be simple and "toy", as above, or, if we need additional features such as aborting on errors or logging, we can add them to the bind function, and the program will not even notice.

This thought leads to the idea that "monads" can be different. Moreover, monads can be assembled out of blocks with the desired functionality, like building blocks, while the programs written on top of them let each operation use only the blocks that matter to it and stay unaware of the rest.
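As a hedged sketch (the names bindE and returnE are made up for illustration), here is a variant of bind that additionally aborts the chain on errors; a program written on top of it keeps exactly the same shape:

-- each operation now returns either an error or a result; the chain stops
-- at the first error, without the program having to check for it explicitly
bindE x f = \game ->
    case x game of
        (game1, Left err) -> (game1, Left err)   -- abort: propagate the error
        (game1, Right y)  -> f y game1           -- otherwise continue as before

returnE x = \game -> (game, Right x)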

Input/Output

And now that we have the concept of a "monad" – an abstract expression of the order of operations – the following thought comes: what if we invent a monad that is not written in Haskell itself, but is implemented inside the compiler? And what if we organize this monad so that its bind function can do all sorts of dirty work, break the rules, and in general be written in C?

Programs written in such a monad will look completely innocent – just a sequence of functions chained through bind, like the examples above. But the primitive operations will now perform real I/O.

For example, the getLine operation:

getLine :: IO String

It has a "side effect" in the real world – it reads a text string from the console. But this string is wrapped in an IO type. Logically, this can be considered as a cell in which a row is located, but you cannot "get" this row from there. The only thing you can do with it is to pipe it as a parameter to another function by calling bind . But here's the bad luck: as a result of this call, the IO type will again necessarily be obtained (perhaps not with a string inside, but with something else), and again it is impossible to get the value from there. You can only pipe it to the third function again with bind – and so on.

As a result, as soon as we have performed I/O (i.e., called one of the IO functions), the only thing that can be done with the result is to perform more I/O, and more, and more. And that is just fine, because the entry point of a Haskell program is exactly such a chain of I/O operations linked through bind. For instance:

main = 
    bind getLine 
         (\name ->
             bind (putStrLn ("Hello, " ++ name ++ "!"))
                  (\_ -> 
                      return 0
                  )
         )

Or the same using the do syntax:

main = do
    name <- getLine
    putStrLn ("Hello, " ++ name ++ "!")
    return 0

This is how I/O works in Haskell. The most primitive operations, such as getLine and putStrLn, are not written in Haskell itself but in C. This is much like how CreateProcess on Windows or fork on UNIX is implemented not by the program itself but by the operating system. The most primitive primitives always live one level lower; there is no way around that.

An alternative view of this structure is as follows: you can think of a Haskell program as a program that does not do I/O itself, but instead builds a program in another language that does. I am not a big fan of this view, but it is often used in explanations.

Note that, in principle, you could write an entire Haskell program directly in IO, and thereby enjoy all the same "benefits" as a C or Python program. However, this would completely defeat the purpose. The point of Haskell is that programs can be written much more safely, expressively, and quickly if we give up all kinds of ugliness like IO and give the compiler the opportunity to optimize the program. Believe me, the compiler (especially the Haskell compiler) can do this much better than you and me.

Therefore, real programs are usually written in two layers: the core of the program, where all the logic and all the calculations are written as "pure" functions, and only the outermost, very thin shell deals with IO. This principle, known as "functional core, imperative shell", has by now become widely known outside of functional programming – as, indeed, have many other functional ideas (for example, LINQ, const, then, async, auto, etc.).
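A minimal sketch of this layering (the greeting function is made up for illustration): all the logic lives in a pure function, and only main – the thin shell – touches IO:

-- pure core: no IO anywhere, trivially testable
greeting :: String -> String
greeting name = "Hello, " ++ name ++ "!"

-- imperative shell: the only place that talks to the outside world
main :: IO ()
main = do
    name <- getLine
    putStrLn (greeting name)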

"Real" mutation

Now that we have become acquainted with the IO monad and learned that its primitives can do all sorts of dirty work, the next thing that comes to mind is that this loophole can be used in cases where Haskell is not fast enough, requires too much memory, or does not fit for some other reason. In these cases, you can write a library in C and import it into Haskell via IO.
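As a hedged sketch of that mechanism, Haskell's foreign function interface lets you bind a C function as an IO action (here the standard C rand, purely as an example):

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CInt)

-- the C rand function reads and updates hidden global state,
-- so it is imported as an IO action rather than as a pure function
foreign import ccall "stdlib.h rand" c_rand :: IO CInt

main :: IO ()
main = do
    x <- c_rand
    print x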

One example of this approach is arrays with support for "real" mutation. This library provides IO primitives for creating, modifying, and otherwise operating on arrays:

import Data.Array.IO

main = do
    -- a mutable array of Ints indexed 1..10, all cells initialized to 42
    arr <- newArray (1,10) 42 :: IO (IOArray Int Int)
    writeArray arr 8 5        -- in-place mutation of cell 8
    a <- readArray arr 1
    print a                   -- prints 42

As I noted above, such features are used only in the most extreme cases – when Haskell itself falls short. I would compare it to writing a C program with inline assembly interspersed in the most performance-sensitive places. The program becomes larger, harder to understand, and more fragile, and it limits the possibilities for optimization, since the compiler does not know exactly what is happening inside these primitives and is therefore afraid to touch them.
