Why does PHP allow you to create identifier names with special characters?

Question:

In programming languages and databases, identifier names (variables, functions, classes, methods, tables, fields, etc.) usually must start with a letter or an underscore, optionally followed by digits, and accented and special characters are best avoided to prevent problems.

In PHP, variables must start with a dollar sign ($), but it is still possible to create identifiers with strange names like the ones below:

<?php
header('Content-Type: text/html; charset=utf-8');

function executarAção($ação){
    echo $ação .' <br>';
}

function 웃(){
    echo 'boneco de palito da uml é você mesmo? <br>';
}

function variavelEstranha(){
    ${0} = 'Olá mundo estranho :D';
    echo ${0};  
}

executarAção('kboom');
웃();
variavelEstranha();

Question:

Why does PHP allow you to create variables and functions with special characters?

Note: in my test, I saved the file as utf-8.

Answer:

Because programmers elsewhere in the world, in Saudi Arabia for example, would probably like to be able to write a function like this:

نفعلشيئا()

There is no reason for the compiler to prohibit the use of "special characters" in source code.
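For reference, the PHP manual describes a valid label as a letter, an underscore, or any byte from 0x80 to 0xff, followed by any number of letters, digits, underscores, or such bytes. A minimal sketch of that rule follows; `isValidLabel` and `LABEL_PATTERN` are hypothetical names used only for illustration, not PHP built-ins.

```php
<?php
// Label rule as given in the PHP manual, applied byte by byte (no /u flag).
const LABEL_PATTERN = '/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/';

// Hypothetical helper: true when $name satisfies the label rule above.
function isValidLabel(string $name): bool
{
    return preg_match(LABEL_PATTERN, $name) === 1;
}

var_dump(isValidLabel('executarAção')); // bool(true)
var_dump(isValidLabel('웃'));            // bool(true)
var_dump(isValidLabel('0'));            // bool(false)
```

Every byte of a multibyte character such as ç or 웃 falls in the 0x80 to 0xff range, so the names from the question pass. The name 0 does not pass, which would explain why the question has to write it with braces, as ${0}.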

Your team can agree not to use such characters in your project, for whatever reason, and you can even configure a source code checker to break the build if it finds characters prohibited by your convention. But the compiler neither knows nor cares about your team's conventions.
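A convention check of that kind could be sketched with PHP's tokenizer. This is only an illustration under assumptions: the function name `findNonAsciiIdentifiers` and the sample source string are hypothetical, and a real project would more likely rely on an existing linter.

```php
<?php
// Hypothetical convention check: list identifiers that contain non-ASCII bytes.
function findNonAsciiIdentifiers(string $source): array
{
    $found = [];
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as "(" come back as plain strings; skip them.
        if (!is_array($token)) {
            continue;
        }
        [$id, $text, $line] = $token;
        // T_STRING covers function and class names, T_VARIABLE covers $variables.
        $isIdentifier = ($id === T_STRING || $id === T_VARIABLE);
        if ($isIdentifier && preg_match('/[^\x00-\x7F]/', $text) === 1) {
            $found[] = "line $line: $text";
        }
    }
    return $found;
}

// Sample input, made up for this sketch.
$source = "<?php\nfunction executarAção(\$ação) {}\nfunction run() {}\n";
$violations = findNonAsciiIdentifiers($source);
print_r($violations);
```

A build script could then exit with a non-zero status whenever `$violations` is not empty, which is what "breaking the build" would amount to here.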

Update: sorry for using the word "compiler" 🙂 Some languages are interpreted only, but the answer is the same.

By the way, the character in your example, 웃, seems to be a valid Korean character ("smiling" according to Google Translate).

Update 2 – a brief reflection: Why don't we use special characters on our systems?

What makes a character special?

In this documentation for a Windows feature, I found an interesting definition:

Special characters are characters that are not found on the keyboard.

Now, on my keyboard I see everything I need to type "executarAção". So one of the definitions of special character must be wrong (ours here in this question, or the one from Microsoft's documentation writer).

I liked this other definition better:

These are characters such as periods, symbols (@ * ! % ; : . ) or blank spaces that are not accepted by the registration system for filling in the username and password fields.

Now that is a good definition. It establishes the domain in which its definition of special characters applies: certain fields of the registration system.

So I conclude that the definition of "special characters" varies by context. What counts as a special character here may not count as one elsewhere.

Our definition of special character:

We Portuguese-speaking programmers consider special even the characters that are part of our daily life: the cedilla and accents. We don't like them in our systems because… because… why, exactly? Of course, we don't even need a formal definition; our experience shows that they only cause problems:

  • Each programmer saves the file with a different encoding, so the cedilla you saved on your machine appears as a weird symbol on mine.

  • Every application that uses our database has been compiled or will be interpreted using a different encoding, so the SELECT I wrote in one application using cedilla won't find the table created with cedilla from another application.

  • Every programmer has a different "notion" of which words are accented, so the function I name with an accent will not be easily found by another programmer who believes the word has no accent.

  • And so on…
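The first two problems in the list come down to the same letter being stored as different bytes. A small sketch, assuming UTF-8 and ISO-8859-1 as the two encodings involved (the list above does not name any) and using made-up variable names:

```php
<?php
// The letter "ç" written as raw bytes in two encodings.
$utf8   = "\xC3\xA7"; // UTF-8: two bytes
$latin1 = "\xE7";     // ISO-8859-1: one byte

var_dump(bin2hex($utf8));    // string(4) "c3a7"
var_dump(bin2hex($latin1));  // string(2) "e7"
var_dump($utf8 === $latin1); // bool(false)

// The name "ação" as two hypothetical applications might send it.
$nameFromAppA = "a\xC3\xA7\xC3\xA3o"; // saved as UTF-8
$nameFromAppB = "a\xE7\xE3o";         // saved as ISO-8859-1

// The strings are compared byte by byte, so the two names do not match.
var_dump($nameFromAppA === $nameFromAppB); // bool(false)
var_dump(strlen($nameFromAppA));           // int(6)
var_dump(strlen($nameFromAppB));           // int(4)
```

Both strings look like the same word on screen when each is read with its own encoding, yet no byte-wise comparison will treat them as the same name.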

Conclusion: a well-intentioned PHP programmer should find it cool that Portuguese speakers can use their accents, which are so common, in source code. If the interpreter has no problem handling these characters, why limit their use? And, of course, as I mentioned, there are many languages out there that use all sorts of "special characters".

This PHP programmer can ignore the problems I mentioned because it is "easy" to eliminate them: it is enough that everyone uses UTF-8 in their editors and other tools, and that everyone knows their language well. We don't bet on that (at least I don't), so we stick to our tradition of not using special characters.

Clearly, before the advent and widespread adoption of more modern encoding standards (UTF-8), the problems with using "special characters" were serious, because the encoding standards of the time were limited. Even back then, compilers and interpreters couldn't do much to restrict the use of characters because, as noted, the definition of which characters are special varies.
