PHP security issues. Filtering incoming data


Recently I became interested in the security of the applications I build, in particular several points related to data filtering in PHP and secure authentication / authorization. Google turns up a lot of different information, but not enough to draw conclusions. Perhaps someone can give a hint or point me in the right direction.

Main questions:

  • Do I need to filter absolutely all input data, including the global arrays $_SERVER, $_REQUEST, $_GET, $_POST, $_COOKIE, even if it is not written to the database? What general points should I consider?
  • Is it better to use filter_var(), filter_input(), etc., or regular expressions? Or when is one preferable to the other?
  • What method of authorization on the site can be considered safe?
  • Using PDO, can I bind variables right away without fear (I've never done that, just wondering how safe it is): bindValue(':param', $_POST['value']);
  • If I have an HTML (WYSIWYG) editor, do I need to call htmlspecialchars($var, ENT_QUOTES, 'UTF-8'); before saving to the database and htmlspecialchars_decode($var, ENT_QUOTES); when reading it back?

What I have now:

Authorization on the site works as follows. The user enters a username and password. A request goes to the server, which tries to fetch the data (id, password, unique user hash) for the given login. If such a user exists, the password is checked using the password_hash function password_hash($password, PASSWORD_DEFAULT); and, if the check succeeds, a cookie is created along with a new hash for the user:

$user_hash = md5( md5( time() + time() * rand(2, 10) ));
SessionModel::setCookie('_auth', md5($user_id), AUTH_TIMEOUT);
SessionModel::setCookie('_token', $user_hash, AUTH_TIMEOUT);

So far I have not figured out where and how to use this hash sensibly to verify the user's identity. Most likely there is no trace of security here, which is why I am asking for advice.

Filtering data:

About two weeks ago I switched completely to OOP and started using PDO. Before that I used mysqli for the connection, and to clean up incoming data I wrote my own functions, like:

function clear($var) {

    $link = mysqli_connect(HOST, USER, PASSWORD, DB) or die( mysqli_error($link));
    $var = strip_tags($var);
    $var = htmlspecialchars($var);
    $var = mysqli_real_escape_string($link, strip_tags($var));

    return $var;
}

Now I do not filter incoming data at all, except for HTML code, for which I use:

$encoded = htmlspecialchars($var, ENT_QUOTES, 'UTF-8');

$decoded = htmlspecialchars_decode(htmlspecialchars_decode($var, ENT_QUOTES), ENT_QUOTES);

The decode is applied twice because after a single pass the entities were, for some reason, not displayed properly decoded. I do not know why; it started working this way purely by chance. I accept the rest of the data roughly like this:

$title = $_POST['title']; – sometimes I use trim() to remove spaces :))

In general, I understand that I am unlikely to get a detailed answer to each of these questions here, but I would be very grateful even for a pointer to an up-to-date article that answers them. I've been studying PHP for about 1.5-2 years, but I don't know the answers to the simplest questions (or perhaps not so simple ones). As practice has shown, such things are hard to find in Google.

I would also welcome general recommendations 🙂 Thank you.


Do I need to filter all input data completely

It is your responsibility not to trust any data received from outside. That means $_GET and $_POST (by default these two make up $_REQUEST; depending on the request_order and variables_order settings in php.ini it can also include cookies), $_COOKIE, $_FILES, $_SERVER (many of its entries, such as the request headers, come straight from the client), and data loaded from third-party systems (for example, through an API). The general point is that you should not look for an abstract filter against "dangerous data". Instead, understand what data you expect to find in this place and what will happen to it next. Output to CSV, output to HTML, and writing to a DBMS each require their own special handling.

Which is better to use: filter_var(), filter_input(), etc., or regular expressions?

Anything that lets you make sure the data is correct. Start with a whitelist. Often you know in advance that, for example, $_GET['index'] can only be foo or bar. Check for exactly these two permissible values.
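A minimal sketch of such a whitelist check. The helper name pickAllowed and the fallback value are hypothetical, chosen only for illustration; in a real handler the source array would be $_GET.

```php
<?php
// Hypothetical helper: returns the submitted value only if it is on the whitelist,
// otherwise falls back to a default chosen by the application.
function pickAllowed(array $source, string $key, array $allowed, string $default): string
{
    $value = $source[$key] ?? null;

    // Strict comparison (third argument true) so that type juggling
    // cannot make an unexpected value match a whitelisted string.
    return is_string($value) && in_array($value, $allowed, true) ? $value : $default;
}

// In a real request handler the first argument would be $_GET.
$index = pickAllowed(['index' => 'bar'], 'index', ['foo', 'bar'], 'foo');
```

Whether to fall back to a default or to reject the request outright depends on the application; the point is only that nothing outside the list gets through.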

For example, for a user's email there is a regular expression hidden inside filter_var (the FILTER_VALIDATE_EMAIL filter). This is a good starting point and will usually work well. "Usually", because email is a really funny thing. If you read the corresponding RFCs, it turns out that it is easier to check that the string contains the @ symbol and send a letter than to understand all the variety of valid options. Almost everything is acceptable there.

For a login, for example, you may want to restrict input to Latin letters and a few special characters. This is most easily done with a regular expression.

The broadest interpretation is usually reserved for free text input, such as this very message. Generally, any UTF-8 characters are allowed.
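Both checks might look roughly like this. The function names and the login policy (3 to 32 characters from a small set) are assumptions made for the example, not a recommendation for any particular site.

```php
<?php
// Email: lean on the built-in filter as a first pass.
// A confirmation letter remains the only real proof that the address works.
function isPlausibleEmail(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

// Login: assumed policy of 3 to 32 characters: Latin letters, digits, underscore, dot, hyphen.
// \A and \z anchor the whole string, so a trailing newline is not accepted.
function isValidLogin(string $login): bool
{
    return preg_match('/\A[A-Za-z0-9_.-]{3,32}\z/', $login) === 1;
}
```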

By the way, since I have brought this up: please do not validate the password in any way, except perhaps for a minimum length, and for minimum complexity only if the subject area unambiguously requires it. In any case, do not limit the maximum. You are going to hash it, not store it, so let users enter whatever they like, at whatever length they like.

What method of authorization on the site can be considered safe?

That depends on the security requirements. A digital signature (EDS) is quite difficult to bypass (figuratively speaking, the banking sector). It is also hard to get around authorization that is allowed only from one specific IP of one specific VPN (corporate data). For a site that is less sensitive to security, HTTPS will adequately protect against MitM and encrypt the data, provided the server side is configured correctly (over the past years it has become quite easy to configure HTTPS incorrectly).

You can hash the original password on the client and send the hash to the server, so that the original password is not transmitted over the network at all.

Without HTTPS? Do HTTPS, the days of expensive certificates are over.

Using PDO, can I bind variables right away without fear

There is no SQL injection in this case. And immediately an important caveat: only if you have correctly configured the connection encoding or disabled the emulation of prepared statements.

But you still have to check for logical errors. For example, do you think it is safe to use (int) $_POST['amount'] as :amount?
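A small sketch of a bound value being treated purely as data. It assumes the pdo_sqlite extension and uses an in-memory SQLite database only so that it runs without a server; the table name posts and the MySQL DSN in the comment are hypothetical.

```php
<?php
// In-memory SQLite is used only so the sketch is runnable (assumption).
// For MySQL, the caveat above would translate to something like:
//   new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', $user, $pass,
//           [PDO::ATTR_EMULATE_PREPARES => false]);
$pdo = new PDO('sqlite::memory:', null, null, [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)');

// Stand-in for $_POST['value'], deliberately hostile.
$input = "x'); DROP TABLE posts; --";

$stmt = $pdo->prepare('INSERT INTO posts (title) VALUES (:param)');
$stmt->bindValue(':param', $input, PDO::PARAM_STR);
$stmt->execute();

// The hostile string comes back exactly as submitted; it was never parsed as SQL.
$stored = $pdo->query('SELECT title FROM posts')->fetchColumn();
```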

UPDATE users SET balance = balance - :amount WHERE id=:user

(This is only an illustration. In reality such a place would hold a double-entry accounting record, additionally validated by a CHECK constraint at the database level, if only MySQL could actually enforce CHECK. But as one DBA says, people understand examples about money faster.)

And what if -100 is passed? Do we get money credited instead of debited?
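One way to close this particular hole is to validate the range instead of merely casting. A sketch, assuming amounts are whole positive numbers; parseAmount is a hypothetical helper name.

```php
<?php
// Returns a positive integer amount, or null when the input is not acceptable.
// Unlike a bare (int) cast, this rejects '-100', '0', 'abc' and '12abc'.
function parseAmount($raw): ?int
{
    $amount = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);

    return $amount === false ? null : $amount;
}
```

Whether the user actually has that much on the balance is a separate business rule that still has to be checked, ideally inside the same transaction as the UPDATE.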

If I have an HTML (wysiwyg) editor, then I need to use the functions before saving to the database

A very interesting question, and the right behavior depends on the degree of trust. Do you trust whoever uses this editor? That is, does the result have to be real HTML that is output as HTML? This is common for the admin panel of any CMS. Then you don't have to validate this field at all. However, htmlspecialchars($var, ENT_QUOTES, 'UTF-8') must be called on this text when it is substituted into a textarea, otherwise a stray </textarea> in the text will break everything.

If you do not trust the author but HTML must still be allowed, you have to parse the submitted HTML thoroughly into tokens and check all of it against a whitelist. I won't name specific tools; I only know that they exist. The problem is that you may, for example, want to allow inserting <img src>, and you will be handed something like <img src='...' onload="alert(document.cookie)">, and that's it. Instead of a harmless alert there may be something more interesting. And htmlspecialchars cannot be used here, otherwise there will be no picture either.

If there should be no HTML at all, then use htmlspecialchars. It can be applied before writing to the database, but logically it is more appropriate to apply it directly when outputting to HTML. But not strip_tags. Why delete what the user has entered? You must store it correctly and display it correctly, not delete it.
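For the trusted case, putting the stored HTML back into the edit form might look like this. The sample markup and the field name content are made up for the example.

```php
<?php
// Stand-in for trusted HTML loaded from the database (assumption).
// It contains a closing textarea tag, which would end the form field early if left raw.
$html = '<p>Hello</p></textarea><script>alert(1)</script>';

// Escaped only for placement inside the form control.
// The browser decodes the entities again, so the editor still sees the original HTML,
// and the value stored in the database stays untouched.
$field = '<textarea name="content">'
    . htmlspecialchars($html, ENT_QUOTES, 'UTF-8')
    . '</textarea>';
```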

If there is one, then the password is checked using the password_hash($password, PASSWORD_DEFAULT); function

Is this a mistake in the question? password_hash doesn't check anything. Verification is done by password_verify.

Why you are writing something that is, oh, so far from a CSPRNG into cookies, and how you plan to use it later, I have no idea either.
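The division of labor between the two functions, as a short sketch. The password string is an arbitrary example; in practice the hash is computed once at registration and stored in the database.

```php
<?php
// At registration (or password change): compute the hash and store it.
$hash = password_hash('correct horse battery staple', PASSWORD_DEFAULT);

// At login: compare the submitted password against the stored hash.
// Hashing the input again and comparing strings would not work,
// because password_hash uses a fresh random salt every time.
$ok  = password_verify('correct horse battery staple', $hash);
$bad = password_verify('wrong guess', $hash);
```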

CSPRNG is a cryptographically strong pseudo-random number generator.
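For comparison with the md5(time() ...) construction from the question, which is built from a guessable timestamp and a non-cryptographic rand(), a token drawn from PHP's CSPRNG could be produced like this. The helper name newToken and the 32-byte length are assumptions for the sketch.

```php
<?php
// random_bytes() draws from the operating system's CSPRNG (available since PHP 7.0).
// bin2hex() only makes the result safe to put in a cookie or a URL.
function newToken(int $bytes = 32): string
{
    return bin2hex(random_bytes($bytes));
}

$token = newToken(); // 64 hexadecimal characters for 32 random bytes
```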

For session-based authorization, simply use sessions. Let me remind you of just one obvious pitfall that does not always get attention: a session has no lifetime. None at all. There is only the amount of time since the last access to the session, after which the session may be removed by the garbage collector. And when the garbage collector will run, nobody knows. All that time the session remains valid. Therefore, if your task requires invalidating authorization an hour after login or an hour after the user's last request, you must implement this logic yourself.

As for long-term authorization: in my opinion, this answer is already huge. Better to ask that as a separate question.
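A sketch of doing that logic yourself for the "hour after the last request" case. The array passed in stands in for $_SESSION; the keys user_id and last_seen, the function name, and the one-hour limit are assumptions for the example.

```php
<?php
const IDLE_LIMIT = 3600; // assumed policy: one hour without requests

// Call on every request. Returns true while the login is still valid
// and refreshes the timestamp; returns false and clears the data once it has expired.
function touchOrExpire(array &$session, int $now): bool
{
    if (!isset($session['user_id'], $session['last_seen'])) {
        return false; // not logged in at all
    }

    if ($now - $session['last_seen'] > IDLE_LIMIT) {
        $session = []; // in a real application, also destroy the session itself
        return false;
    }

    $session['last_seen'] = $now;
    return true;
}
```

For an absolute limit counted from the moment of login, the same idea applies with a timestamp that is written once at login and never refreshed.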

Filtering data:

See the beginning of the answer. You need to know what you expect to find in this data and where it will go next. Everything else has nothing to do with security; it is only crutches and an illusion of security. There is no magic "make everything right and safe" function.

And of course, you cannot be sure that the data arrived at all. Check with isset first, or with empty if that suits the valid values. Or use filter_input, which also handles missing keys correctly.
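A presence check along these lines, written as a plain function over an array so that it can be run outside a web request. The helper name readString is hypothetical; in a handler the array would be $_POST.

```php
<?php
// Returns the trimmed string under $key, or null when the key is missing
// or the client sent something that is not a string (e.g. title[]=x arrives as an array).
function readString(array $source, string $key): ?string
{
    if (!isset($source[$key]) || !is_string($source[$key])) {
        return null;
    }

    return trim($source[$key]);
}

// In a real handler: $title = readString($_POST, 'title');
$title = readString(['title' => '  Hello  '], 'title');
```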

And a word about CSRF while we are at it: remember that everything that changes the state of the system must be done via POST, PUT, PATCH or DELETE requests (outside of APIs, usually only POST is used) and must be protected with a unique token. Whether it should be unique globally, per user, or per session is a debatable issue. GET requests should only read information. Two identical GET requests must return the same result. Sometimes you have to deviate from this rule, for example for the "unsubscribe from the mailing list" link in emails (it changes subscription data), but that is precisely the exception. Nothing should ever be deleted via a GET request.
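A per-session variant of such a token, as a sketch. The array stands in for $_SESSION, and the function names and the csrf_token key are assumptions; per-request or per-user tokens would follow the same pattern.

```php
<?php
// Returns the token for this session, creating it on first use.
// The value goes into a hidden field of every state-changing form.
function csrfToken(array &$session): string
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }

    return $session['csrf_token'];
}

// Checks the value that came back with the POST request.
// hash_equals() compares in constant time, so timing does not leak the token.
function csrfCheck(array $session, $submitted): bool
{
    return is_string($submitted)
        && isset($session['csrf_token'])
        && hash_equals($session['csrf_token'], $submitted);
}
```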
