node.js – Competent caching in a NodeJS project

Question:

A high-load project is planned using NodeJS, MySQL, Redis, and NGINX. The Node.js framework Express is taken as the basis, and socket.io is used for working with sockets (maybe that is not a good choice). The project is a single-page application, and it is assumed that the socket connection will be open everywhere. There will be a large number of mostly static, informational pages that nevertheless do not spare the database, hitting it with heavy queries.

There are several questions about how to structure data caching: where to use Redis, and where to use other methods.

  1. For example, there is a query to the database; we can save its result in Redis as JSON, read it from Redis on subsequent calls, and invalidate this cache when necessary. – How correct is this? And is it worth storing the data as JSON?

  2. Sometimes we need to cache not only JSON but also a fragment of a page, or the whole HTML. – Is it correct to store these chunks in Redis?

  3. There is an idea to implement session storage in Redis; how do we do it right? For example, the key would be the session ID and the value an array of client data. – Is this correct or not? What would be better?

  4. When is it worth writing the cache to files on disk?

  5. Is it worth shoving everything described above into Redis?

I have no experience with high load, so I am interested in anything you can tell me; I really need competent advice on building a project with the listed components. Perhaps there are some recommendations for working with sockets under high load, or something else regarding the fundamentals of such a project.

I appreciate your helpful advice, thank you for your attention!

Answer:

For example, there is a query to the database; we can save its result in Redis as JSON, read it from Redis on subsequent calls, and invalidate this cache when necessary. – How correct is this?

One hundred percent, but timely cache invalidation is a pretty big challenge.

And is it worth storing data in JSON?

Personally, storing data as a serialized string bothers me a little, but there is no real alternative here, and in practice it should not affect anything.
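As a small illustration of this first point, here is a cache-aside sketch assuming the ioredis client; getArticleFromDb() and updateArticleInDb() are hypothetical stand-ins for real MySQL queries:

    // Cache-aside: read from Redis first, fall back to MySQL, invalidate on write.
    const Redis = require('ioredis');
    const redis = new Redis(); // connects to localhost:6379 by default

    async function getArticle(id) {
      const key = `article:${id}`;
      const cached = await redis.get(key);
      if (cached !== null) return JSON.parse(cached);            // cache hit

      const article = await getArticleFromDb(id);                 // cache miss: query MySQL
      await redis.set(key, JSON.stringify(article), 'EX', 300);   // keep for 5 minutes
      return article;
    }

    async function updateArticle(id, fields) {
      await updateArticleInDb(id, fields);
      await redis.del(`article:${id}`);                           // invalidate so readers see fresh data
    }

The TTL here is just a safety net for the invalidation problem mentioned above: even if a del() is missed somewhere, stale data lives for at most five minutes.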

Sometimes we need to cache not only JSON but also a fragment of a page, or the whole HTML. – Is it correct to store these chunks in Redis?

No. You have an SPA; all rendering should happen on the client.

There is an idea to implement session storage in Redis; how do we do it right? For example, the key would be the session ID and the value an array of client data. – Is this correct or not? What would be better?

If your session is treated as permanent storage, then this is wrong, because Redis is not designed for permanent data storage (although it can flush data to disk from time to time). Instead, it is better to organize two-tier (or even three-tier) storage: Redis – DB (or In-Memory – Redis – DB in the three-tier case).
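A rough sketch of that In-Memory – Redis – DB chain, again assuming ioredis; loadSessionFromDb() is a hypothetical placeholder, and in a real system the in-process Map would need its own eviction and invalidation:

    // Three-tier lookup: per-process memory, then Redis, then the database.
    const Redis = require('ioredis');
    const redis = new Redis();

    const localCache = new Map(); // tier 1: fastest, but private to this process

    async function getSession(sessionId) {
      if (localCache.has(sessionId)) return localCache.get(sessionId);   // tier 1

      const key = `session:${sessionId}`;
      const cached = await redis.get(key);
      if (cached !== null) {                                             // tier 2
        const session = JSON.parse(cached);
        localCache.set(sessionId, session);
        return session;
      }

      const session = await loadSessionFromDb(sessionId);                // tier 3: source of truth
      await redis.set(key, JSON.stringify(session), 'EX', 3600);
      localCache.set(sessionId, session);
      return session;
    }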

When is it worth writing the cache to files on disk?

As a rule, never, as long as you have a database. You can put the results of heavy queries there so that all nodes can see them, but even in this case I would add an intermediate Redis layer.

Is it worth shoving everything described above into Redis?

The more that sits in the cache, the better, but keep an eye from time to time on how much RAM is being eaten.
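If most of the cache does end up in Redis, it also helps to cap its memory and choose an eviction policy rather than relying on manual watching alone. A sketch of the relevant redis.conf directives (the 2gb limit and the allkeys-lru policy are purely illustrative choices); current consumption can be checked with the INFO memory command:

    # Cap the memory Redis may use for data; past this limit the eviction policy kicks in.
    maxmemory 2gb
    # Evict the least recently used keys first, which suits a pure cache.
    maxmemory-policy allkeys-lru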

Perhaps there are some recommendations for working with sockets under high load, or something else regarding the fundamentals of such a project.

Remember two things:

  • The project must scale horizontally, i.e. simply by adding new servers. It follows that changes made on one server must be visible to all the others (this is achieved by sharing a single Redis instance and a single database; see the socket.io sketch after this list).
  • There is such a thing as the dogpile effect, where a crowd of users requests the same resource simultaneously. In the case of the three-tier storage described above, one hundred users arriving at the same moment will not find the entry in the cache and will all rush (all one hundred) to the database. With 20 concurrent database connections and 30 ms per query, the last user will receive data no earlier than 150 ms later (and that is the ideal case; in reality the number will be much higher), which is very bad. One way to soften this is to coalesce identical requests, as in the second sketch below.
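On the first point: for socket.io to survive horizontal scaling, events emitted on one Node process have to reach clients connected to the others. A minimal sketch, assuming the ioredis client and the @socket.io/redis-adapter package (the question does not name either, so treat the concrete packages as a suggestion):

    // Fan socket.io events out across all Node processes through Redis pub/sub.
    // The local Redis URL and port are assumptions; adjust them to your setup.
    const http = require('http');
    const { Server } = require('socket.io');
    const Redis = require('ioredis');
    const { createAdapter } = require('@socket.io/redis-adapter');

    const httpServer = http.createServer();
    const io = new Server(httpServer);

    const pubClient = new Redis('redis://127.0.0.1:6379');
    const subClient = pubClient.duplicate();

    // With the adapter attached, io.emit() on any node reaches clients on every node.
    io.adapter(createAdapter(pubClient, subClient));

    httpServer.listen(3000);

On the second point, a rough sketch of softening the dogpile effect within one process: only the first caller for a missing key queries the database, and every concurrent caller awaits the same promise (ioredis again; loadFromDb() is a hypothetical placeholder for the real MySQL query):

    // Cache-aside with request coalescing: concurrent misses for the same key
    // share one database call instead of all hitting MySQL at once.
    const Redis = require('ioredis');
    const redis = new Redis();

    const inFlight = new Map(); // key -> Promise shared by concurrent callers in this process

    async function getCached(key, loadFromDb, ttlSeconds = 60) {
      const cached = await redis.get(key);
      if (cached !== null) return JSON.parse(cached);       // cache hit

      if (inFlight.has(key)) return inFlight.get(key);      // someone is already loading it

      const promise = (async () => {
        try {
          const fresh = await loadFromDb();                  // only one DB query per key
          await redis.set(key, JSON.stringify(fresh), 'EX', ttlSeconds);
          return fresh;
        } finally {
          inFlight.delete(key);
        }
      })();

      inFlight.set(key, promise);
      return promise;
    }

This only protects a single process; to keep many nodes from stampeding together, the loader can additionally take a short-lived Redis lock (e.g. SET lock:key token NX PX 5000) before going to the database.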