c# – Best search algorithm in C #

Question:

I was wondering if anyone knows what type of methods are the highest performance to implement in very intensive searches (in arrays) (I'm talking about arrays of millions of elements) from C #, for example it is better to use IndexOf or BinarySearch to obtain the index of the element ? should I use arrays or HashSet? and how could I use the HashSet to find the matches in the array?

A logical example of the algorithm I need:

Block: 071827490593720123213023498230402000813

Value to find: 40200

Objective: Return the index where said group of numbers is found

What it should return: 30

Code that I currently implement:

int offSet = 0;
while ((offSet = Array.IndexOf(bloque, encontrar[0], offSet)) != -1)
{
    if (encontrar.Length > 1)
        for (int i = 1; i < encontrar.Length; i++)
        {
            if (bloque.Length <= offSet + encontrar.Length) break;
            else if (encontrar[i] != bloque[offSet + i])
            {
                if (bloque[offSet + i] == encontrar[0])
                { offSet += (i - 1); break; }
                else if (i == encontrar.Length - 1)
                { offSet += i; break; }
                break;
            }
            else if (i == encontrar.Length - 1)
                addresses.Add(new IntPtr((int)baseAddress + offSet));
        }
    else addresses.Add(new IntPtr((int)baseAddress + offSet));
offSet++;
}

This algorithm is not slow, but it is not as fast looking as the program I am comparing it to. The program that I am developing opens the processes and looks for values ​​in their memory regions (Yes, I am comparing it with Cheat Engine). As you can see it is more or less similar to the Boyer Moore algorithm, but I need to know if I can replace functions to increase performance or if I should remove or change something in the logic of the algorithm to increase performance.

https://es.wikipedia.org/wiki/Algoritmo_de_b%C3%BAsqueda_de_cadenas_Boyer-Moore

Answer:

First of all, I would profiling the executions to see if they tell me something about the behavior of the code.

Things I would try, because these things I understand are more trial and error than anything else. You may have done:

  • Completely separate the case where the length of encontrar is 1 (we take out a loop if .
  • Using a for instead of a while with what I understand you could eliminate the offSet += (i - 1) leaving only the break
  • I would seek to delegate or delay the creation of objects (ie the addresses.Add(new IntPtr((int)baseAddress + offSet)) . The instantiation and addition to the collection could be expensive compared to the execution of the loop.

UPGRADE

Rational: You are doing a cast, creating an IntPtr object and adding it to address. The second operation in particular may be expensive compared to the rest of the algorithm, after all creating objects is not free (it is a hypothesis that you should test).

What could be done about it? Well, you could try delegating that task to another process or thread.

That is, build a producer / consumer schema, where the producer (your current process) passes the baseAddress + offset to the consumer and the consumer creates the IntPtr and adds it to its collection.

Then at the end of the execution, you can ask the consumer that creates the IntPtr to return the collection of IntPtr generated.

The idea is that in a multiprocessor environment the search and creation process fall into different processors and with this you could gain better performance.

Here is an article about it.

END
Of course, I can be wrong about everything. It may be much more interesting to read your results in response form.

Scroll to Top