Question:
I have a project that works with a large volume of data, and I need to optimize it so that some calculations return their results in as short a time as possible. I know there are several aspects to consider, such as how the data is structured in the database, how I access it and how I perform the calculations, among others, but setting all of these aside, I would like only the question posed below to be taken into account.
My question is more conceptual than a problem with my code, but it is about something specific.
Consider the following list:
var minhaLista = new List<MeuObjeto>
{
    // Objects...
};
MeuObjeto has the following properties:
public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}
I need to access each of these properties in a loop over n items in the fastest and most memory-efficient way possible. But if I have to choose between speed and memory, I will choose speed.
Every millisecond matters, so I am taking into account details such as the for loop being faster than foreach, and even that declaring a constant equal to 0 and using it to initialize the loop control variable is better than initializing it directly with 0.
Therefore, consider INICIO to be declared as follows:
private const int INICIO = 0;
Consider OutroObjeto to be a class similar to MeuObjeto, used here for illustration only.
Form 1:
var outraLista = new List<OutroObjeto>();
for (int i = INICIO; i < minhaLista.Count; i++)
{
    var outroObjeto = new OutroObjeto
    {
        Prop1 = minhaLista[i].Prop1,
        Prop2 = minhaLista[i].Prop2,
        Prop3 = minhaLista[i].Prop3,
        Prop4 = minhaLista[i].Prop4
    };
    outraLista.Add(outroObjeto);
}
In this case, is the list searched for the object at position i once for every property?
Form 2:
var outraLista = new List<OutroObjeto>();
for (int i = INICIO; i < minhaLista.Count; i++)
{
    var meuObjetoI = minhaLista[i];
    var outroObjeto = new OutroObjeto
    {
        Prop1 = meuObjetoI.Prop1,
        Prop2 = meuObjetoI.Prop2,
        Prop3 = meuObjetoI.Prop3,
        Prop4 = meuObjetoI.Prop4
    };
    outraLista.Add(outroObjeto);
}
Apparently this snippet works similarly to foreach, but will accessing each property of the object at position i of the list be faster than in Form 1? Technically, meuObjetoI just points to the object at position i of the list, which is already allocated in memory, correct?
What would be the best approach, taking both time and memory consumption into account? Or is there a better, third option?
Answer:
Precompute as much as possible
The literal 0 is already a constant, so declaring INICIO gains nothing and you don't have to worry about it. But .Count will be re-evaluated on every iteration whenever the compiler cannot "prove" that minhaLista is a strictly local variable. As the question gives no information about this, the first optimization is to use for (to lessen pressure on the garbage collector) and to read .Count only once:
var count = minhaLista.Count;
for (int i = 0; i < count; i++)
{
    // ...
}
This optimization assumes that the size of the list does not change while the loop runs.
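A self-contained sketch of this first optimization. It assumes OutroObjeto simply mirrors the four properties of MeuObjeto (the question only says it is similar), and the helper name CopiaLista.Copiar is hypothetical. Passing count as the initial capacity of the new list is an extra step not proposed above; it keeps Add from having to regrow the list's internal array while the copy runs.

```csharp
using System.Collections.Generic;

public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

// Assumption: OutroObjeto mirrors MeuObjeto (the question only says it is "similar").
public class OutroObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CopiaLista
{
    public static List<OutroObjeto> Copiar(List<MeuObjeto> minhaLista)
    {
        // Read Count once; only valid if minhaLista is not resized during the loop.
        var count = minhaLista.Count;

        // Extra step, not in the answer: preallocate the capacity so that
        // Add never has to grow the list's internal array.
        var outraLista = new List<OutroObjeto>(count);

        for (int i = 0; i < count; i++)
        {
            outraLista.Add(new OutroObjeto
            {
                Prop1 = minhaLista[i].Prop1,
                Prop2 = minhaLista[i].Prop2,
                Prop3 = minhaLista[i].Prop3,
                Prop4 = minhaLista[i].Prop4
            });
        }

        return outraLista;
    }
}
```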
Decreasing indirections 1
As mentioned in the comments, each indirection that has to be resolved is one more unit of processing, at least in theory. This may be a trivial optimization, but again, if minhaLista isn't strictly local, fetch the item only once inside the for:
for (int i = 0; i < count; i++)
{
    var item = minhaLista[i];
    var outroObjeto = new OutroObjeto
    {
        Prop1 = item.Prop1,
        Prop2 = item.Prop2,
        Prop3 = item.Prop3,
        Prop4 = item.Prop4
    };
    outraLista.Add(outroObjeto);
}
This eliminates repeated accesses such as minhaLista[i].Prop1, minhaLista[i].Prop2 and so on.
Decreasing indirections 2
Simple properties, or properties with trivial accessors ({get;set;}), compile to extremely optimized code, so the code above, although it seems verbose, is perhaps already as fast as it can be. However, such a copy may violate the DRY principle, so creating an OutroObjeto constructor that accepts a MeuObjeto is worthwhile because it:
- Improves the expressiveness of your code
- Decreases indirection, as half of the property accesses become local accesses on one of the objects (the one being constructed).
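A minimal sketch of that constructor, assuming OutroObjeto has the same four properties as MeuObjeto. Conversao.Copiar is a hypothetical helper that shows how the loop body shrinks to a single call.

```csharp
using System.Collections.Generic;

public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

// Assumption: OutroObjeto mirrors MeuObjeto's four properties.
public class OutroObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }

    // The mapping is written once, here (DRY). The assignments target this
    // object's own properties; only "origem" is reached through a reference.
    public OutroObjeto(MeuObjeto origem)
    {
        Prop1 = origem.Prop1;
        Prop2 = origem.Prop2;
        Prop3 = origem.Prop3;
        Prop4 = origem.Prop4;
    }
}

// Hypothetical helper showing the loop with the constructor.
public static class Conversao
{
    public static List<OutroObjeto> Copiar(List<MeuObjeto> minhaLista)
    {
        var count = minhaLista.Count;
        var outraLista = new List<OutroObjeto>(count);
        for (int i = 0; i < count; i++)
        {
            outraLista.Add(new OutroObjeto(minhaLista[i]));
        }
        return outraLista;
    }
}
```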
An alternative is to add a ToOutro() method to MeuObjeto that does the same as the body of the for above.
var minhaLista = new List<MeuObjeto>();
// ...
var outraLista = new List<OutroObjeto>();
var count = minhaLista.Count;
for (int i = 0; i < count; i++)
{
    outraLista.Add(minhaLista[i].ToOutro());
}
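ToOutro() is described but not shown. A minimal sketch of what it could look like, assuming OutroObjeto has the same four settable properties; inside the method, MeuObjeto reads its own properties directly.

```csharp
// Assumption: OutroObjeto mirrors MeuObjeto's four properties.
public class OutroObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }

    // Conversion lives on the source type; the right-hand sides are this
    // object's own properties, so no list lookup is involved.
    public OutroObjeto ToOutro()
    {
        return new OutroObjeto
        {
            Prop1 = Prop1,
            Prop2 = Prop2,
            Prop3 = Prop3,
            Prop4 = Prop4
        };
    }
}
```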
It depends a bit on personal taste. If you feel that a method like .ToOutro() mixes up application domains, an extension method is the way to have code that links the two objects without living in either of them.
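A sketch of the extension-method variant, with the same assumption about OutroObjeto. The static class name MeuObjetoExtensions is hypothetical. Because ToOutro is declared as an extension method, a call such as minhaLista[i].ToOutro() compiles unchanged, while neither class needs to know about the other.

```csharp
public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

// Assumption: OutroObjeto mirrors MeuObjeto's four properties.
public class OutroObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

// Hypothetical name; the mapping lives outside both classes.
public static class MeuObjetoExtensions
{
    // "this MeuObjeto" makes it callable as origem.ToOutro().
    public static OutroObjeto ToOutro(this MeuObjeto origem) => new OutroObjeto
    {
        Prop1 = origem.Prop1,
        Prop2 = origem.Prop2,
        Prop3 = origem.Prop3,
        Prop4 = origem.Prop4
    };
}
```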
Readonly collection
The List<> machinery is complicated, and it has to be. Trying to replace this functionality with a homemade implementation is unlikely to turn out well. However, "all" your code above does is copy one list of objects into another. Hence the question: does the new list have to be dynamic?
If the answer is no, specifically if the new list never changes its number of items after it is created, then:
var minhaLista = new List<MeuObjeto>();
// ...
var count = minhaLista.Count;
var outraLista = new OutroObjeto[count];
for (int i = 0; i < count; i++)
    outraLista[i] = minhaLista[i].ToOutro();
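A sketch that packages the array version into a method. The name CopiaFixa.Copiar is hypothetical, and the ToOutro() on MeuObjeto is a stand-in for whichever conversion you choose. Returning IReadOnlyList<OutroObjeto> is an extra choice, not part of the suggestion above: arrays implement that interface, so callers keep indexed access and Count but get no Add or Remove, and no wrapper object is allocated. A caller could still cast the result back to OutroObjeto[], so this communicates intent rather than enforcing it.

```csharp
using System.Collections.Generic;

// Assumption: OutroObjeto mirrors MeuObjeto's four properties.
public class OutroObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }

    // Stand-in for whichever conversion you pick (constructor, method or extension).
    public OutroObjeto ToOutro() =>
        new OutroObjeto { Prop1 = Prop1, Prop2 = Prop2, Prop3 = Prop3, Prop4 = Prop4 };
}

// Hypothetical helper name.
public static class CopiaFixa
{
    public static IReadOnlyList<OutroObjeto> Copiar(List<MeuObjeto> minhaLista)
    {
        var count = minhaLista.Count;

        // One allocation of exactly the right size: no growth, no List<T> bookkeeping.
        var outraLista = new OutroObjeto[count];
        for (int i = 0; i < count; i++)
            outraLista[i] = minhaLista[i].ToOutro();

        // Arrays implement IReadOnlyList<T>, so no wrapper object is allocated.
        return outraLista;
    }
}
```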
Readonly objects
This is a radical proposition, but one that is actually used in places where performance is an absolute priority: read-only objects and, if they are small enough, structs instead of classes.
It is not a transition to make without careful study. Going from mutable classes to structs is difficult and error-prone. If these objects come from the database, my suggestion is not to go that way.
But if these objects are somehow ephemeral data, loaded only to be discarded, or products of calculations, using read-only classes/structs guarantees that this data will not be changed under any circumstances, which may remove the need for this kind of conversion, and it enables some compiler-level optimizations that can speed up execution considerably.
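A sketch of the read-only struct idea, assuming C# 7.2 or later for readonly struct. OutroValor and Conversao.Copiar are hypothetical names. Every property is getter-only and assigned in the constructor, so a value cannot be modified after creation, and an array of these structs holds the values themselves rather than references to separately allocated objects. Whether a type of this size counts as "small enough" is something to measure, not assume.

```csharp
using System.Collections.Generic;

public class MeuObjeto
{
    public int Prop1 { get; set; }
    public string Prop2 { get; set; }
    public decimal Prop3 { get; set; }
    public bool Prop4 { get; set; }
}

// Hypothetical immutable counterpart of OutroObjeto (requires C# 7.2+).
// Getter-only properties can only be set in the constructor.
public readonly struct OutroValor
{
    public int Prop1 { get; }
    public string Prop2 { get; }
    public decimal Prop3 { get; }
    public bool Prop4 { get; }

    public OutroValor(int prop1, string prop2, decimal prop3, bool prop4)
    {
        Prop1 = prop1;
        Prop2 = prop2;
        Prop3 = prop3;
        Prop4 = prop4;
    }
}

public static class Conversao
{
    // The array stores the struct values inline, so the copy itself
    // allocates only the array (the strings are shared, not duplicated).
    public static OutroValor[] Copiar(List<MeuObjeto> minhaLista)
    {
        var count = minhaLista.Count;
        var resultado = new OutroValor[count];
        for (int i = 0; i < count; i++)
        {
            var item = minhaLista[i];
            resultado[i] = new OutroValor(item.Prop1, item.Prop2, item.Prop3, item.Prop4);
        }
        return resultado;
    }
}
```

With this type, an assignment such as valor.Prop1 = 2 is rejected at compile time.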