Question:
Good afternoon dear community.
I'd like to ask for a code review and some advice. I have a program that needs to run the same external EXE file N times with different input parameters, wait for all the running processes to complete, and collect the results into a single collection (say, an array).
How can I use parallelism and asynchrony (if the latter is needed here) as efficiently as possible?
This is the solution I have so far:
List<Tuple<int, int, string, string>> processignDataList = ... // list of tuples with the input data
Task[] stackTasks = new Task[processignDataList.Count]; // array of all tasks
List<Tuple<int, string>> tasksResults = new List<Tuple<int, string>>(); // list of tuples with the task results
int i = 0; // counter, only needed to index into the array
foreach (Tuple<int, int, string, string> pair in processignDataList) {
    stackTasks[i] = Task.Factory.StartNew(() => tasksResults.Add(Tuple.Create(pair.Item1, pairProcessing(cSettings, pair))));
    i++;
}
Task.WaitAll(stackTasks);
The string pairProcessing() method:
- takes properties from a special cSettings object and input parameters from pair;
- starts the EXE as a Process: process.Start();
- guards against hanging forever: process.WaitForExit(3000);
- then a check, followed by process.Close();
- parses StandardOutput and builds a string from it, which is returned as the result of the pairProcessing method.
Will this solution actually start multiple EXE instances and run them in parallel? Does the pairProcessing() method need any additional annotations, are there any requirements for it, or can it be anything? Do you know of better solutions? (There is always a better solution…) Are there any pitfalls? Thank you.
Answer:
First, access to tasksResults is not synchronized, so there can be a nasty race here.
Secondly, if there are many processes, you may run out of threads in the pool, which will undesirably limit parallelism. That said, a very large number of processes will not be able to truly run in parallel anyway.
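To make the first point concrete: List<T>.Add is not thread-safe, so concurrent calls from several tasks can lose items or throw. The sketch below is only an illustration, with hypothetical names (SharedListDemo, CollectWithLock) that are not part of the question's code; it shows what keeping the shared list would require, namely a lock around every Add.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SharedListDemo
{
    // Hypothetical helper: adds `count` integers to one shared list from parallel tasks.
    public static List<int> CollectWithLock(int count)
    {
        var results = new List<int>();
        var gate = new object();
        var tasks = new Task[count];
        for (int i = 0; i < count; i++)
        {
            int value = i; // copy, because the for-loop variable is shared between iterations
            tasks[i] = Task.Run(() =>
            {
                // Without this lock, concurrent Add calls may corrupt the list.
                lock (gate)
                {
                    results.Add(value);
                }
            });
        }
        Task.WaitAll(tasks);
        return results; // all items are present, but in no particular order
    }
}
```

Note that the order of items in the shared list depends on which task finishes first, which is one more reason to prefer tasks that return their value.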
The first problem is solved quite simply: instead of writing to the list inside the task, let the task return a value:
List<Task<Tuple<int, string>>> stackTasks = new List<Task<Tuple<int, string>>>();
foreach (Tuple<int, int, string, string> pair in processignDataList) {
stackTasks.Add(Task.Factory.StartNew(() => Tuple.Create(pair.Item1, pairProcessing(cSettings, pair))));
}
Tuple<int, string>[] tasksResults = Task.WhenAll(stackTasks).Result;
You can also use LINQ:
Tuple<int, string>[] tasksResults = Task.WhenAll(
    from pair in processignDataList
    select Task.Run(() => Tuple.Create(pair.Item1, pairProcessing(cSettings, pair)))
).Result;
or like this:
Tuple<int, string>[] tasksResults = Task.WhenAll(
    processignDataList.Select(pair => Task.Run(() => Tuple.Create(pair.Item1, pairProcessing(cSettings, pair))))
).Result;
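For reference, here is a minimal self-contained sketch of the "return a value from the task" approach. It assumes a hypothetical FakePairProcessing stand-in (the real pairProcessing starts the EXE and parses its output) and a hypothetical wrapper class WhenAllDemo; neither is part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for pairProcessing: just combines two of the inputs.
    static string FakePairProcessing(Tuple<int, int, string, string> pair)
    {
        return pair.Item3 + ":" + pair.Item4;
    }

    public static Tuple<int, string>[] ProcessAll(
        IEnumerable<Tuple<int, int, string, string>> input)
    {
        // One task per input tuple; each task returns its own result,
        // so no shared collection is written from several threads.
        Task<Tuple<int, string>>[] tasks = input
            .Select(pair => Task.Run(() => Tuple.Create(pair.Item1, FakePairProcessing(pair))))
            .ToArray();

        // Task.WhenAll yields the results in the same order as the tasks.
        return Task.WhenAll(tasks).Result;
    }
}
```

Because the results come back in task order, the first element of each result tuple (pair.Item1) still matches the corresponding input, regardless of which process finished first.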
To solve the second problem, you can move from tasks to dedicated threads, or increase the thread pool size with ThreadPool.SetMaxThreads and ThreadPool.SetMinThreads.
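Both options can be sketched roughly as follows. The class and method names (ThreadsDemo, RunOnThreads, RaiseMinWorkerThreads) and the work delegate are hypothetical placeholders for the real pairProcessing call; this is an illustration under those assumptions, not a tuned implementation.

```csharp
using System;
using System.Threading;

public static class ThreadsDemo
{
    // Option 1: one dedicated thread per input, bypassing the thread pool.
    public static string[] RunOnThreads(string[] inputs, Func<string, string> work)
    {
        var results = new string[inputs.Length];
        var threads = new Thread[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            int index = i; // copy, because the for-loop variable is shared between iterations
            // Each thread writes only to its own slot, so no lock is needed.
            threads[i] = new Thread(() => results[index] = work(inputs[index]));
            threads[i].Start();
        }
        foreach (Thread t in threads)
        {
            t.Join(); // wait for every thread to finish
        }
        return results;
    }

    // Option 2: ask the pool to keep at least `desiredWorkers` worker threads ready.
    // Returns false if the pool rejected the request (for example, if it exceeds the maximum).
    public static bool RaiseMinWorkerThreads(int desiredWorkers)
    {
        ThreadPool.GetMinThreads(out int worker, out int completionPort);
        return ThreadPool.SetMinThreads(Math.Max(worker, desiredWorkers), completionPort);
    }
}
```

Since each thread here mostly waits for an external process, the threads themselves are cheap in CPU terms; the practical limit is how many EXE instances the machine can run at once.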