c# – How to speed up the process of finding a match in two lists

Question:

There are two files. I read both values ​​from them into lists. I want to take unique values ​​from both lists, but the search and comparison process takes a very long time (~ 300k lines in both lists). Tell me how to speed up the search and comparison process *?

 var lst1 = File.ReadAllLines(@ "D:\test\1.csv").ToList();
 var lst2 = File.ReadAllLines(@ "D:\\test\2.csv").ToList();

 var rez = lst2.Where(x => !MySequenceContains(lst1, x)).
 Select(q => string.Join(";", q)).ToList();

 }

 private bool MySequenceContains(List < string > x, string y) {
  bool contains = false;
  index++;
  label2.Text = index.ToString();
  foreach(var a in x) {
   // ToDo: tweak the string comparison as needed
   if (string.Compare(a.Split(';')[0], y.Split(';')[0], StringComparison.InvariantCultureIgnoreCase) == 0 &&
    string.Compare(a.Split(';')[14], y.Split(';')[14], StringComparison.InvariantCultureIgnoreCase) == 0) {
    contains = true;
    break;
   }
  }
  return contains;
 }

Answer:

First, your code returns lines that are in the second file but not in the first. It does not return the lines that are in the first but not in the second.

Second, you are using Split inside a loop, i.e. split the same lines many times.

Thirdly, inside a loop through one file, you loop over another (look for this value in another file. If the file sizes are the same, then the algorithm is slow N ^ 2.

If you first perform Split for all input lines and sort the lists according to your criterion, then you can select non-repeating elements in one cycle from two (sorted) lists at once. The complexity is 2N + 2NlogN (sorting).

Here's an example, sorry to be illegal:

private List<string> my(List<string> x, List<string> y)
{
    var rez = new List<string>();
    var lst1 = new List<string[]>(x.Select(s => s.Split(';')));
    var lst2 = new List<string[]>(y.Select(s => s.Split(';')));

    lst1.Sort(MyComarer.Comparer);
    lst2.Sort(MyComarer.Comparer);

    var isPresentInX = false;
    int j = 0;
    for (int i = 0; i < lst1.Count; i++)
    {
        if (j >= lst2.Count)
        {
            rez.Add(string.Join(";", lst1[i]));
            continue;
        }
        var comp = MyComarer.Comparer.Compare(lst1[i], lst2[j]);
        while (comp > 0 && j < lst2.Count)
        {
            if (!isPresentInX)
                rez.Add(string.Join(";", lst2[j]));
            j++;
            if (j < lst2.Count)
            {
                if (MyComarer.Comparer.Compare(lst2[j], lst2[j - 1]) != 0)
                    isPresentInX = false;
                comp = MyComarer.Comparer.Compare(lst1[i], lst2[j]);
            }
        }
        if (comp != 0)
            rez.Add(string.Join(";", lst1[i]));
        else
            isPresentInX = true;
    }
    return rez;
}

private class MyComarer : IComparer<string[]>
{
    public static MyComarer Comparer { get; } = new MyComarer();

    public int Compare(string[] x, string[] y)
    {
        var res = string.Compare(x[0], y[0], StringComparison.InvariantCultureIgnoreCase);
        if (res == 0)
            res = string.Compare(x[14], y[14], StringComparison.InvariantCultureIgnoreCase);
        return res;
    }
}
Scroll to Top