java – Speed ​​up the search for strings in an array with a length less than average

Question:

Performed the task: to find in the array all strings whose length is less than the average length of all strings. The result is an algorithm that runs in linear time:

public List<String> getLessMiddle(String[] src) {
    final List<String> result = new ArrayList<>();

    int common = 0;

    for (String s : src) {
        common += s.length();
    }

    final int middle = common / src.length;

    for (String s : src) {
        if (s.length() < middle) {
            result.add(s);
        }
    }

    return result;
}

Can it be made faster?

Answer:

Complexity better than linear to get here will not work exactly. In the general case, it is also unlikely that it will be possible to perform a search faster than in two passes:

  • In order to determine the average length, you need to go through all the lines at least once and add their lengths.
  • Then you need to pull out the lines with a length less than the average. In the worst case ( n-1 identical lines and one slightly longer) there will be n-1 such lines, and in order to pull them out, at least n-1 operations will be needed.

Only 2n-1 , which is not much better.

Perhaps these strings have some property that is not described in the condition, for example, an uneven distribution of lengths. Perhaps, due to this property, it will be possible to put the lines into a certain tree-like line and make the selection faster for the average case. But the gain will not be significant, and if the optimization of the algorithm is not a vital task, then you should not complicate the code.

Because I have nothing to say about the optimization of the algorithm, I will list the problematic points on the algorithm / code.

  • Division error. Integer division is used ( common / src.length; ) if common is not even divisible by src.length , rounding down will occur. This rounding will exclude rows that should not be excluded when selecting. For example: if the input array {"a", "bb"} , then the average length is 1.5. In this case, the string "a" must be included in the result. Now middle will be set to 1 and the result will be empty.

    To fix it, you can round up, either use a real type for middle , or solve the inequality and do without division at all (in this case, long may be needed).

  • The case when the input array is empty is not handled in any way (division by zero will occur).

  • The sum of the lengths of strings can go beyond int , leading to overflow and incorrect results.

  • The result variable can be moved closer to usage:

     int common = 0; for (String s : src) { common += s.length(); } final int middle = common / src.length; final List<String> result = new ArrayList<>(); for (String s : src) { if (s.length() < middle) { result.add(s); } } return result;
Scroll to Top