In short, my question is how to remove multiple unnecessary substrings from a string using list comprehension notation.
To remove the unnecessary strings ["a", "b", "c", "NULL"] from words, I currently do the following:
words = "a皆a様aよbろbしbくcおc願cいしNULLまNULLす。"
alist = ["a", "b", "c", "NULL"]
for delstr in alist:
    words = words.replace(delstr, "")
However, since there are many words to process and alist is huge, I thought a list comprehension might speed this up, so I wrote the following code.
[words.replace(delstr, "") for delstr in alist]
However, this does not serve the purpose, because it produces a list containing one copy of words per element of alist, each with only that one element removed.
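For illustration, here is a minimal sketch of what that comprehension evaluates to. The shortened ASCII input is an assumption made here for readability; it is not the original data.

```python
words = "axbyc"
alist = ["a", "b", "c"]

# Each element is the ORIGINAL string with only one entry of alist removed;
# the removals are not accumulated from one element to the next.
partial = [words.replace(delstr, "") for delstr in alist]
print(partial)  # ['xbyc', 'axyc', 'axby']
```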
One way would be to put the strings into a list and remove one alist element from every string on each pass of a loop. However, if comprehension notation allows several operations to be applied to a single object, I think that would be preferable, because both the list of words and alist could then be handled inside the list comprehension. I would appreciate it if you could tell me whether there is a way to remove all the elements of alist from words with a list comprehension.
One option is to use a regular expression (the re module):
import re
words_re = re.compile("|".join(re.escape(w) for w in alist))
result = re.sub(words_re, '', words)
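Since the question mentions many input strings, the pattern can be compiled once and reused, and this is where a list comprehension fits naturally. The following is only a sketch: many_words is a hypothetical list of inputs, and its second entry is made up for illustration.

```python
import re

alist = ["a", "b", "c", "NULL"]

# Hypothetical inputs; the first is the example string from the question.
many_words = ["a皆a様aよbろbしbくcおc願cいしNULLまNULLす。", "abcNULLxyz"]

# Compile once. re.escape keeps characters such as "." or "*" literal.
words_re = re.compile("|".join(re.escape(w) for w in alist))

# One pass over each string removes every entry of alist.
cleaned = [words_re.sub("", w) for w in many_words]
print(cleaned)  # ['皆様よろしくお願いします。', 'xyz']
```

Whether this is actually faster than repeated replace depends on the data, so it is worth timing both on real input.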
Note that, depending on the contents of alist (the strings to delete), the result may differ from that of repeated replace calls. For example, with
words = "<abc>"
alist = ["b", "abc"]
repeated replace deletes only b, so the result is <ac>, whereas with the regular expression the match for abc starts earlier in the string, so abc is deleted and the result is <>.
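The difference between the two approaches can be checked with a short sketch like the one below. The names by_replace and by_regex are introduced here only for the comparison.

```python
import re

words = "<abc>"
alist = ["b", "abc"]

# Repeated replace: "b" is removed first, so "abc" no longer occurs afterwards.
by_replace = words
for delstr in alist:
    by_replace = by_replace.replace(delstr, "")

# Regex: the scan takes the leftmost match, which is "abc" starting at "a".
words_re = re.compile("|".join(re.escape(w) for w in alist))
by_regex = words_re.sub("", words)

print(by_replace)  # <ac>
print(by_regex)    # <>
```

If the two results must agree for your data, check that no entry of alist overlaps with another before switching approaches.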