Question:
In short, I want to know how to remove multiple unwanted substrings from a string using a list comprehension.
For example, I want to remove the unwanted strings ["a", "b", "c", "NULL"] from
"a皆a様aよbろbしbくcおc願cいcしNULLまNULLす。"
to get
"皆様よろしくお願いします。"
Here is the code:
words = "a皆a様aよbろbしbくcおc願cいcしNULLまNULLす。"
alist = ["a", "b", "c","NULL"]
for delstr in alist:
    words = words.replace(delstr, "")
This is the processing I currently use. However, because there are many words strings to process and alist is huge, I thought a list comprehension might be faster, so I wrote the following code.
[words.replace(delstr, "") for delstr in alist]
However, this does not do what I want: it returns a list in which each entry is words with only one element of alist removed.
One workaround would be to put words in a list and strip the alist elements from it on each loop iteration. But if comprehension syntax allows a single object to be processed multiple times, I think that would be preferable, because both the list of words and alist could then go inside the comprehension. I would appreciate it if you could tell me whether there is a way to remove all the elements of alist from words with a list comprehension.
Answer:
One option is to use re.sub.
import re
words_re = re.compile("|".join(re.escape(w) for w in alist))
result = re.sub(words_re, '', words)
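As a minimal, self-contained sketch of the same idea, the snippet below wraps the pattern construction in a helper and applies it to the sample string from the question. The function name remove_all is hypothetical and introduced only for illustration; it is not part of the re module.

```python
import re

def remove_all(text, targets):
    # Hypothetical helper (an assumption, not from the original answer):
    # build one alternation pattern from the escaped targets and delete
    # every match in a single pass over the text.
    pattern = re.compile("|".join(re.escape(t) for t in targets))
    return pattern.sub("", text)

words = "a皆a様aよbろbしbくcおc願cいcしNULLまNULLす。"
alist = ["a", "b", "c", "NULL"]
result = remove_all(words, alist)
print(result)  # 皆様よろしくお願いします。
```

Compiling the pattern once and reusing it is likely to matter most when the same alist is applied to many strings.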
Note that, depending on the contents of the strings to delete (alist), the result may differ from that of calling replace repeatedly. For example, take:
words = "<abc>"
alist = ["b", "abc"]
In this case, repeated replace deletes only b and the result is <ac>, whereas with the regular expression the match for abc starts earlier in the string, so abc is deleted and the result is <>.
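The difference can be checked directly. The sketch below runs both approaches on the example input; the helper names replace_loop and regex_remove are hypothetical and used only for this comparison.

```python
import re

def replace_loop(text, targets):
    # Repeated str.replace: each target is removed in list order,
    # working on the result of the previous removal.
    for t in targets:
        text = text.replace(t, "")
    return text

def regex_remove(text, targets):
    # Single pass: at each position the regex engine tries the
    # alternatives left to right, and the leftmost match wins.
    pattern = re.compile("|".join(re.escape(t) for t in targets))
    return pattern.sub("", text)

words = "<abc>"
alist = ["b", "abc"]
print(replace_loop(words, alist))  # <ac>
print(regex_remove(words, alist))  # <>
```

With the list order reversed (["abc", "b"]), the loop also yields <>, so the loop result depends on the order of alist, while the regular expression result here depends on where each match begins.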