Processes do not run in parallel with Python 3 multiprocessing.Process

Question:

I am writing a music player myself.
Language: Python 3.7
Environment: Visual Studio 2017 Community
OS: Windows 10 64-bit
CPU: Core i3

I'm new to programming, so please point out anything strange in what I have written.
To enable gapless playback, I read the wav files from a music CD (wav, 16-bit, 44100 Hz) and concatenate all the songs into a single numpy array.
To shorten the time spent on "reading & concatenating", I would like to split the 14 song files of the CD album into two lists and then perform "reading & concatenating wav file data" in parallel in two processes.

Details:
The file list is split into two lists, "self.fl1ba" and "self.fl1bb", with the boundary at a cumulative file size of around 300 MB.
I then try to save time by processing "self.fl1ba" and "self.fl1bb" in parallel with the Process class of multiprocessing.
However, even after creating the Process objects and calling "start()", the processing does not run in parallel.
vstack1 = mp.Process(target=stack.fl_vstack1())
If I delete the "()" from ".fl_vstack1()", it does not work.
I also tried the Pool class, but from what I found, it seemed that a 2D numpy array could not be processed with Pool's "map".

The wav files are read with the pysoundfile module.
data, fs = soundfile.read(file, dtype='int16') returns the 2D data array (data) and the sampling frequency (fs).
Playback uses the sounddevice module.

Please tell me how to parallelize file reading & concatenation.
Thank you.

Supplement 1:
Thank you for your answer, nekketsuuu.
The source in your answer finished in about 5 seconds.
However, probably because the function in my code is "def fl_vstack1(self)", I suspect that setting "args=()" replaced "self" with "None". When I ran it, the function "def fl_vstack1(self)" did not appear to be executed, and I think the processing inside the function is not running at all. I say this because the [0,0] assigned when the array "self.sumdata1" was initialized is supposed to be deleted inside the function, and that did not happen either. Is it possible to use "self" with the Process class? Thank you for your guidance.

Supplement 2:
Thank you for your comment, metropolis.
I added logging as you suggested. To my surprise, the "vstack1" and "vstack2" processes were working. It turned out that a different problem was keeping the program from running to the end.
That problem is that the values processed by the two processes "vstack1" and "vstack2" are not passed to the next two processes "tvstack1" and "tvstack2". I expected them to be stored in "self", but that turned out not to be the case. I am going to find out whether I should use something called "Pipe" or "Queue".

import os
import multiprocessing as mp
import numpy as np
import time
import soundfile as sf
import sounddevice as sd

class Fileread():

    def __init__(self):

        ### Collect only the wav files in the directory into self.fl1b ###
        os.chdir("C:/Users/Public/Music/Music Center/Shared/Music/Aerosmith/Unplugged 1990 (Live)")
        dir1 = os.getcwd()
        self.dir2 = os.listdir(dir1)

        self.fl1b = []
        for j in self.dir2:

            base,ext = os.path.splitext(j)

            if ext == '.wav':

                self.fl1b.append(j)

        self.sumflsize1 = 0
        self.sumflsize2 = 0

        self.fl1c1 = []
        self.fl1c2 = []

        self.fl1ba = []        
        self.fl1bb = []

        self.sumdata1 = np.array([0,0],dtype='int16')
        self.sumdata2 = np.array([0,0],dtype='int16')

        self.fl1t1 = []

        self.fs = 0




        ### Call the method that splits the file list in two ###
        self.split_album()

    ### Split the collected file list into two separate lists ###
    def split_album(self):

        flsize = []

        ### Get the size of each file in the album ###
        for h in self.fl1b:

            flsize.append(os.path.getsize(h))


        ### Store the album files in the lists self.fl1ba and self.fl1bb ###
        list_len = len(self.fl1b)

        for d in range(list_len):

            self.sumflsize1 += flsize[d]
            self.fl1ba.append(self.fl1b[d])

            if self.sumflsize1 > 300000000:

                break

        for q in range(list_len):

            if q <= d:
                pass
            else:
                self.sumflsize2 += flsize[q]
                self.fl1bb.append(self.fl1b[q])

    ### Read each file in the list self.fl1ba as 'int16' and append it to self.sumdata1 ###
    def fl_vstack1(self):

        list_len1 = len(self.fl1ba)

        for k in range(list_len1):

            data,fs = sf.read(self.fl1ba[k],dtype='int16')
            self.fs = fs  ### fs is the sampling frequency ###

            frame_num = len(data)
            self.fl1c1.append(frame_num) ### record the frame count, used to compute each file's playing time ###
            self.sumdata1 = np.vstack((self.sumdata1,data)) ### append the data read from the wav file to self.sumdata1 ###


        self.sumdata1 = np.delete(self.sumdata1,0,0)

    ### Do the same as fl_vstack1 for the files in self.fl1bb ###
    def fl_vstack2(self):

        list_len2 = len(self.fl1bb)

        if list_len2 > 0:  ### only when the list self.fl1bb contains files ###

            for k in range(list_len2):

                data,fs = sf.read(self.fl1bb[k],dtype='int16')

                frame_num = len(data)
                self.fl1c2.append(frame_num)
                self.sumdata2 = np.vstack((self.sumdata2,data))

            self.sumdata2 = np.delete(self.sumdata2,0,0)            

        else:
            pass

    ### Concatenate the data stacked by fl_vstack1 and fl_vstack2 ###
    def sum_stack(self):

        self.sumdata1 = np.vstack((self.sumdata1,self.sumdata2))

    ### Compute the playing time of each file (song) ###
    def sum_time(self):

        self.fl1c1 = self.fl1c1 + self.fl1c2
        self.fl1t1 = list(map(lambda x: x/self.fs/60,self.fl1c1))

    ### Play the concatenated data (gapless playback) ###
    def sdplay(self):

        sd.play(self.sumdata1,self.fs)
        status = sd.wait()


if __name__=='__main__':

    stack = Fileread()

    ### The processes vstack1 and vstack2 do not run in parallel ###
    ### The work seems to start when the Process is created below, not at vstack1.start() ###
    ### If I remove the "()" from "stack.fl_vstack1()", nothing is processed ###
    start1 = time.time()
    vstack1 = mp.Process(target=stack.fl_vstack1())
    print('stack1:',time.time() - start1)
    vstack2 = mp.Process(target=stack.fl_vstack2())
    print('stack2:',time.time() - start1)


    vstack1.start()
    vstack2.start()

    vstack1.join()
    vstack2.join()

    tvstack1 = mp.Process(target=stack.sum_stack())
    tvstack2 = mp.Process(target=stack.sum_time())
    tvstack1.start()
    tvstack2.start()
    tvstack1.join()
    tvstack2.join()


    tvstack3 = mp.Process(target=stack.sdplay())
    tvstack3.start()

Answer:

As the questioner noticed, in the current code the fl_vstack1 function is called on the following line.

vstack1 = mp.Process(target=stack.fl_vstack1())

In this line, stack.fl_vstack1() is passed as the target argument, but because this is function-call syntax, the function is called on the spot and its return value is what target receives. Since fl_vstack1 has no return statement, that value is None. What was really meant to be passed is the function fl_vstack1 itself.

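Instead, pass the function without parentheses and specify args as a zero-length tuple, as shown below. In other words, the function itself goes in target and its arguments go in args. Because stack.fl_vstack1 is a bound method, self is already attached to it and does not need to appear in args. This usage is described in the multiprocessing.Process documentation.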

vstack1 = mp.Process(target=stack.fl_vstack1, args=())
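
The difference between the two forms can be checked without starting any process at all. The following is a minimal sketch. The names load, call_log, build_with_call and build_with_reference are made up for illustration, and load is only a stand-in for stack.fl_vstack1.

```python
import multiprocessing as mp

call_log = []  # records every time load() actually runs in this process


def load():
    # Hypothetical stand-in for stack.fl_vstack1
    call_log.append("load ran")


def build_with_call():
    # load() is evaluated right here, in the parent, before Process is created.
    # Its return value (None) is what target receives.
    return mp.Process(target=load())


def build_with_reference():
    # Only a reference to load is stored; nothing runs until start() is called.
    return mp.Process(target=load)


if __name__ == "__main__":
    build_with_reference()
    print(len(call_log))  # 0: creating the Process did not run load
    build_with_call()
    print(len(call_log))  # 1: load already ran in the parent
```

This matches what the question observed: with the parentheses, the work happens while the Process object is being created, and start() then has nothing left to run.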

This may be hard to see in the long source code in the question, so here is a short sample that isolates just this part. Please use it as needed.

import multiprocessing as mp
import time

def f():
    time.sleep(5)

if __name__ == '__main__':
    process1 = mp.Process(target=f, args=())
    process2 = mp.Process(target=f, args=())

    # If the two processes run in parallel, "Start!" to "Finish!" should take roughly 5 seconds.
    print("Start!")
    process1.start()
    process2.start()

    process1.join()
    process2.join()
    print("Finish!")
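
Supplement 2 of the question notes that values computed in the child processes do not show up in "self" afterwards. Each child process works on its own copy of the object, so assignments to self stay in the child. The following is a minimal sketch of one way to send results back with multiprocessing.Queue. The Loader class and its load method are hypothetical stand-ins for Fileread and fl_vstack1, and a plain list of strings stands in for the numpy data so that the example needs only the standard library.

```python
import multiprocessing as mp


class Loader:
    """Hypothetical stand-in for the Fileread class in the question."""

    def __init__(self, names):
        self.names = names
        self.result = []

    def load(self, out_queue, tag):
        # When used as a Process target, this runs in the child process.
        # Assigning to self here changes only the child's copy of the object.
        self.result = [name.upper() for name in self.names]
        # Send the result back explicitly, tagged so the parent can order it.
        out_queue.put((tag, self.result))


if __name__ == "__main__":
    first = Loader(["a.wav", "b.wav"])
    second = Loader(["c.wav"])
    q = mp.Queue()

    workers = [
        mp.Process(target=first.load, args=(q, 0)),
        mp.Process(target=second.load, args=(q, 1)),
    ]
    for w in workers:
        w.start()

    # Read from the queue before join(); joining first can block
    # when a child still has large data waiting to be sent.
    results = dict(q.get() for _ in workers)
    for w in workers:
        w.join()

    print(first.result)             # still [] in the parent
    print(results[0] + results[1])  # combined in the original order
```

With real audio data, everything put on the queue is pickled and copied between processes, so the time spent on that transfer should be measured before assuming the parallel version is faster.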