Parallelising your Python Code

Swirl into a game existence where you and your friend are miners. Yep, yellow hat and shovel in hand. The task: dig up a hole within a certain time so you can extract some sparkly diamonds! So your friend suggests that the two of you take turns digging…

Let’s say it takes you 100 minutes to finish this task when done turn-wise.

Illustration designed by Macrovector / Freepik

But what if you both could hustle together and get the job done simultaneously?

Illustration designed by Macrovector / Freepik

It would result in –

  • Less time
  • More efficiency
  • No wasting of available resources
  • Less sweat!

The same happens with your computer: sequentially giving CPU time to two processes makes the process-in-waiting starve until it is granted a CPU cycle, which is a waste of resources that could be put to use.

This post is about how to overcome that.

Let’s dig in to parallel programming!

Parallel computing/programming is essentially the use of two or more processors/cores/computers in combination to solve a single problem. It is a type of computing architecture in which several processors compute simultaneously, dividing the workload between them.

How to have your code work in parallel for you!

In Python, we have a *GIL-led* monster — literally called the ‘GIL’ {which stands for Global Interpreter Lock}.

  • It is — a mutex { MUTual EXclusion } that prevents multiple threads from accessing Python objects simultaneously.
  • It also prevents multiple threads from executing Python bytecode at once.
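To see the GIL in action, here is a minimal sketch (the `count_down` helper and the loop count are just for illustration, not from the original post): running a CPU-bound function in two threads takes roughly as long as running it twice in a row, because only one thread can execute Python bytecode at a time.

```python
import threading
import time

def count_down(n):
	# Pure-Python, CPU-bound work: the GIL serialises this across threads.
	while n > 0:
		n -= 1

N = 5_000_000

# Sequential: two calls, one after the other.
start = time.time()
count_down(N)
count_down(N)
sequential = time.time() - start

# Threaded: two threads, but only one holds the GIL at any moment.
start = time.time()
t1 = threading.Thread(target=count_down, args=(N,))
t2 = threading.Thread(target=count_down, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.time() - start

print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential, threaded))
```

On CPython you will typically see the threaded timing come out no better (and often slightly worse) than the sequential one.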

‘Multiprocessing’ → Our rescuer from GIL!

The multiprocessing package in Python offers an interface similar to the threading module; its ‘Pool’ class is an epic example that offers a really convenient way to parallelise the execution of a function across various input values.

The Pool class lets us create a pool of processes/workers which can handle tasks given to them simultaneously. For example, take a function: we can distribute its calls across some or all of the cores/processors we have available, with each processor responsible for a subset of the problems we throw at it. We can then write another function that runs the same calls serially, and finally compare how helpful multiprocessing is for processing big chunks of CPU-intensive computation.

> To check the number of CPU cores you can use the following:

import multiprocessing
num_of_cpu = multiprocessing.cpu_count()
print(num_of_cpu)

Now let’s get cod-ing!

import time
def fibonacci_sequence_of(num):
	first_number = 0
	second_number = 1
	num = int(num)
	if num == 0:
		print ("Fibonacci of {} is {}".format(num,num))
	elif num ==1:
		print ("Fibonacci of {} is {}".format(num,num))
	else:
		for i in range(2,num): 
			new_number = first_number + second_number 
			first_number = second_number 
			second_number = new_number
		print ("Fibonacci of {} is {}".format(num,second_number))

if __name__ == '__main__':

	input_number = input("Provide comma-separated values for multiple inputs \nFibonacci of : ")
	input_values=[]
	input_values = input_number.split(",")
	toc = time.time()
	for i in input_values:
		fibonacci_sequence_of(i)
	tic = time.time()
	time_taken=round((tic-toc)*1000, 1)
	print ("It takes {} milli-seconds to calculate the fibonacci of {} serially".format(time_taken,input_number))

Here we simply use a for-loop to iterate over all the values given by the user.

#importing libraries-
import time
from multiprocessing import Pool

def fibonacci_sequence_of(num):
	first_number = 0
	second_number = 1
  # Need to convert raw user input into integer for computation-
	num = int(num)
	if num == 0:
		print ("Fibonacci of {} is {}".format(num,num))
	elif num ==1:
		print ("Fibonacci of {} is {}".format(num,num))
	else:
		for i in range(2,num): 
			new_number = first_number + second_number 
			first_number = second_number 
			second_number = new_number
		print ("Fibonacci of {} is {}".format(num,second_number))

if __name__ == '__main__':
	input_number = input("Provide comma-separated values for multiple inputs \nFibonacci of : ")
	input_values=[]
	input_values = input_number.split(",")
	toc = time.time()
  #Making a pool object-
	pool = Pool()
  #Providing values in parallel for computation using the .map function
  #.map takes a function and an iterable of inputs (numbers here) and distributes the calls across the worker processes of our machine
	result = pool.map(fibonacci_sequence_of, input_values)
	tic = time.time()
  
	time_taken=round((tic-toc)*1000, 1)
	print ("It takes {} milli-seconds to calculate the fibonacci of {} in parallel".format(time_taken,input_number))
  #Close the pool to new tasks, then wait for the worker processes to finish-
	pool.close()
	pool.join()

In the gist above, we see that we can simply pass the same function into the ‘.map’ method, which makes it all an easy-peasy cake-walk!

Note – ‘Pool()’ takes an optional argument that lets us define the number of processes we want our program to use; the default is the number of CPUs available.
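For instance, a minimal sketch (the `square` helper is a hypothetical function, just for illustration) that caps the pool at two worker processes:

```python
from multiprocessing import Pool

def square(x):
	# Hypothetical helper: square a single input value.
	return x * x

if __name__ == '__main__':
	# processes=2 caps the pool at two workers; omit it to use every core.
	with Pool(processes=2) as pool:
		print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Using the pool as a context manager (`with`) also saves us the explicit close/join calls.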

Yet another note – to check the number of CPUs available, use the little multiprocessing.cpu_count() snippet shown earlier.

There are other awesome methods besides map; check out the official documentation of the multiprocessing package in Python.
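Two of them, sketched below with a hypothetical `power` helper: Pool.starmap unpacks tuples of arguments (so multi-argument functions work), and Pool.imap returns an iterator that yields results lazily, one at a time.

```python
from multiprocessing import Pool

def power(base, exponent):
	# Hypothetical helper: raise base to the given exponent.
	return base ** exponent

if __name__ == '__main__':
	with Pool() as pool:
		# starmap unpacks each tuple into the function's arguments.
		print(pool.starmap(power, [(2, 3), (3, 2), (5, 2)]))  # [8, 9, 25]
		# imap yields results lazily, in input order.
		for result in pool.imap(str, [1, 2, 3]):
			print(result)
```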

Yet another shameless note – you must’ve noticed when trying out the code above that serial execution gives better results for a small number of inputs, whereas for a large number of inputs, parallel is the way to go!

Do test this out and comment — at what number of values does parallel execution start to beat serial execution?

Hope this was an interesting enough read for you to dig into the gold-mine of multiprocessing!!

Source: towardsdatascience