How to Run Your Python Code Concurrently Using Threads

Originally posted in makeuseof.

Threading significantly reduces the execution time of a program. Learn how to implement threading in Python.

Execution time is one of the common measures of the efficiency of a program. The faster the execution time the better the program. Threading is a technique that allows a program to perform multiple tasks or processes simultaneously.

You will learn how to use the Python built-in threading module and the concurrent.features module. Both of these modules offer simple ways to create and manage threads

Importance of Threading

Threading reduces the amount of time a program takes to complete a job. If the job contains multiple independent tasks, you can use threading to run the tasks concurrently, reducing the program’s wait time for one task to finish before moving on to the next.

For example, a program that downloads multiple image files from the internet. This program can utilize threading to download the files in parallel rather than one at a time. This eliminates the time the program would have to wait for the download process of one file to complete before moving on to the next one.

Initial Program Before Threading

The function in the following program represents a task. The task is to pause the execution of the program for one second. The program calls the function twice hence creating two tasks. It then calculates the time it took for the whole program to run and then displays it on the screen.

import time

start_time = time.perf_counter()

def pause():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done Sleeping...')

pause()
pause()
finish_time = time.perf_counter()
print(f'Finished in {round(finish_time - start_time, 2)} second(s)')

The output shows the program took 2.01 seconds to execute. Each task took one second and the rest of the code took 0.01 seconds to execute.

output of a program showing time taken to run the program

You can use threading to concurrently execute both tasks. This will take both tasks one second to execute.

Implementing Threading Using the threading Module

To modify the initial code to implement threading, import the threading module. Create two threads, thread_1 and thread_2 using the Thread class. Call the start method on each thread to start its execution. Call the join method on each thread to wait for their execution to complete before the rest of the program executes.

import time
import threading
start_time = time.perf_counter()

def pause():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done Sleeping...')

thread_1 = threading.Thread(target=pause)
thread_2 = threading.Thread(target=pause)

thread_1.start()
thread_2.start()

thread_1.join()
thread_2.join()

finish_time = time.perf_counter()
print(f'Finished in {round(finish_time - start_time, 2)} second(s)')

The program will run both threads concurrently. This will reduce the amount of time it takes to accomplish both tasks.

output of a program showing the time taken to run a program

The output shows the time taken to run the same tasks is around a second. This is half the time the initial program took.

Implementing Threading Using the concurrent.futures Module

Python 3.2 saw the introduction of the concurrent.futures module. This module provides a high-level interface for executing asynchronous tasks using threads. It provides a simpler way of executing tasks in parallel.

To modify the initial program to use threading, import the concurrent.features module. Use the ThreadPoolExecutor class from the concurrent.futures module to create a pool of threads. Submit the pause function to the pool twice. The submit method returns a future object that represents the result of the function call.

Iterate over the futures and print their results using the result method.

import time
import concurrent.futures

start_time = time.perf_counter()

def pause():
    print('Sleeping 1 second...')
    time.sleep(1)
    return 'Done Sleeping...'

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = [executor.submit(pause) for _ in range(2)]
    for f in concurrent.futures.as_completed(results):
        print(f.result())

finish_time = time.perf_counter()

print(f'Finished in {round(finish_time - start_time, 2)} second(s)')

The concurrent.features module takes care of starting and joining the threads for you. This makes your code cleaner.

output of a program showing the time taken to run a program

The output is identical to that of the threading module. The threading module is useful for simple cases where you need to run a few threads in parallel. On the other hand, the concurrent.futures module is useful for more complex cases where you need to run many tasks concurrently.

Using Threading in a Real-World Scenario

Using threads to run the above program reduced the time by one second. In the real world, threads save more time. Create a program that downloads images from the internet. Start by creating a new virtual environment. Run the following command in the terminal to install the requests library:

pip install requests

The requests library will allow you to send HTTP requests. Import the requests library and the time library.

import requests
import time

Create a list of URLs of the images you would like to download. Let them be at least ten so that you can notice a significant difference when you implement threading.

img_urls = [
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
]

Loop over the list of URLs downloading each image into the same folder that contains your project. Display the time taken to download the images by subtracting the finish time from the start time.

start_time = time.perf_counter()
for img_url in img_urls:
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')
finish_time = time.perf_counter()
print(f'Finished in {finish_time - start_time} seconds')

The program takes around 22 seconds to download the 12 images. It may vary for you as the time taken to download the images also depends on the speed of your internet.

output of a program showing time taken to download images

Modify the program to use threading using the concurrent.features module. Instead of a loop, use a function. This is the function you will pass to the executor instance.

import requests
import time
import concurrent.futures

img_urls = [
    'https://images.unsplash.com/photo-1524429656589-6633a470097c',
    'https://images.unsplash.com/photo-1530224264768-7ff8c1789d79',
    'https://images.unsplash.com/photo-1564135624576-c5c88640f235',
    'https://images.unsplash.com/photo-1541698444083-023c97d3f4b6',
    'https://images.unsplash.com/photo-1522364723953-452d3431c267',
    'https://images.unsplash.com/photo-1513938709626-033611b8cc03',
    'https://images.unsplash.com/photo-1507143550189-fed454f93097',
    'https://images.unsplash.com/photo-1493976040374-85c8e12f0c0e',
    'https://images.unsplash.com/photo-1504198453319-5ce911bafcde',
    'https://images.unsplash.com/photo-1530122037265-a5f1f91d3b99',
    'https://images.unsplash.com/photo-1516972810927-80185027ca84',
    'https://images.unsplash.com/photo-1550439062-609e1531270e',
]

start_time = time.perf_counter()


def download_image(img_url):
    img_bytes = requests.get(img_url).content
    img_name = img_url.split('/')[3]
    img_name = f'{img_name}.jpg'
    with open(img_name, 'wb') as img_file:
        img_file.write(img_bytes)
        print(f'{img_name} was downloaded...')


with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(download_image, img_urls)


finish_time = time.perf_counter()

print(f'Finished in {finish_time-start_time} seconds')

After introducing threading. The time reduces significantly. It took only 4 seconds to complete the execution of the program.

output of a program showing time taken to download images

Scenarios Suitable for Threading

Some of the scenarios suitable for threading are:

  • I/O bound tasks: If the program spends the majority of the time waiting for input or output operations to complete. Threading can improve performance by allowing other tasks to execute while waiting for I/O operations to complete.
  • Web scrapingWeb scraping involves making HTTP requests and parsing HTML responses. Threading helps speed up the process by allowing you to make multiple requests simultaneously.
  • CPU-bound tasks: Threading can help to improve performance by allowing multiple tasks to execute in parallel.

Familiarize Yourself With Threading in Other Languages

Python is not the only language that supports threading. Most programming languages support some form of threading. It is important to familiarize yourself with the implementation of threads in other languages. This equips you with the necessary skills to tackle different scenarios where threading may apply.

Source: makeuseof