The Python GIL
What is the GIL? #
CPython, the prevailing implementation of Python, has a construct called the Global Interpreter Lock (GIL).
It exists mainly to protect Python's memory management, which relies on reference counting for objects.
The GIL is similar to other concurrency controls like mutexes. However, instead of locking accesses at a fine-grained level, it allows only one OS thread in the process to even execute Python bytecode at a given time.
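To see reference counting at work, `sys.getrefcount` reports an object's current reference count. (Note that the reported count is one higher than the number of names bound to the object, because passing the object as an argument creates a temporary reference.)

```python
import sys

x = []
print(sys.getrefcount(x))  # typically 2: the name x plus the temporary argument reference

y = x  # bind a second name to the same list
print(sys.getrefcount(x))  # typically 3
```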
What makes it so infamous? #
Python has been around for a while, since the early 1990s. Back then, multi-core systems were rare, which meant the GIL was a fairly straightforward way to make memory management thread-safe, with almost no downsides!
Now that multi-core systems are common and more programs are written to exploit parallelism, the GIL poses a hurdle. Many machine learning workloads are affected, so it's especially felt in today's world.
A quick example #
To get a slightly more concrete look at the GIL’s effects in practice, here’s a basic example.
from threading import Thread
from time import perf_counter


def sample_function():
    for _ in range(50000000):
        pass


def main_parallel():
    start_time = perf_counter()
    thread_1 = Thread(target=sample_function)
    thread_1.start()
    thread_2 = Thread(target=sample_function)
    thread_2.start()
    thread_1.join()
    thread_2.join()
    end_time = perf_counter()
    print(f"Parallel finished in {end_time - start_time} seconds")


def main_sequential():
    start_time = perf_counter()
    sample_function()
    sample_function()
    end_time = perf_counter()
    print(f"Sequential finished in {end_time - start_time} seconds")


def main_singular():
    start_time = perf_counter()
    sample_function()
    end_time = perf_counter()
    print(f"Singular finished in {end_time - start_time} seconds")


if __name__ == "__main__":
    main_parallel()
    main_sequential()
    main_singular()
Let’s look at the output of this script on a multi-core system (results may vary slightly depending on the system). Running the two calls in parallel takes about as long as running them one after another, even though we’d expect the parallel version to be roughly twice as fast. That’s the GIL in action!
$ python3 script.py
Parallel finished in 1.004378750003525 seconds
Sequential finished in 0.9990947500045877 seconds
Singular finished in 0.499286292004399 seconds
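It’s worth noting that the slowdown above applies to CPU-bound work. When a thread blocks, CPython releases the GIL so other threads can run, which is why threading still helps for I/O-bound programs. Here’s a small illustration (not part of the original script) using time.sleep, which releases the GIL while waiting:

```python
from threading import Thread
from time import perf_counter, sleep


def io_like_function():
    sleep(0.5)  # a blocking wait; the GIL is released while sleeping


start = perf_counter()
threads = [Thread(target=io_like_function) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = perf_counter() - start
print(f"Two blocking waits finished in {elapsed:.2f} seconds")  # ~0.5, not ~1.0
```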
Is there a way around it? #
Currently the main way to work around the GIL is to simply parallelize code at the process level instead of the thread level.
This can be done in a few ways:
- Launch several Python processes on a single machine
- Run several containers (Docker, Kubernetes, etc.) with one process per container
- Run several machines with one process per machine (not recommended just for this purpose, since it is resource-intensive)
Note that all of these options need a coordination layer to divide the work being done and to collect the results.
Here’s a rudimentary example of the first option (running on a single machine). It uses Python’s multiprocessing module.
import multiprocessing


def example_function(result_queue, limit):
    result = 0
    for i in range(limit):
        result += i
    result_queue.put(result)


if __name__ == "__main__":
    # NOTE: May want to adjust this if you don't want to use all CPU cores
    num_cpus = multiprocessing.cpu_count()
    result_queue = multiprocessing.Queue()
    processes = []
    for i in range(num_cpus):
        p = multiprocessing.Process(
            target=example_function, args=(result_queue, 3000 * i)
        )
        processes.append(p)
        p.start()
    for p in processes:
        p.join()
    while not result_queue.empty():
        print(f"Process result: {result_queue.get()}")
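A higher-level way to do the same thing (a sketch, not from the original example) is concurrent.futures.ProcessPoolExecutor, which handles the worker processes and result collection for you:

```python
from concurrent.futures import ProcessPoolExecutor


def partial_sum(limit):
    # Same work as example_function, but returning the result directly
    return sum(range(limit))


if __name__ == "__main__":
    limits = [3000 * i for i in range(4)]
    # map() distributes the calls across worker processes and
    # yields results in input order
    with ProcessPoolExecutor() as executor:
        for limit, result in zip(limits, executor.map(partial_sum, limits)):
            print(f"sum(range({limit})) = {result}")
```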
What’s the future for the GIL? #
There’s a proposal, PEP 703, to let CPython run without the GIL. It’s a long document, and implementing it involves a lot of prerequisite work.
Even without the GIL, the core thread-safety problem it solves still has to be solved. The challenge is to rework the interpreter itself to be thread-safe, for example by updating the reference-counting approach and making other memory-management changes.
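Thread safety for shared state in user code, by contrast, has always been the programmer’s job, GIL or no GIL. A minimal sketch (a generic illustration, not taken from PEP 703) using threading.Lock to make a read-modify-write operation safe:

```python
from threading import Thread, Lock

counter = 0
lock = Lock()


def increment(n):
    global counter
    for _ in range(n):
        # counter += 1 compiles to several bytecodes, so even under the
        # GIL it can race; the lock makes the read-modify-write atomic
        with lock:
            counter += 1


threads = [Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000
```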
It’ll take some time, but it’s gaining momentum!