Originally posted on towardsdatascience.
New features that can simplify your data processing code — what are they and what do they improve?
Python 3.9 has accumulated a lengthy list of improvements, including some significant changes such as a new type of parser. The new PEG parser takes a little more memory but is a little faster and should handle certain cases better than the old LL(1) parser, although, according to the release documentation, we will probably only start seeing its true effect from Python 3.10 onward. Python 3.9 also lets you use some built-in and standard library collection types as generic types without having to resort to the typing module. This is particularly useful if you use type checkers, linters, or IDEs.
Perhaps the two features that will come in most handy in everyday work with textual data are the union operators added to dict and the new string methods.
Union operators for dict
Two union operators, merge | and update |=, have been introduced for dict.
So if you want to create a new dict based on two dictionaries you already have, you can do the following:
>>> d1 = {'a': 1}
>>> d2 = {'b': 2}
>>> d3 = d1 | d2
>>> d3
{'a': 1, 'b': 2}
If you want to augment one of those two dictionaries using the other, you can do it like so:
>>> d1 = {'a': 1}
>>> d2 = {'b': 2}
>>> d1 |= d2
>>> d1
{'a': 1, 'b': 2}
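The sessions above use dictionaries with distinct keys, so they do not show what happens when a key appears on both sides. As a small sketch (the defaults and overrides dictionaries are made up for illustration), the value from the right-hand operand wins, and |= is more permissive than | about what it accepts on the right:

```python
# Hypothetical settings dictionaries, used only for illustration.
defaults = {"sep": ",", "encoding": "utf-8"}
overrides = {"encoding": "latin-1"}

# On a shared key, the value from the right-hand operand is kept.
merged = defaults | overrides

# `|` builds a new dict and leaves both operands untouched.
defaults_unchanged = defaults == {"sep": ",", "encoding": "utf-8"}

# `|=` accepts any iterable of key/value pairs, like dict.update() ...
in_place = dict(defaults)
in_place |= [("quotechar", '"')]

# ... while `|` requires both operands to be dictionaries.
try:
    defaults | [("quotechar", '"')]
    binary_or_accepts_list = True
except TypeError:
    binary_or_accepts_list = False

print(merged)
print(in_place)
print(binary_or_accepts_list)
```

Keeping | strict while letting |= mirror dict.update() follows the same pattern as + and += on lists.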
There were already a few ways to merge dictionaries in Python. It could be done using dict.update():
>>> d1 = {'a': 1}
>>> d2 = {'b': 2}
>>> d1.update(d2)
>>> d1
{'a': 1, 'b': 2}
However, dict.update() modifies the dict you're updating in place. If you want a new dictionary, you have to make a copy of one of your existing dictionaries first:
>>> d1 = {'a': 1}
>>> d2 = {'b': 2}
>>> d3 = d1.copy()
>>> d3
{'a': 1}
>>> d3.update(d2)
>>> d3
{'a': 1, 'b': 2}
In addition to that, there were three more ways to accomplish this before Python 3.9. Merging two dictionaries can also be achieved with the {**d1, **d2} construct:
>>> d1 = {'a': 1}
>>> d2 = {'b': 2}
>>> {**d1, **d2}
{'a': 1, 'b': 2}
However, if you are using a subclass of dict, for example defaultdict, this construct will resolve your new dictionary's type to just dict:
>>> d1
defaultdict(None, {0: 'a'})
>>> d2
defaultdict(None, {1: 'b'})
>>> {**d1, **d2}
{0: 'a', 1: 'b'}
This wouldn’t happen with the union operators:
>>> d1
defaultdict(None, {0: 'a'})
>>> d2
defaultdict(None, {1: 'b'})
>>> d1 | d2
defaultdict(None, {0: 'a', 1: 'b'})
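The two sessions above start from dictionaries that already exist. A self-contained sketch of the same comparison could look like the following; the counts and labels dictionaries are hypothetical additions, included to show which default_factory the result ends up with in CPython 3.9:

```python
from collections import defaultdict

# Recreate the dictionaries from the sessions above (default_factory is None).
d1 = defaultdict(None, {0: "a"})
d2 = defaultdict(None, {1: "b"})

unpacked = {**d1, **d2}  # dict display: always a plain dict
united = d1 | d2         # union operator: stays a defaultdict

# Hypothetical variation with two different factories: the result
# is built from the left operand, so it keeps that operand's factory.
counts = defaultdict(int, {"a": 1})
labels = defaultdict(str, {"b": "x"})
combined = counts | labels

print(type(unpacked).__name__, type(united).__name__)
print(combined.default_factory)
```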
Another way to merge two dictionaries into a new one that was available before Python 3.9 is dict(d1, **d2):
>>> d1 = {'a': 1}
>>> d2 = {'b': 2}
>>> dict(d1, **d2)
{'a': 1, 'b': 2}
However, this only works when all keys of the second dictionary are strings, because they are passed as keyword arguments:
>>> d1 = {'a': 1}
>>> d2 = {2: 'b'}
>>> dict(d1, **d2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings
The third option is using collections.ChainMap. It is probably even less straightforward than the previous two methods and, unfortunately, updating the ChainMap object modifies the first underlying dictionary:
>>> d1 = {'a': 1, 'b': 2}
>>> d2 = {'b': 3}
>>> from collections import ChainMap
>>> d3 = ChainMap(d1, d2)
>>> d3
ChainMap({'a': 1, 'b': 2}, {'b': 3})
>>> d3['b'] = 4
>>> d3
ChainMap({'a': 1, 'b': 4}, {'b': 3})
>>> d1
{'a': 1, 'b': 4}
>>> d2
{'b': 3}
The new merge and update union operators for dict address those issues and appear to be less cumbersome. They may be confused with the bitwise “or” and some other operators, though.
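One more difference between ChainMap and the union operator is worth a quick sketch: the order of precedence is reversed. A ChainMap lookup returns the value from the first mapping that has the key, whereas | keeps the value from the right-hand operand. The variable names below are illustrative only:

```python
from collections import ChainMap

d1 = {"a": 1, "b": 2}
d2 = {"b": 3}

chained = ChainMap(d1, d2)
merged = d1 | d2

first_wins = chained["b"]  # looked up in d1 first
last_wins = merged["b"]    # value taken from d2

# To get the same result as d1 | d2 from a ChainMap, the maps
# have to be listed in the opposite order.
snapshot = dict(ChainMap(d2, d1))

# Writing to the merged dict does not touch either input.
merged["b"] = 4

print(first_wins, last_wins)
print(snapshot)
print(d1, d2)
```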
String methods to remove prefixes and suffixes
As PEP 616 suggests, pretty much any time you used the str.startswith() and str.endswith() methods to check for a prefix or suffix before stripping it, you can now simplify the code as follows.
Before, if you were normalizing a list of, for example, movie titles and getting rid of any extra markup like dashes from unordered lists, you could do something like this:
>>> normalized_titles = []
>>> my_prefixed_string = '- Shark Bait'
>>> if my_prefixed_string.startswith('- '):
...     normalized_titles.append(my_prefixed_string.replace('- ', ''))
...
>>> print(normalized_titles)
['Shark Bait']
However, using str.replace() could yield unwanted side effects if, for example, a dash was part of the movie title and you would rather keep it, as in the example below. In Python 3.9, you can do it safely like so:
>>> normalized_titles = []
>>> my_prefixed_string = '- Niko - The Flight Before Christmas'
>>> if my_prefixed_string.startswith('- '):
...     normalized_titles.append(my_prefixed_string.removeprefix('- '))
...
>>> print(normalized_titles)
['Niko - The Flight Before Christmas']
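Because removeprefix() returns the string unchanged when the prefix is absent, the startswith() guard can be dropped when every title should end up in the result. A brief sketch over a made-up list of titles, with the str.replace() outcome alongside for comparison:

```python
# Hypothetical input: some titles carry list markup, some do not.
raw_titles = [
    "- Shark Bait",
    "- Niko - The Flight Before Christmas",
    "Up",
    "",
]

# removeprefix() only touches the start of the string and is a
# no-op for strings without the prefix, including the empty string.
normalized_titles = [title.removeprefix("- ") for title in raw_titles]

# replace() removes every occurrence, including the dash inside the title.
replaced_titles = [title.replace("- ", "") for title in raw_titles]

print(normalized_titles)
print(replaced_titles)
```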
The PEP states that users reported expecting this kind of behavior from another pair of built-in methods, str.lstrip and str.rstrip, and often had to implement their own helpers to achieve it. The new methods should provide that functionality in a more robust and efficient way. The user won’t have to take care of cases with empty strings or resort to str.replace().
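The confusion with str.lstrip and str.rstrip comes from the fact that their argument is treated as a set of characters, not as a substring. A short sketch with made-up values shows how the results diverge:

```python
# Hypothetical values chosen so that the text right after the prefix
# (or right before the suffix) reuses the same characters.
title = "the theory of everything"
filename = "tax_report.txt"

# lstrip/rstrip keep removing characters as long as each one
# belongs to the given set: {'t', 'h', 'e', ' '} and {'.', 't', 'x'}.
lstripped = title.lstrip("the ")
rstripped = filename.rstrip(".txt")

# removeprefix/removesuffix remove the exact substring at most once.
without_prefix = title.removeprefix("the ")
without_suffix = filename.removesuffix(".txt")

print(repr(lstripped), repr(without_prefix))
print(repr(rstripped), repr(without_suffix))
```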
Another typical use-case would be the following:
>>> my_affixed_string = '"Some title"'
>>> if my_affixed_string.startswith('"'):
...     my_affixed_string = my_affixed_string[1:]
...
>>> if my_affixed_string.endswith('"'):
...     my_affixed_string = my_affixed_string[:-1]
...
>>> my_affixed_string
'Some title'
In Python 3.9, this can be reduced to just:
>>> my_affixed_string = '"Some title"'
>>> my_affixed_string.removeprefix('"').removesuffix('"')
'Some title'
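For code that still has to run on interpreters older than 3.9, the behavior can be approximated with two small helpers. This is a sketch modeled on the reference behavior described in PEP 616; the function names remove_prefix and remove_suffix are hypothetical and not part of any library:

```python
def remove_prefix(text: str, prefix: str) -> str:
    """Return text without the leading prefix, if it is present."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


def remove_suffix(text: str, suffix: str) -> str:
    """Return text without the trailing suffix, if it is present."""
    # The check on `suffix` matters: with an empty suffix,
    # text[:-0] would evaluate to text[:0], i.e. the empty string.
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


quoted = '"Some title"'
unquoted = remove_suffix(remove_prefix(quoted, '"'), '"')
print(unquoted)
```

The empty-suffix case is the kind of edge case that hand-written versions tend to miss and that the built-in methods handle for you.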
These methods have also been added to bytes, bytearray, and collections.UserString.
Python 3.9 also has an array of interesting module improvements and optimizations lined up. Be sure to check those out as well and figure out which ones might be most useful for your kinds of applications. Enjoy the new features!