September 30, 2020

Python Sorted Containers

Python Sorted Containers

In this article, I would like to talk about Python Sorted Containers. In python language, the build-in data collection is implemented differently from another language (e.g., C/C++). That makes python losing some ability or performance to access specific data. For example, in C++ language, std::map and std::set are implemented by a tree structure, that causes the data collections to have an ability to search upper or lower bound (upper_bound, lower_bound) of the selected value. In python, such abilities don’t exist in the built-in data collection. However, Python has a library to support that ability, which can install additionally. Sorted containers is an open-sourced library that allows us to insert or delete elements very efficiently and the elements maintaining a sorted order. The library contains three containers: SortedList, SortedDict, and SortedSet.

Installation: we can install sorted collection by pip command:

$sudo pip install sortedcontainers 

Python Sorted Container contains three sorted data structure:

  • SortedList 
  • SortedDict 
  • SortedSet

We can use these sorted containers in the same way as build-in python; List/Dict/Set. These are some examples of code to use the Sorted containers.

SortedList

from random import random
from sortedcontainers import SortedList

mainList = SortedList()
for i in range(100):
    mainList.add(random())

SortedDict

from sortedcontainers import SortedDict
from random import random

mainList = SortedDict()

for i in range(100):
    mainList[i]=(random())

SortedSet

from sortedcontainers import SortedSet
from random import random

mainList = SortedSet()

for i in range(100):
    mainList.add(random())

Since the data inside Python Sorted Containers are sorted, the containers can fetch the nearest value (upper bound/lower bound) like upper_bound, lower_bound in c++ language. In Python Sorted Containers, they are called  bisect_left(value) and bisect_right(value). 

bisect_left(value)  return an index to insert value in the sorted container. if the value already exists in the container, the function will return the existing value’s index.

bisect_left(value) return an index to insert value in the sorted container. if the value already exists in the container, the function will return the index after the existing value’s index.

Let’s see an example.

>>> sl = SortedList([10, 12, 14, 16, 18])
>>> sl.bisect_left(12)
1
>>> sl.bisect_left(13)
2
>>> sl.bisect_right(13)
2
>>> sl.bisect_right(14)
3

As we can see from the example, function bisect_left returns an index of the value, if the value exists in the list; sl.bisect_left(12) return 1 because the index of 12 is the 1st. On the other hand, if the value doesn’t exist, the function will return the index of an insertion point of the value; sl.bisect_left(13) return 2 because the index 2 is the new position of 13 if there is an insertion. In a similar way to bisect_right, if the value doesn’t exist, the function will return the index of an insertion position; sl.bisect_right(13) return 2, because it is the insertion position. However, if the value is already in the container, the function will return the index after the existing value; sl.bisect_right(14) returns 3 because 14 is at index 2.

The most advantage of these two functions is the performance. bisect_left and bisect_right can run with O(nlgn) running time, which is optimized for searching application.

reference: http://www.grantjenks.com/docs/sortedcontainers/