Category: Algorithm

Binary Search For Duplicate Data

GOAL

The goal of this article it to describe an algorithm to find first target item or last target item from sorted list with duplication using binary search and how to implement them in C++.

What is binary search?

Binary search is one of the search algorithms, that finds position or judges existence of an target value on sorted array.

Binary search compares the current value defined as a half of target range of the array to the target value. Binary search compares the middle element of current target range in the array to the target value. The current target range is narrowed according to the result of comparison as below.

What if there are same values in the array?

If there are same values, the found element can be first one, last one or other. How can we change this algorithm to search for the smallest or largest element?

Find first index

If the value currently focused is equal to the target value, the current value is contained to the next range as a max value. End the process when min index equal to max index, then check if the min value is equal to the target value.

(more…)

Bidirectional Dict In Python

GOAL

Today’s goal is discuss the method of implementing bidirectional dictionary in Python.

Environment

Windows 10
Python 3.8.7

What is bidirectional dictionary?

Dictionary or hash table is a data structure composed of a collection of (key, map) pair where keys are unique, which is known as an associative array, map, symbol table, or dictionary. The advantage of dictionary is its small time complexity O(1).

job_dict = {"John":"director", "Mike":"designer", "Anna":"designer", "Lisa":"engineer"}
# job_dict["Mike"] returns "designer"
Outline image of hash table or dictionary

Bidirectional dictionary I’d like is the dictionary that return the value when the key is given and return keys when the value is given.

# what I'd like to do
job_bidict = {"John":"director", "Mike":"designer", "Anna":"designer", "Lisa":"engineer"}
# job_bidict["Mike"] returns "designer"
# job_bidict["designer"] returns "Mike", "Anna" or ["Mike", "Anna"]

Method

I introduces 3 methods and pros and cons of them.

<!--more-->

Method 1. Use bidict library

Install and import bidict library according to bidict documentation.

pip install bidict
from bidict import bidict
job_bidict = bidict({"John":"director", "Mike":"designer", "Lisa":"engineer"}) # The values should be unique

print(job_bidict["Mike"])
# output => designer
print(job_bidict.inverse["designer"])
# output => Mike
  • Advantage
    • It’s easy to implement.
    • The library has many useful functions
  • Disadvantage
    • The values of bidict should be unique.
    • The file size grows when the program is distributed.

Advantage: It’s easy to implement.
Disadvantage: The file size grows when you distribute the program.

Method 2. Use 2 dicts in a class

class MyBidict:
  def __init__(self, init_dict):
    self.dict_normal = {} # normal dict to store (key, values)
    self.dict_inverse = {} # normal dict to store (value. keys)
    for key in init_dict.keys():
      self.add_item(key, init_dict[key])
  def add_item(self, key, value):
    if key in self.dict_normal:
      self.dict_normal[key].append(value)
    else:
      self.dict_normal[key] = [value]
    if value in self.dict_inverse:
      self.dict_inverse[value].append(key)
    else:
      self.dict_inverse[value] = [key]
  def get_item(self, key):
    return self.dict_normal[key]
  def get_item_inverse(self, value):
    return self.dict_inverse[value] 

job_bidict = MyBidict({"John":"director", "Mike":"designer", "Anna":"designer", "Lisa":"engineer"})

print(job_bidict.get_item("Mike"))
# output => ['designer']
print(job_bidict.get_item_inverse("designer"))
# output => ['Mike', 'Anna']
  • Advantage
    • You can customize the specification of the dict such as whether it allows duplicate values or not.
    • You can add functions you need.
  • Disadvantage
    • It takes time and effort to implement.
    • The specification is sometimes different from the built-in dict.

Method 3. Implement Bidict class as a subclass of dict

class MyBidict(dict):
  def __init__(self, init_dict):
    self.inverse = {}
    dict.__init__(self, init_dict)
    for key, value in init_dict.items():
      self.inverse[value] = key
  def __setitem__(self, key, value):
    dict.__setitem__(self, key, value)
    self.inverse.__setitem__(value, key)

job_bidict = MyBidict({"John":"director", "Mike":"designer", "Anna":"designer", "Lisa":"engineer"})

print(job_bidict["Mike"])
# output => designer 
print(job_bidict.inverse["designer"])
# output => Anna
  • Advantage
    • It’s relatively easy to implement.
    • You can customize the specification of the dict such as whether it allows duplicate values or not.
    • The functions of built-in dict can be still used.
  • Disadvantage
    • It takes a little time and effort to implement.
    • The original function can destroy the specifications of built-in dict.

Union-Find in Python

GOAL

To understand and to implement union-find in Python.

Example of problems using union-find algorithm

Dividing clusters

There is a list of friends from N people. Divide them in cluster.

n = 6
friendship = [(1, 2), (3, 4), (1, 6)]
(more…)

Bucket sort, counting sort and radix sort.

GOAL

To understand the difference and pros and cons of “bucket sort”, “counting sort” and “radix sort” and implement them.

Bucket sort

Bucket sort is one of the sorting algorithm that has “bucket” to put numbers in.

import random
import itertools
length = 50 #length of the list
input_list = [random.randint(0, 99) for x in range(length)]

#input: list that has 0~99 int elements
#output: sorted list
def bucket_sort(input_list):
    bucket = [[] for x in range(100)]
    for i in input_list:
        bucket[i].append(i)
    return list(itertools.chain.from_iterable(bucket))

print(input_list)
print(bucket_sort(input_list))
#output
[6, 27, 26, 38, 90, 80, 69, 14, 65, 53]
[6, 14, 26, 27, 38, 53, 65, 69, 80, 90]

Complexity: time O(n), space O(r) as n is length of the list, r is range. In the worst case, space complexity is O(r+n)
Pros: Fast algorithm, stable sort
Cons: big memory, the range of input number should be limited.

Counting sort

Counting sort is one of the derivation of bucket sort. Create Bucket and put the number of “occurrences” in it. Iterate the input list counting the number of occurrences.

import random
length = 10
input_list = [random.randint(0, 99) for x in range(length)]

def counting_sort(input_list):
    bucket = [0 for x in range(100)]
    for i in input_list:
        bucket[i] += 1
    output = []
    for idx, num in enumerate(bucket):
        for i in range(num):
            output.append(idx)
    return output

print(input_list)
print(counting_sort(input_list))
#output
[84, 33, 72, 10, 31, 4, 4, 46, 89, 52]
[4, 4, 10, 31, 33, 46, 52, 72, 84, 89]

Complexity: time O(n), space O(r) as n is length of the list, r is range.
Pros: Fast algorithm, fixed size memory
Cons: unstable, big memory, the range of input number should be limited.

Radix sort

Radix sort is one of the derivation of bucket sort. Convert each elements into n-adic number and put each digit into the bucket.

import random
import itertools
length = 10
input_list = [random.randint(0, 99) for x in range(length)]

def radix_sort(input_list): # n = 10
    for i in (range(2)): #digit is 2
        bucket= [[] for i in range(10)]
        for num in input_list:
            index = (num//(10**i)) % 10
            bucket[index].append(num)
        input_list = list(itertools.chain.from_iterable(bucket)).copy()
    return input_list

print(input_list)
print(radix_sort(input_list))
#output
[26, 4, 7, 48, 71, 31, 95, 20, 94, 55]
[4, 7, 20, 26, 31, 48, 55, 71, 94, 95]

Complexity: time O(n), space O(d) as n is length of the list, d is number of digits. In the worst case, space complexity is O(d+n)
Pros: Fast algorithm, small memory, stable sort
Cons: the range of input number should be limited

Because it takes some time to calculate division and modulo, using shift operation of binary numbers is efficient.