Category: Computer Science

Bucket sort, counting sort and radix sort.

2020-04-07 Algorithm Computer Science

GOAL

To understand the difference and pros and cons of “bucket sort”, “counting sort” and “radix sort” and implement them.

Bucket sort

Bucket sort is one of the sorting algorithm that has “bucket” to put numbers in.

import random
import itertools
length = 50 #length of the list
input_list = [random.randint(0, 99) for x in range(length)]

#input: list that has 0~99 int elements
#output: sorted list
def bucket_sort(input_list):
    bucket = [[] for x in range(100)]
    for i in input_list:
        bucket[i].append(i)
    return list(itertools.chain.from_iterable(bucket))

print(input_list)
print(bucket_sort(input_list))

#output
[6, 27, 26, 38, 90, 80, 69, 14, 65, 53]
[6, 14, 26, 27, 38, 53, 65, 69, 80, 90]

Complexity: time O(n), space O(r) as n is length of the list, r is range. In the worst case, space complexity is O(r+n)
Pros: Fast algorithm, stable sort
Cons: big memory, the range of input number should be limited.

Counting sort

Counting sort is one of the derivation of bucket sort. Create Bucket and put the number of “occurrences” in it. Iterate the input list counting the number of occurrences.

import random
length = 10
input_list = [random.randint(0, 99) for x in range(length)]

def counting_sort(input_list):
    bucket = [0 for x in range(100)]
    for i in input_list:
        bucket[i] += 1
    output = []
    for idx, num in enumerate(bucket):
        for i in range(num):
            output.append(idx)
    return output

print(input_list)
print(counting_sort(input_list))

#output
[84, 33, 72, 10, 31, 4, 4, 46, 89, 52]
[4, 4, 10, 31, 33, 46, 52, 72, 84, 89]

Complexity: time O(n), space O(r) as n is length of the list, r is range.
Pros: Fast algorithm, fixed size memory
Cons: unstable, big memory, the range of input number should be limited.

Radix sort

Radix sort is one of the derivation of bucket sort. Convert each elements into n-adic number and put each digit into the bucket.

import random
import itertools
length = 10
input_list = [random.randint(0, 99) for x in range(length)]

def radix_sort(input_list): # n = 10
    for i in (range(2)): #digit is 2
        bucket= [[] for i in range(10)]
        for num in input_list:
            index = (num//(10**i)) % 10
            bucket[index].append(num)
        input_list = list(itertools.chain.from_iterable(bucket)).copy()
    return input_list

print(input_list)
print(radix_sort(input_list))

#output
[26, 4, 7, 48, 71, 31, 95, 20, 94, 55]
[4, 7, 20, 26, 31, 48, 55, 71, 94, 95]

Complexity: time O(n), space O(d) as n is length of the list, d is number of digits. In the worst case, space complexity is O(d+n)
Pros: Fast algorithm, small memory, stable sort
Cons: the range of input number should be limited

Because it takes some time to calculate division and modulo, using shift operation of binary numbers is efficient.

The difference between Unix and Linux.

2020-04-05 Computer Science

When I asked, it was difficult for me to explain the difference of them strictly.

GOAL

To understand the difference and history of Unix and Linux.

What is Unix and Linux

They are some of OS.

	Publish	Interface	Specification	language
Unix	AT&T(1968)	CLI (&GUI, X-windows)	Single UNIX Specification	C, Assembly language	mac OS
Linux	University of Helsinki　(1991)	GUI (& CLI)	Linux Standard Base Specifications	C, Assembly language	Open source, Unix-like, used in Android
Windows	Microsoft　(1985)	CLI (& GUI)		C, C++, Assembly language

Difference of Unix and Linux

Unix is a general term to express operating system that meets some conditions(Single UNIX Specification).

Linux is one of the Unix-like operating system. It is developed taking after Unix, but it is different OS. Linux has its source code and original kernel called Linux kernel.

Linux is open source in contrast to Unix. This is the

Similarities of Unix and Linux

Linux is UNIX compatible operating system that meets POSIX. So they have similar interfaces. POSIX, Portable operating system interface, is API standard that define application interface programming of Unix and Unix-like OS. OS that meet POSIX have similar system calls, formats, structures of directories and files, processes and databases.

By combining Linux kernel and GNU software, a complete UNIX compatible operating system is available as free software.

Internal Representation of Numbers in Computer.

2020-04-03 Computer Science DataAnalytics

I found the following phenomena but why?

#source code
a = 1.2
b = 2.4
c = 3.6
print(a + b == c)

False

GOAL

To understand the mechanism of the internal representation of integer and float point in computer.

Binary Number

Numbers are represented as binary number in computer.

Unsigned int

Signed int: Two’s Complement notation

What is complement?

Complement has two definition.
First, complement on n of a given number can be defined as the smallest number the sum of which and the given number increase its digit. Second, it is also defined as the biggest number the sum of which and the given number doesn’t increase its digit. In decimal number, the 10’s complement of 3 is 7 and the 9’s complement of 3 is 6. How about in binary number?

One’s complement

One’s complement of the input is the number generated by reversing each digit of the input from 0 to 1 and from 1 to 0.

Two’s complement

Two’s complement of the input is the number generated by reversing each digit of the input, that is one’s complement, plus 1. Or it can be calculated easily by subtracting 1 from the input and reverse each digit from 1 to 0 and from 0 to 1.

The range that can be expressed in two’s complement notation

The range is asymmetric.

Floating point

The following is the way to represent floating point number in computer.

The digit is limited and this means that the floating point can’t represent all the decimals in continuous range. Then the decimals are approximated to closest numbers.

Why float(1.2)＋float(2.4) is not equal to float(3.6)?

In computers float number can’t have exactly 1.1 but just an approximation that can be expressed in binary. You can see errors by setting the number of decimal places bigger.

#source code
a = 1.2
b = 2.4
c = 3.6
print('{:.20f}'.format(a))
print('{:.20f}'.format(b))
print('{:.20f}'.format(c))
print('{:.20f}'.format(a+b))

1.19999999999999995559
2.39999999999999991118
3.60000000000000008882
3.59999999999999964473

*You can avoid this phenomena by using Decimal number in python.

from decimal import *
a = Decimal('1.2')
b = Decimal('2.4')
c = Decimal('3.6')
print(a+b == c)

print('{:.20f}'.format(a))
print('{:.20f}'.format(b))
print('{:.20f}'.format(c))
print('{:.20f}'.format(a+b))

True
1.20000000000000000000
2.40000000000000000000
3.60000000000000000000
3.60000000000000000000

Numerical Error in program

2020-04-01 Computer Science

GOAL

To understand numerical errors in computers.

Error of input data

Input data error occurs when real number such as infinite decimal numbers are input as finite decimal number by users.

Approximation error

For example, square root are generated by approximate calculation such as iterative method and bisection method in computers.

Round-off error

Round-off error is caused because the calculation result is rounded to within the number of significant digits in computers.

1.105982 -> 1.10598
40156.245618 -> 40156.24562

Truncation error

Truncation error occurs when iterative calculation such as infinite series or iterative method aborted halfway because of time restriction.

Loss of significance

Loss of significance occurs in calculations of two close numbers using finite-precision arithmetic.

Example

1.222222- 1.222221 = 0.000001
(significant digits 7) – (significant digits 7) = (significant digits 1)

Why loss of significance is undesirable?

For example, calculate √1000 – √999 to 7 significant digit.

Information loss

Information loss occurs when add or subtract big number and small number.

Example

122.2222 + 0.001234567 = 122.2234 (significant digits 7)