UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence

Problem

When I opened a file in Python, the following error occurred: “UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence”.

The code was as follows.

path = 'wordpress/wp-admin/about.php'

with open(path) as f:
    lines = f.readlines()
    print(lines)

Environment

Windows 10
Python 3.8.6

The cause of this problem

The encoding ‘cp932’ is Microsoft code page 932, Microsoft’s variant of Shift_JIS. On Japanese Windows, open() uses it by default when no encoding is given. ‘0x99’ is the value of a single byte, written in hexadecimal. This error occurs when the encoding used to decode the file does not match the encoding the file was actually saved in.
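The mismatch can be reproduced without any file. The following is a minimal sketch, not the original about.php: the sample string is an assumption chosen because its UTF-8 form contains the byte 0x99 followed by a space, which cp932 cannot interpret as a two-byte character.

```python
# Sample text (hypothetical); \u2019 is the right single quotation mark.
text = "WordPress\u2019 new look"
raw = text.encode("utf-8")  # U+2019 becomes the bytes E2 80 99

# Decoding with the encoding used for writing gives the text back.
round_trip = raw.decode("utf-8")

# Decoding the same bytes as cp932 raises UnicodeDecodeError.
try:
    raw.decode("cp932")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(round_trip == text)
print(error)
```

The bytes are the same in both calls; only the codec chosen for decoding differs.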

Solution

Solution 1. Use the encoding argument

If you know the encoding of the file, you can specify it with the encoding argument of the open() function.

with open(path, encoding="utf_8") as f:
    lines = f.readlines()
    print(lines)
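To see which encoding open() would fall back to on your machine, and to confirm that an explicit encoding round-trips, something like the following sketch may help. The temporary file and the sample text are assumptions made for illustration; they are not part of the original code.

```python
import locale
import os
import tempfile

# Encoding that open() uses when no encoding argument is given
# (typically 'cp932' on Japanese Windows with Python 3.8).
print(locale.getpreferredencoding(False))

# Hypothetical sample text; \u2019 is the right single quotation mark.
text = "WordPress\u2019 new look\n"

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    # Write and read with the same explicit encoding.
    with open(sample_path, "w", encoding="utf_8") as f:
        f.write(text)
    with open(sample_path, encoding="utf_8") as f:
        lines = f.readlines()

print(lines)
```

Passing the encoding explicitly on both the write and the read makes the result independent of the system locale.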

Solution 2. Use try and except

If you’d like to try a few character encodings and skip files that use any other encoding, wrap each attempt in try and except.

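# Catch only decoding errors; a bare except would also hide
# unrelated problems such as a missing file.
try:
    with open(path, encoding="shift_jis") as f:
        lines = f.readlines()
        print(lines)
except UnicodeDecodeError:
    pass
try:
    with open(path, encoding="ascii") as f:
        lines = f.readlines()
        print(lines)
except UnicodeDecodeError:
    pass
try:
    with open(path, encoding="utf_8") as f:
        lines = f.readlines()
        print(lines)
except UnicodeDecodeError:
    pass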
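The same idea can be folded into a small helper that stops at the first encoding that works. This is only a sketch: the function name read_lines_with_fallback, the default candidate list, and the temporary sample file are all assumptions, and the order of candidates matters because a permissive codec may decode the wrong bytes without raising.

```python
import os
import tempfile


def read_lines_with_fallback(path, encodings=("utf_8", "shift_jis")):
    """Return (lines, encoding) for the first candidate that decodes the file."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.readlines(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    return None, None  # no candidate matched; the caller can skip the file


# Usage with a throwaway Shift_JIS file (hypothetical sample data).
sample_text = "\u65e5\u672c\u8a9e\n"  # three Japanese characters
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="shift_jis") as f:
        f.write(sample_text)
    lines, used = read_lines_with_fallback(sample_path)

print(used, lines)
```

Here the UTF-8 attempt fails on the Shift_JIS bytes, so the helper moves on to the second candidate and reports which one succeeded.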

Solution 3. Use chardet

chardet is a module that detects the character encoding of binary data.

Install chardet with the pip command.

> pip install chardet

Installing collected packages: chardet
....
Successfully installed chardet-3.0.4

You can get the character encoding with chardet.detect(<binary data>). It returns a dictionary whose 'encoding' entry holds the detected encoding.

import chardet
path = 'wordpress/wp-admin/about.php'

with open(path, mode='rb') as f:
    binary = f.read()
    code = chardet.detect(binary)['encoding']
with open(path, encoding=code) as f:
    lines = f.readlines()
    print(lines)

Solution 4. Ignore errors with codecs

You can ignore errors by using codecs.open() and setting the ‘errors’ argument to ‘ignore’. Bytes that cannot be decoded are dropped, so some characters may be lost.

import codecs
path = 'wordpress/wp-admin/about.php'

with codecs.open(path, 'r', 'utf-8', 'ignore') as f:
    lines = f.readlines()
    print(lines)
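The built-in open() accepts the same errors argument, so codecs is not strictly required. The sketch below uses made-up sample bytes and a temporary file to compare 'ignore', which drops undecodable bytes, with 'replace', which substitutes U+FFFD.

```python
import os
import tempfile

# Hypothetical sample: 0xE9 is not valid UTF-8 in this position.
raw = b"caf\xe9 ok"

ignored = raw.decode("utf-8", errors="ignore")    # the bad byte is dropped
replaced = raw.decode("utf-8", errors="replace")  # the bad byte becomes U+FFFD

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as f:
        f.write(raw)
    # Built-in open() with errors='ignore' behaves like the codecs.open() call.
    with open(sample_path, encoding="utf-8", errors="ignore") as f:
        from_file = f.read()

print(repr(ignored), repr(replaced), repr(from_file))
```

'replace' can be the safer choice when you need to see where data was lost, since 'ignore' removes the bytes silently.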