UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence
Problem
When I opened a file in Python, the following error occurred: “UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence”.
The code is shown below.
    path = 'wordpress/wp-admin/about.php'
    with open(path) as f:
        lines = f.readlines()
        print(lines)
Environment
Windows 10
Python 3.8.6
The cause of this problem
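The encoding ‘cp932’ is Microsoft code page 932, Microsoft’s extension of Shift_JIS, and ‘0x99’ is a single byte value written in hexadecimal. When open() is called without an encoding argument, Python decodes the file with the platform’s default encoding, which is cp932 on Japanese Windows. This error occurs when the encoding used to decode the file does not match the encoding the file was actually saved in.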
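The mismatch can be reproduced without any file. The sketch below uses a made-up sample string (an assumption for illustration, not the actual contents of about.php) containing a right single quotation mark, U+2019, followed by a space. In UTF-8 that character ends with the byte 0x99, which cp932 reads as the first byte of a two-byte character, and the space after it is not a valid second byte.

```python
import locale

# Encoding that open() falls back to when no encoding argument is given.
# On Japanese Windows this is typically 'cp932'.
print(locale.getpreferredencoding(False))

# Hypothetical sample text: U+2019 followed by a space, stored as UTF-8.
data = "users\u2019 guide".encode("utf_8")
print(data)  # the UTF-8 bytes of U+2019 are e2 80 99

try:
    data.decode("cp932")
    failed = False
except UnicodeDecodeError as e:
    failed = True
    print(e)  # reports the byte and position that could not be decoded

# The same bytes decode cleanly with the encoding they were written in.
text = data.decode("utf_8")
print(text == "users\u2019 guide")
```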
Solution
Solution 1. Use the encoding argument
If you know the encoding of the file, you can specify it with the encoding argument of the open() function.
    with open(path, encoding="utf_8") as f:
        lines = f.readlines()
        print(lines)
Solution 2. Use try and except
If you want to try a few encodings in turn and skip files that use any other encoding, wrap each attempt in try and except.
    # Try each candidate encoding; a wrong guess raises UnicodeDecodeError.
    try:
        with open(path, encoding="shift_jis") as f:
            lines = f.readlines()
            print(lines)
    except UnicodeDecodeError:
        pass
    try:
        with open(path, encoding="ascii") as f:
            lines = f.readlines()
            print(lines)
    except UnicodeDecodeError:
        pass
    try:
        with open(path, encoding="utf_8") as f:
            lines = f.readlines()
            print(lines)
    except UnicodeDecodeError:
        pass
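The same idea can be written as a loop that stops at the first encoding that works. This is a minimal sketch: read_with_fallback is a hypothetical helper name, and the candidate order is an assumption. The stricter encodings are tried first because a wrong guess with a permissive multibyte encoding can sometimes decode without raising and produce garbled text.

```python
import os
import tempfile


def read_with_fallback(path, encodings=("ascii", "utf_8", "shift_jis")):
    """Return (encoding, lines) for the first encoding that decodes the file.

    Hypothetical helper; returns None when every candidate fails.
    """
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return enc, f.readlines()
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    return None


# Demonstration with a temporary UTF-8 file (the sample text is made up).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf_8", suffix=".txt", delete=False
) as tmp:
    tmp.write("users\u2019 guide\n")
    tmp_path = tmp.name

result = read_with_fallback(tmp_path)
os.remove(tmp_path)
print(result[0])  # name of the encoding that succeeded
```

Catching only UnicodeDecodeError keeps unrelated problems, such as a missing file, visible instead of silently swallowing them.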
Solution 3. Use chardet
chardet is a module that detects the character encoding of binary data.
Install chardet with the pip command.
    > pip install chardet
    Installing collected packages: chardet
    ....
    Successfully installed chardet-3.0.4
You can get the character encoding with chardet.detect(<binary data>).
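    import chardet

    path = 'wordpress/wp-admin/about.php'
    # Read the raw bytes and let chardet guess the encoding.
    with open(path, mode='rb') as f:
        binary = f.read()
    code = chardet.detect(binary)['encoding']
    # Reopen the file as text using the detected encoding.
    with open(path, encoding=code) as f:
        lines = f.readlines()
        print(lines)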
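chardet.detect() returns a dictionary that also carries a 'confidence' value, and 'encoding' can be None when no guess is possible. Passing None to open() falls back to the platform default, which brings the original error back. The sketch below guards against that; choose_encoding, the 0.5 threshold, and the utf_8 fallback are all assumptions for illustration, not part of chardet.

```python
def choose_encoding(detection, fallback="utf_8", min_confidence=0.5):
    """Pick an encoding name from a chardet.detect()-style result dict.

    Hypothetical helper: falls back when the guess is missing or weak.
    """
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    if encoding is None or confidence < min_confidence:
        return fallback
    return encoding


# Hand-written dictionaries in the shape chardet.detect() returns.
confident = choose_encoding({"encoding": "utf-8", "confidence": 0.99})
unknown = choose_encoding({"encoding": None, "confidence": 0.0})
weak = choose_encoding({"encoding": "SHIFT_JIS", "confidence": 0.2})
print(confident, unknown, weak)
```

With chardet installed, the result of chardet.detect(binary) can be passed to the helper directly and the return value used as the encoding argument of open().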
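Solution 4. Ignore errors with codecs
You can ignore decoding errors by using codecs.open() and setting the ‘errors’ argument to ‘ignore’. Bytes that cannot be decoded are then dropped silently, so some characters may be missing from the result.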
    import codecs

    path = 'wordpress/wp-admin/about.php'
    # The fourth positional argument is the error handler.
    with codecs.open(path, 'r', 'utf-8', 'ignore') as f:
        lines = f.readlines()
        print(lines)
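The built-in open() accepts the same errors argument, so codecs is not strictly required. The sketch below writes a temporary file containing a stray 0x99 byte (made-up sample data) and compares 'ignore', which drops the byte, with 'replace', which substitutes U+FFFD so the damage stays visible.

```python
import os
import tempfile

# Sample bytes that are not valid UTF-8: 0x99 is a stray continuation byte.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"abc\x99def\n")
    tmp_path = tmp.name

# 'ignore' silently drops the undecodable byte.
with open(tmp_path, encoding="utf_8", errors="ignore") as f:
    ignored = f.read()

# 'replace' inserts U+FFFD (the replacement character) in its place.
with open(tmp_path, encoding="utf_8", errors="replace") as f:
    replaced = f.read()

os.remove(tmp_path)
print(repr(ignored))    # 'abcdef\n'
print(ascii(replaced))  # 'abc\ufffddef\n'
```

Either handler hides the real encoding mismatch, so this is best treated as a last resort when the exact text does not matter.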