Search all files with Python

GOAL

To list every file under a directory, including its subdirectories, with Python

Environment

Windows 10
Python 3.8.6

Method

Use glob.glob(path + '/**', recursive=True)

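import glob

# Root directory to search (example path)
path = "C:\\path\\directory"

# '**' with recursive=True matches entries at every depth under path
files = glob.glob(path + '/**', recursive=True)
for file_item in files:
    print(file_item)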

* If you don't add recursive=True, '**' behaves like '*', so only the entries directly under “path” are returned.
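
Note that the '**' pattern returns directories as well as files. If only regular files are wanted, the results can be filtered with os.path.isfile(). The following is a minimal sketch; the helper name list_files is made up for illustration and is not part of glob.

```python
import glob
import os


def list_files(root):
    """Return every regular file under root, searching subdirectories too.

    list_files is a hypothetical helper name, not part of the glob module.
    """
    pattern = os.path.join(root, "**")
    # recursive=True lets '**' match any number of nested directories.
    matches = glob.glob(pattern, recursive=True)
    # The pattern also matches directories, so keep regular files only.
    return sorted(m for m in matches if os.path.isfile(m))
```

Calling list_files(path) with the path from the example above should give the same entries minus the directories. glob skips names that start with a dot, so hidden files are not included.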

UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence

Problem

When I read a file in Python, the error “UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence” occurred.

The code is as below.

path = 'wordpress/wp-admin/about.php'

with open(path) as f:
    lines = f.readlines()
    print(lines)

Environment

Windows 10
Python 3.8.6

The cause of this problem

‘cp932’ is Microsoft code page 932, an extension of Shift_JIS. On Japanese Windows it is the default encoding that open() uses when no encoding is given. ‘0x99’ is a single byte value written in hexadecimal. The error occurs when the encoding used to decode the file does not match the encoding the file was actually saved in.
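
The mismatch can be reproduced without a file. The sketch below assumes the data is really UTF-8 and contains a right single quotation mark (U+2019), whose UTF-8 bytes end in 0x99. The sample string is made up for illustration.

```python
# UTF-8 bytes of a sample string; "\u2019" is a right single quotation mark.
data = "WordPress\u2019 files".encode("utf_8")
print(data)  # shows the raw bytes, including \x99

# Decoding with the same encoding that produced the bytes works.
text = data.decode("utf_8")

# Decoding the same bytes as cp932 fails, as in the error above.
error = None
try:
    data.decode("cp932")
except UnicodeDecodeError as exc:
    error = exc
    print(type(exc).__name__, hex(data[exc.start]), exc.reason)
```

The bytes are the same in both cases. Only the codec used to interpret them changes, which is why specifying the right encoding fixes the problem.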

Solution

Solution 1. Use the encoding argument

If you know the encoding of the file, you can specify it with the encoding argument of the open() function.

with open(path, encoding = "utf_8") as f:
    lines = f.readlines()
    print(lines)

Solution 2. Use try and except

If the file may be in one of a few encodings, you can try each of them in turn and skip files that are in any other encoding.

# Try the strictest encoding first and stop at the first one that decodes.
for encoding in ("ascii", "utf_8", "shift_jis"):
    try:
        with open(path, encoding=encoding) as f:
            lines = f.readlines()
    except UnicodeDecodeError:
        continue  # wrong encoding; try the next candidate
    print(lines)
    break

Solution 3. Use chardet

chardet is a module that detects the character encoding of binary data.

Install chardet with the pip command.

> pip install chardet

Installing collected packages: chardet
....
Successfully installed chardet-3.0.4

You can get the character encoding with chardet.detect(<binary data>).

import chardet
path = 'wordpress/wp-admin/about.php'

with open(path, mode='rb') as f:
    binary = f.read()
    code = chardet.detect(binary)['encoding']
with open(path, encoding=code) as f:
    lines = f.readlines()
    print(lines)

Solution 4. Ignore errors by codec

You can ignore decoding errors by using codecs.open() and setting the errors argument to ‘ignore’. Bytes that cannot be decoded are dropped, so some characters may be lost.

import codecs
path = 'wordpress/wp-admin/about.php'

with codecs.open(path, 'r', 'utf-8', 'ignore') as f:
    lines = f.readlines()
    print(lines)
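
The built-in open() also accepts an errors argument, so codecs is not strictly required. Besides 'ignore', the value 'replace' marks each undecodable byte with U+FFFD instead of dropping it silently. Here is a minimal sketch using a temporary file with made-up contents:

```python
import os
import tempfile

# Write bytes that are not valid UTF-8 (0xff never appears in UTF-8).
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"abc\xffdef\n")
    sample_path = tmp.name

# errors='ignore' drops the bad byte without any warning.
with open(sample_path, encoding="utf_8", errors="ignore") as f:
    ignored = f.read()

# errors='replace' substitutes U+FFFD so the damage stays visible.
with open(sample_path, encoding="utf_8", errors="replace") as f:
    replaced = f.read()

os.remove(sample_path)
print(ascii(ignored))   # 'abcdef\n'
print(ascii(replaced))  # 'abc\ufffddef\n'
```

Either way the original bytes are not recovered, so this approach fits best when a few lost characters do not matter.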

Why does category list return “Uncategorized”?

Problem

I’d like to display the list of all categories in my theme, but it returns only “Uncategorized”. The PHP code in index.php is as below.

<h1>Categories</h1>
<?php
	$categories = get_the_category();
	if ( $categories ) {
		$output = ''; // initialize before appending with .=
		foreach( $categories as $category ) {
			$output .= '<a href="' . get_category_link($category->term_id)
				. '">' . $category->cat_name . '</a>' . ' ';
		}
		echo $output;
	}
?>

The result shows only “Uncategorized”.

The cause of this problem

get_the_category() returns only the categories assigned to the “current” post. It does not list every category on the site, so on index.php it returns just “Uncategorized”.

Solution

Use get_categories() instead of get_the_category(). It returns the categories of the whole site (by default, only those that have at least one post).

<h1>Categories</h1>
<?php
	$categories = get_categories();
	if ( $categories ) {
		$output = ''; // initialize before appending with .=
		foreach( $categories as $category ) {
			$output .= '<a href="' . get_category_link($category->term_id)
				. '">' . $category->cat_name . '</a>' . ' ';
		}
		echo $output;
	}
?>

This is the result.

Categories

AfterEffects Algorithm Artificial Intelligence Blender C++ Computer Graphics Computer Science Daily Life DataAnalytics Event Game ImageProcessing JavaScript Kotlin mathematics Maya PHP Python SoftwareEngineering Tips Today's paper Tools TroubleShooting Unity Visual Sudio Web Windows WordPress 未分類