Recent Posts

Search all files with Python

GOAL

To list all files under a directory, including its subdirectories, with Python

Environment

Windows 10
Python 3.8.6

Method

Use glob.glob(path + '/**', recursive=True)

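import glob

# Placeholder path; replace it with the directory you want to search.
path = "C:\\path\\directory"

# With recursive=True, '**' matches files and directories at every depth.
files = glob.glob(path + '/**', recursive=True)
for file_item in files:
    print(file_item)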

* If you don’t add recursive=True, '**' behaves like '*', so only the entries directly under “path” are returned.
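
Note that the '**' pattern returns directories as well as files. If only regular files are wanted, the results can be filtered with os.path.isfile(). The following is a minimal sketch; the helper name list_files and the temporary demo tree are made up for illustration and are not part of the method above.

```python
import glob
import os
import tempfile


def list_files(root):
    """Return every regular file under root, at any depth (hypothetical helper)."""
    pattern = os.path.join(root, "**")
    # recursive=True lets '**' match entries at every depth below root.
    matches = glob.glob(pattern, recursive=True)
    # Drop directories and keep regular files only.
    return sorted(m for m in matches if os.path.isfile(m))


# Demo on a throwaway tree: root/a.txt and root/sub/b.txt
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("demo")
    found = [os.path.relpath(p, root) for p in list_files(root)]

print(found)
```

The directory "sub" itself is matched by the pattern but filtered out, so only the two files remain.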

UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence

Problem

When I opened a file in Python, the error “UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x99 in position ~ : illegal multibyte sequence” occurred.

The code is below.

path = 'wordpress/wp-admin/about.php'

with open(path) as f:
    lines = f.readlines()
    print(lines)

Environment

Windows 10
Python 3.8.6

The cause of this problem

‘cp932’ is Microsoft code page 932, Microsoft’s variant of Shift_JIS. Because open() was called without an encoding argument, Python used the system default encoding, which was cp932 on this machine. ‘0x99’ is a one-byte value written in hexadecimal. The error occurs when the encoding used to decode the file does not match the encoding the file was saved in.
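
The mismatch can be reproduced without any file. The following is a minimal sketch, assuming the text was saved as UTF-8 and is then decoded as cp932; the sample string and the helper name can_decode are illustrative only and may not match the actual contents of about.php.

```python
def can_decode(data, encoding):
    """Return True if the bytes decode cleanly with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# A right single quotation mark (U+2019) followed by a space, saved as UTF-8.
# Its UTF-8 form is the bytes E2 80 99, so the data contains the byte 0x99.
data = "\u2019 ".encode("utf-8")

print(data)                        # b'\xe2\x80\x99 '
print(can_decode(data, "utf-8"))   # True: matches how the bytes were written
print(can_decode(data, "cp932"))   # False: cp932 rejects this byte sequence
```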

Solution

Solution 1. Use the encoding argument

If you know the encoding of the file, you can specify it with the encoding argument of the open() function.

with open(path, encoding = "utf_8") as f:
    lines = f.readlines()
    print(lines)

Solution 2. Use try and except

If you’d like to try a few candidate encodings and skip files that use any other encoding, wrap the open() call in try and except.

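# Try each candidate encoding in turn and stop at the first one that works.
# Stricter encodings come first, because a permissive one may decode
# another encoding's bytes into the wrong characters without any error.
for encoding in ("ascii", "utf_8", "shift_jis"):
    try:
        with open(path, encoding=encoding) as f:
            lines = f.readlines()
    except UnicodeDecodeError:
        # This encoding does not fit the file; try the next one.
        continue
    print(lines)
    break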
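
The same idea can be wrapped in a function that also reports which encoding worked. This is a minimal sketch: the helper name read_with_fallback, the candidate list, and the temporary demo file are assumptions for illustration.

```python
import os
import tempfile


def read_with_fallback(path, encodings=("ascii", "utf_8", "shift_jis")):
    """Return (lines, encoding) for the first encoding that decodes the file.

    Returns (None, None) when none of the candidates fit.
    """
    for encoding in encodings:
        try:
            with open(path, encoding=encoding) as f:
                return f.readlines(), encoding
        except UnicodeDecodeError:
            continue  # wrong guess; try the next candidate
    return None, None


# Demo file containing Japanese text saved as UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("日本語\n".encode("utf_8"))
    demo_path = tmp.name

lines, used = read_with_fallback(demo_path)
os.remove(demo_path)
print(used, lines)
```

Returning (None, None) instead of silently passing lets the caller decide what to do with a file that fits none of the candidates.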

Solution 3. Use chardet

Chardet is a module to detect the character encoding.

Install chardet with the pip command.

> pip install chardet

Installing collected packages: chardet
....
Successfully installed chardet-3.0.4

You can get the character encoding with chardet.detect(<binary data>)['encoding'].

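import chardet

path = 'wordpress/wp-admin/about.php'

# Read the raw bytes so that chardet can inspect them.
with open(path, mode='rb') as f:
    binary = f.read()

# detect() returns a dict; 'encoding' is the guessed name, or None on failure.
code = chardet.detect(binary)['encoding']

# Reopen the file as text using the detected encoding.
with open(path, encoding=code) as f:
    lines = f.readlines()
    print(lines)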

Solution 4. Ignore errors with codecs

You can ignore decoding errors by using codecs.open() and setting the errors argument to 'ignore'. Bytes that cannot be decoded are dropped from the result.

import codecs

path = 'wordpress/wp-admin/about.php'

# Arguments: filename, mode, encoding, errors. 'ignore' drops undecodable bytes.
with codecs.open(path, 'r', 'utf-8', 'ignore') as f:
    lines = f.readlines()
    print(lines)
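
The built-in open() accepts the same errors argument, so the codecs module is not strictly required. The following is a minimal sketch comparing 'ignore' with 'replace'; the helper name read_lossy and the temporary demo file are illustrative assumptions.

```python
import os
import tempfile


def read_lossy(path, encoding="utf-8", errors="ignore"):
    """Read text, handling undecodable bytes according to `errors`."""
    with open(path, encoding=encoding, errors=errors) as f:
        return f.read()


# Demo file: valid text around one byte (0x99) that is invalid in UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"abc\x99def")
    demo_path = tmp.name

ignored = read_lossy(demo_path, errors="ignore")    # bad byte is dropped
replaced = read_lossy(demo_path, errors="replace")  # bad byte becomes U+FFFD
os.remove(demo_path)

print(repr(ignored))
print(repr(replaced))
```

'replace' leaves a visible marker (U+FFFD) where bytes were lost, which makes data loss easier to notice than with 'ignore'.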

Why does category list return “Uncategorized”?

Problem

I’d like to display the list of all categories in my theme, but it returns only “Uncategorized”. The PHP code in index.php is below.

<h1>Categories</h1>
<?php
	$categories = get_the_category();
	if ( $categories ) {
		// Initialize the string before appending to it.
		$output = '';
		foreach( $categories as $category ) {
			$output .= '<a href="' . get_category_link($category->term_id)
				. '">' . $category->cat_name . '</a>' . ' ';
		}
		echo $output;
	}
?>

The result shows only “Uncategorized”.

The cause of this problem

get_the_category() is the function that gets the categories of the “current” post or page. This page is index.php, which has no category, so it returns “Uncategorized”.

Solution

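Use get_categories() instead of get_the_category(). get_categories() returns the categories of the whole site rather than those of the current post. By default it omits categories that have no posts.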

<h1>Categories</h1>
<?php
	$categories = get_categories();
	if ( $categories ) {
		// Initialize the string before appending to it.
		$output = '';
		foreach( $categories as $category ) {
			$output .= '<a href="' . get_category_link($category->term_id)
				. '">' . $category->cat_name . '</a>' . ' ';
		}
		echo $output;
	}
?>

This is the result.

Categories

AfterEffects Algorithm Artificial Intelligence Blender C++ Computer Graphics Computer Science Daily Life DataAnalytics Event Game ImageProcessing JavaScript Kotlin mathematics Maya PHP Python SoftwareEngineering Tips Today's paper Tools TroubleShooting Unity Visual Sudio Web Windows WordPress 未分類