Notepad++ - Batch convert ANSI GB2312 files to UTF8
I have many ANSI GB2312 (Simplified Chinese) text files. They do not display correctly on English Windows, so I want to convert them to UTF-8.
Using Notepad++ and the Python Script plugin, I can automate the conversion of hundreds of ANSI GB2312 files.
WARNING: this script traverses every file in the folder and overwrites the original. Copy your original files to a new working folder first (e.g. F:\temp\UTF8), and check the converted files carefully before disposing of the originals.
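The copy step in the warning above can also be scripted. The following is a minimal sketch, assuming it is run with an ordinary Python interpreter outside Notepad++; the function name copy_to_working_folder and the source path in the example are hypothetical and only for illustration.

```python
import os
import shutil


def copy_to_working_folder(src, dst):
    """Copy the whole tree at src to dst and return the number of files copied.

    Refuses to run if dst already exists, so an earlier working copy
    is never silently mixed with a new one.
    """
    if os.path.exists(dst):
        raise ValueError("working folder already exists: " + dst)
    shutil.copytree(src, dst)
    return sum(len(files) for _, _, files in os.walk(dst))


# Example call (hypothetical source path):
# copy_to_working_folder("F:\\originals", "F:\\temp\\UTF8")
```

Working on a copy means the originals stay untouched until the converted files have been checked.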
Preparation
- Install Notepad++ (Tested with 32 bit v7.5.6)
- Download the Python Script plugin. (Tested with PythonScript_Min_1.0.8.0.zip)
- Copy the plugin files to your Notepad++ installation folder.
- Restart Notepad++. The "Plugins - Python Script" menu should now be available.
Create a new Python script
- In Notepad++, go to Plugins - Python Script - New Script.
- Enter a new file name (e.g. Convert_ANSI_GB2312_to_UTF8), then press Save.
- Copy and paste the script below into the new file, then save it.
Run the script:
- Copy the files you want to convert to F:\temp\UTF8. (You can change this path in the script.)
- In Notepad++, go to Plugins - Python Script - Scripts - Convert_ANSI_GB2312_to_UTF8
- Notepad++ then converts each file and saves it over the original.
(Based on Philip's blog: Mass convert a project to UTF-8 using Notepad++)
import os

# notepad, console and MENUCOMMAND are provided by the Python Script plugin.
filePathSrc = "f:\\Temp\\UTF8"

# Binary and archive files that must not be opened as text.
skipExtensions = ('.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico')

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        # Compare in lower case so that .JPG, .PNG etc. are skipped too.
        if not fn.lower().endswith(skipExtensions):
            path = os.path.join(root, fn)
            notepad.open(path)
            console.write(path + "\r\n")
            # Does not work --> notepad.runMenuCommand("Encoding", "Character sets", "Chinese", "GB2312 (Simplified)")
            notepad.menuCommand(MENUCOMMAND.FORMAT_GB2312)
            # notepad.runMenuCommand("Encoding", "Convert to UTF-8-BOM")
            notepad.menuCommand(MENUCOMMAND.FORMAT_CONV2_UTF_8)
            # Reference: https://github.com/bruderstein/PythonScript/blob/master/PythonScript/src/NotepadPython.cpp
            notepad.save()
            notepad.close()
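To help with checking the result files, a small standalone check can list any file that is still not valid UTF-8 after the conversion. This is only a sketch, assuming it is run with an ordinary Python interpreter outside Notepad++; the function name find_non_utf8_files and its skip_extensions parameter are hypothetical. It only catches files left in a legacy encoding. It cannot tell whether the characters in a valid UTF-8 file are the right ones, so a visual check is still needed.

```python
import os


def find_non_utf8_files(folder, skip_extensions=()):
    """Return the paths under folder whose bytes are not valid UTF-8."""
    bad = []
    for root, dirs, files in os.walk(folder):
        for fn in files:
            # Skip binary files, using the same idea as the conversion script.
            if fn.lower().endswith(tuple(skip_extensions)):
                continue
            path = os.path.join(root, fn)
            with open(path, "rb") as f:
                data = f.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return bad


# Example call, using the working folder from this post:
# for path in find_non_utf8_files("f:\\Temp\\UTF8", ('.jpg', '.png', '.gif')):
#     print(path)
```

An empty list means every checked file decodes as UTF-8. Pure ASCII files always pass, since ASCII is a subset of UTF-8.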