Notepad++ - Batch convert ANSI GB2312 files to UTF8

I have many ANSI GB2312 (Simplified Chinese) text files. They do not display correctly on English Windows, so I want to convert them to UTF-8, which displays correctly.
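To see why the files look wrong, here is a small Python 3 sketch (run outside Notepad++). It encodes two Chinese characters as GB2312 and then reads the bytes back as code page 1252, which I assume is the ANSI code page of an English Windows install. The variable names are only for illustration.

```python
# Two Chinese characters: "中文" ("Chinese").
text = "中文"

# How a GB2312 editor stores them on disk: two bytes per character.
raw = text.encode("gb2312")

# An English Windows system reads ANSI files with code page 1252,
# so each byte turns into a separate Western European letter (mojibake).
garbled = raw.decode("cp1252", errors="replace")

# Decoding with the right code page recovers the text. UTF-8 does not
# depend on the system code page, which is why we convert to it.
restored = raw.decode("gb2312")
utf8 = restored.encode("utf-8")

print(repr(raw), garbled, restored == text, len(utf8))
```

The garbled string comes out as ÖÐÎÄ, while decoding with gb2312 gives back 中文.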

By using Notepad++ and the Python Script plugin, I can automate the conversion of hundreds of ANSI GB2312 files.

WARNING: this script traverses all the files in the folder (including subfolders) and overwrites each original file. Please copy your original files to a new working folder (e.g. F:\temp\UTF8). Check the resulting files carefully before disposing of your originals.
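If you prefer to script the backup step, here is a minimal Python 3 sketch using shutil.copytree. The function name make_working_copy is hypothetical, and it is meant to run outside Notepad++.

```python
import shutil
from pathlib import Path


def make_working_copy(src_dir, work_dir):
    """Copy every file under src_dir into work_dir, keeping subfolders.

    copytree refuses to write into an existing folder, so an earlier
    (possibly already converted) working copy is never mixed with
    fresh originals.
    """
    src = Path(src_dir)
    dst = Path(work_dir)
    shutil.copytree(src, dst)  # raises FileExistsError if dst exists
    # Return the copied files (relative paths) so the count can be checked.
    return sorted(p.relative_to(dst) for p in dst.rglob("*") if p.is_file())
```

For example, make_working_copy(r"F:\Originals", r"F:\temp\UTF8") would copy everything from a hypothetical F:\Originals folder into the working folder used in this post, and stop with an error if F:\temp\UTF8 already exists.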

Preparation
  1. Install Notepad++ (tested with 32-bit v7.5.6).
  2. Download the Python Script plugin (tested with PythonScript_Min_1.0.8.0.zip).
  3. Copy the files from the plugin zip into your Notepad++ installation folder.
  4. Restart Notepad++. You should then see "Plugins - Python Script" in the menu.

Create a new Python script
  1. In Notepad++, go to Plugins - Python Script - New Script.
  2. Enter a new file name (e.g. Convert_ANSI_GB2312_to_UTF8), then press Save.
  3. Copy and paste the script below into the new file, then save it.

Run the script
  1. Copy the files you want to convert to F:\temp\UTF8. (To use a different folder, change filePathSrc in the script.)
  2. In Notepad++, go to Plugins - Python Script - Scripts - Convert_ANSI_GB2312_to_UTF8.
  3. Notepad++ opens each file, converts it to UTF-8 and saves it under the same name, overwriting the original.
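To help with checking the results, this Python 3 sketch (a hypothetical helper, find_non_utf8, run outside Notepad++) lists files in the working folder that are still not valid UTF-8. It is only a first pass: a file that passes may still contain wrong characters, so open a few converted files and read them.

```python
from pathlib import Path


def find_non_utf8(folder, skip_exts=(".jar", ".ear", ".gif", ".jpg", ".jpeg",
                                     ".xls", ".png", ".cab", ".ico")):
    """Return files under folder whose bytes are not valid UTF-8.

    Uses the same skip list as the Notepad++ script, compared
    case-insensitively. A file listed here was probably not converted.
    """
    bad = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() in skip_exts:
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

For example, print(find_non_utf8(r"F:\temp\UTF8")) after the script has run should print an empty list if every text file was converted.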

(Based on Philip's blog: Mass convert a project to UTF-8 using Notepad++)

# notepad, console and MENUCOMMAND are globals provided by the Python Script plugin.
import os

# Folder to convert. Every file in it and in its subfolders is overwritten.
filePathSrc = "f:\\Temp\\UTF8"

# Binary file types to leave alone (compared case-insensitively)
skipExts = set(['.jar', '.ear', '.gif', '.jpg', '.jpeg', '.xls', '.png', '.cab', '.ico'])

for root, dirs, files in os.walk(filePathSrc):
    for fn in files:
        if os.path.splitext(fn)[1].lower() in skipExts:
            continue
        filePath = os.path.join(root, fn)
        notepad.open(filePath)
        console.write(filePath + "\r\n")
        # Reinterpret the file as GB2312 (Simplified Chinese).
        # Does not work --> notepad.runMenuCommand("Encoding", "Character sets", "Chinese", "GB2312 (Simplified)")
        notepad.menuCommand(MENUCOMMAND.FORMAT_GB2312)
        # Convert to UTF-8 (without BOM).
        # notepad.runMenuCommand("Encoding", "Convert to UTF-8-BOM")
        notepad.menuCommand(MENUCOMMAND.FORMAT_CONV2_UTF_8)
        # Reference: https://github.com/bruderstein/PythonScript/blob/master/PythonScript/src/NotepadPython.cpp
        notepad.save()
        notepad.close()
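If Notepad++ is not available, the same conversion can be sketched in plain Python 3 with the built-in gb2312 codec instead of Notepad++ menu commands. This is a different technique from the script above, not Philip's method; convert_folder and SKIP_EXTS are hypothetical names. Unlike the Notepad++ script, it skips files that are already valid UTF-8, so an accidental second run does not garble them.

```python
from pathlib import Path

# Binary types the Notepad++ script skips, compared case-insensitively.
SKIP_EXTS = {".jar", ".ear", ".gif", ".jpg", ".jpeg", ".xls",
             ".png", ".cab", ".ico"}


def convert_folder(folder, src_encoding="gb2312"):
    """Re-encode every text file under folder from src_encoding to UTF-8
    (no BOM), overwriting each file in place. Returns the converted paths.

    Files that already decode as UTF-8 are left alone. Files that do not
    decode as src_encoding either are reported and left unchanged.
    """
    converted = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() in SKIP_EXTS:
            continue
        raw = path.read_bytes()
        try:
            raw.decode("utf-8")
            continue  # already UTF-8 (or plain ASCII): nothing to do
        except UnicodeDecodeError:
            pass
        try:
            text = raw.decode(src_encoding)
        except UnicodeDecodeError:
            print("Skipped, not", src_encoding + ":", path)
            continue
        # Bytes in, bytes out, so CRLF line endings are preserved.
        path.write_bytes(text.encode("utf-8"))
        converted.append(path)
    return converted
```

For example, convert_folder(r"f:\Temp\UTF8") converts the working folder in place. If some files contain characters outside GB2312, src_encoding="gbk" (a superset) may decode them; for plain ANSI files, pass your system's ANSI code page instead, e.g. "cp1252" on English Windows.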

Comments

Anonymous said…
Thanks, it worked. I just had to change the first encoding to ANSI to convert my files from ANSI to UTF-8.