Encoding

There I was, simply creating a MySQL (5.5) table, when suddenly up popped the following error:

#1071 - Specified key was too long; max key length is 767 bytes

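After a little trial and error, I found that the culprit was one of my VARCHAR fields, which was being used for a UNIQUE index: MySQL was telling me that the indexed key took up too much space. The limit is counted in bytes rather than characters, so with a multi-byte character set each character can eat up several of the 767 bytes available. When I reduced the length of this field from its initial setting of 512 to 256 and then 255, it still complained. However, reducing it further to 128 fixed the issue!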
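As a rough check of the arithmetic behind this error, here is a minimal sketch. It assumes the 767-byte limit from the message and the worst-case character widths of MySQL's utf8 (3 bytes) and utf8mb4 (4 bytes) character sets; the function name is made up for illustration, and which character set the table actually used is not known.

```python
# Maximum key length reported in the error message, in bytes.
MAX_KEY_BYTES = 767

# Worst-case bytes per character (assumed candidates; the table's real
# character set is not stated in the post).
BYTES_PER_CHAR = {"utf8": 3, "utf8mb4": 4}


def max_indexable_chars(charset: str, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """Longest VARCHAR length whose worst case still fits in the key limit."""
    return max_key_bytes // BYTES_PER_CHAR[charset]


for charset in BYTES_PER_CHAR:
    print(charset, max_indexable_chars(charset))
# prints:
# utf8 255
# utf8mb4 191
```

Under these assumptions, a length of 128 fits with either character set, while 255 would only fit with a 3-byte one, which may be why 255 was still rejected.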

While performing a CSV import recently, I ran into the following error messages:

Warning (Code 1366): Incorrect string value: '\xE9, a <...' for column 'body' at row 3
Warning (Code 1366): Incorrect string value: '\xE6. He ...' for column 'body' at row 24
Warning (Code 1366): Incorrect string value: '\xE9, and...' for column 'body' at row 26

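The first message was triggered by the accented é in the word protegé in the input; the rest of the field was not imported. The others were triggered similarly. The byte \xE9 is how é is stored in Latin-1 (and Windows-1252), and on its own it is not valid UTF-8, which suggests that the file's encoding did not match the character set MySQL was expecting.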
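To see why a byte like \xE9 gets rejected, here is a small sketch. It assumes the CSV was saved as Latin-1 while the import expected UTF-8, which is a guess based on the byte values in the warnings, not something confirmed by the import itself.

```python
# The start of the value MySQL complained about in the first warning.
raw = b"\xe9, a"

# Interpreted as UTF-8, a lone 0xE9 byte is an incomplete multi-byte sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print("valid UTF-8:", valid_utf8)

# Interpreted as Latin-1 (our assumption), the same byte is simply "é".
text = raw.decode("latin-1")
print(text)

# Re-encoding as UTF-8 turns the single byte into the two-byte sequence C3 A9.
fixed = text.encode("utf-8")
print(fixed)
```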

During imports, it's imperative that every step uses the same encoding/character set. If a text file is not in the preferred encoding, we can use Vim to change it on save as follows:

:set fileencoding=utf8
:w

Or, if you want to save it to a different file and leave the current file unchanged:

:w ++enc=utf-8 newfile.txt
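When the conversion has to happen in a script rather than in an editor, the same idea can be sketched in Python. This is only an illustration: the function name and the file names are hypothetical, and you need to know (or guess correctly) the source encoding yourself.

```python
def transcode(src: str, dst: str, src_encoding: str, dst_encoding: str = "utf-8") -> None:
    """Read src using src_encoding and write the same text to dst in dst_encoding."""
    # newline="" switches off newline translation, so CRLF/LF line endings
    # are carried over exactly as they appear in the source file.
    with open(src, "r", encoding=src_encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding=dst_encoding, newline="") as fout:
        fout.write(text)


if __name__ == "__main__":
    import os

    # Hypothetical file names, for illustration only.
    if os.path.exists("input.csv"):
        transcode("input.csv", "input-utf8.csv", src_encoding="latin-1")
```

Like the second Vim command, this leaves the original file untouched and writes the converted text to a new one.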

I work extensively on a Windows desktop. However, I often SSH into Linux servers, and I do so using PuTTY, a free and open source client. Everything works peachy. However, I recently had occasion to work extensively with some Unicode source data, and there were times when I suspected encoding issues with the data because it was not being displayed correctly on my screen.

It is often important, especially when dealing with databases, that files are stored in the correct character set. Failure to do so can result in illegible output or even data corruption. Checking the character set of a file in Linux can be accomplished using the file command:

Jubal@Stranger:$ file migrate1.csv
migrate1.csv: Little-endian UTF-16 Unicode English text, with CRLF, LF line terminators
Jubal@Stranger:$ file migrate2.csv
migrate2.csv: UTF-8 Unicode (with BOM) English text, with CRLF line terminators
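Part of what the file command reports here comes from the byte order mark (BOM) at the start of the file. As a much narrower sketch of that one check, the Python below only looks for a BOM; the function name is made up, and a file without a BOM simply yields None, whereas file applies further heuristics.

```python
import codecs
from typing import Optional

# Longer BOMs are listed first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    import sys

    # Usage: python sniff_bom.py migrate1.csv migrate2.csv
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, sniff_bom(fh.read(4)))
```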

All content licensed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 License.