
Unicode

Finding out the character set of a file in Linux

It is often important, especially when dealing with databases and such, that files are stored in the character set you expect. A mismatch can result in illegible text or even data corruption. You can check the character set of a file in Linux with the file command, which inspects the file's contents and reports its best guess:
Jubal@Stranger:$ file migrate1.csv
migrate1.csv: Little-endian UTF-16 Unicode English text, with CRLF, LF line terminators
Jubal@Stranger:$ file migrate2.csv
migrate2.csv: UTF-8 Unicode (with BOM) English text, with CRLF line terminators
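
When a check like this has to happen inside a script rather than at the prompt, part of what file reports can be approximated by looking for a byte order mark (BOM) at the start of the data. The following is only a rough sketch in Python, not a replacement for file: the names sniff_bom and sniff_file are made up for illustration, and it recognizes only encodings that announce themselves with a BOM.

```python
import codecs

# Longer BOMs are checked first: the UTF-32 LE BOM begins with the same
# two bytes as the UTF-16 LE BOM, so the order of this list matters.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return an encoding name if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file and check them for a BOM."""
    with open(path, "rb") as handle:
        return sniff_bom(handle.read(4))
```

A result of None only means that no BOM was found. Plain ASCII, UTF-8 without a BOM and legacy 8-bit encodings all end up there, which is why file goes further and examines the content itself.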

MySQL charset issues while importing data using LOAD DATA INFILE

Earlier today, I was banging my head against the wall trying to import some data from a CSV file into MySQL. While my imports had gone well thus far, this time around I was dealing with data involving lots of strange diacritics, runic squiggles and other manner of gibberish that make the world as fun as it can be. In other words, I was dealing with Unicode.
