Finding out the character set of a file in Linux

When working with databases and similar systems, it is often important that files are stored in the expected character encoding. A mismatch can result in illegible text or even data corruption. In Linux, you can check the character set of a file with the file command:
    Jubal@Stranger:$ file migrate1.csv
    migrate1.csv: Little-endian UTF-16 Unicode English text, with CRLF, LF line terminators
    Jubal@Stranger:$ file migrate2.csv
    migrate2.csv: UTF-8 Unicode (with BOM) English text, with CRLF line terminators

In the above example, file reports that migrate1.csv is a little-endian UTF-16 file with mixed line terminators (both CRLF and LF). The second file, migrate2.csv, is UTF-8 with a byte order mark (BOM) and CRLF line terminators.
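If you only want the encoding name, or need to convert a file afterwards, something like the following sketch may help. It assumes the common file implementation that supports the --mime-encoding option and an iconv binary on the PATH. The sample file names are made up for illustration and are not the files from the example above.

```shell
# Create two tiny sample files (hypothetical names) so the sketch is self-contained.
# sample_utf16.csv: UTF-16 little-endian BOM (FF FE) followed by "hi" and CRLF.
printf '\377\376h\000i\000\r\000\n\000' > sample_utf16.csv
# sample_utf8.csv: UTF-8 BOM (EF BB BF) followed by "hi" and CRLF.
printf '\357\273\277hi\r\n' > sample_utf8.csv

# -b omits the file name prefix; --mime-encoding prints only the detected encoding.
file -b --mime-encoding sample_utf16.csv
file -b --mime-encoding sample_utf8.csv

# Convert the UTF-16 file to UTF-8; iconv uses the BOM to pick the byte order.
iconv -f UTF-16 -t UTF-8 sample_utf16.csv > sample_converted.csv
```

Keep in mind that file guesses the encoding from the file's contents, so it is worth double-checking the result on very short or unusual files before converting anything important.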

Hope this helps!
