
Finding out the character set of a file in Linux

Submitted by Druss on Tue, 2011-09-27 22:21

It is often important, especially when dealing with databases and similar tools, that files are stored in the expected character set (encoding). Getting this wrong can result in garbled text on screen or even data corruption. On Linux, you can check the character set of a file with the file command:

Jubal@Stranger:$ file migrate1.csv
migrate1.csv: Little-endian UTF-16 Unicode English text, with CRLF, LF line terminators
Jubal@Stranger:$ file migrate2.csv
migrate2.csv: UTF-8 Unicode (with BOM) English text, with CRLF line terminators

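In the above example, we are told that migrate1.csv is a little-endian UTF-16 file with mixed (CRLF and LF) line terminators, along with a guess at the language of its contents. The second file, migrate2.csv, is a UTF-8 file that starts with a byte order mark (BOM) and uses Windows-style CRLF line terminators throughout.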
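Both results above hinge on the first few bytes of each file: a UTF-8 BOM is the byte sequence EF BB BF, and little-endian UTF-16 text usually begins with FF FE. As a rough cross-check (or on a minimal system where file is not installed), here is a small sketch that reads those leading bytes with head, od and tr. The function name bom_encoding is hypothetical, and it only recognises BOMs. A file without one will simply report "no BOM", even if it is valid UTF-8 or some other encoding, so it complements file rather than replacing it.

```shell
# Minimal sketch: report which byte order mark (BOM), if any, a file starts with.
# bom_encoding is a hypothetical helper name, not a standard command.
# Relies only on head, od and tr.
bom_encoding() {
    # Read the first 3 bytes and print them as lowercase hex with no spaces,
    # e.g. "efbbbf" for a UTF-8 BOM. An empty file yields an empty string.
    bytes=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$bytes" in
        efbbbf) echo "UTF-8 with BOM" ;;
        # Note: UTF-32LE also starts with FF FE; this sketch does not tell them apart.
        fffe*)  echo "UTF-16 little-endian (BOM FF FE)" ;;
        feff*)  echo "UTF-16 big-endian (BOM FE FF)" ;;
        *)      echo "no BOM" ;;
    esac
}
```

Running bom_encoding migrate2.csv would be expected to print "UTF-8 with BOM", consistent with what file reported. If you only want the encoding name from file itself, its --mime-encoding option (combined with -b to drop the filename) prints just that, something along the lines of utf-16le for the first file.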

Hope this helps!