It is often important, especially when working with databases, that files are stored in the correct character set. A mismatch can result in illegible text or even data corruption. On Linux, you can check the character set of a file with the file command:
Jubal@Stranger:$ file migrate1.csv
migrate1.csv: Little-endian UTF-16 Unicode English text, with CRLF, LF line terminators
Jubal@Stranger:$ file migrate2.csv
migrate2.csv: UTF-8 Unicode (with BOM) English text, with CRLF line terminators
In the above example, file reports that migrate1.csv is a little-endian UTF-16 file with mixed line terminators (both CRLF and LF). The second file, migrate2.csv, is UTF-8 with a byte order mark (BOM) and CRLF line terminators.
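If you only need the encoding name, for example to test it in a script, the file command can print just that. The following is a minimal sketch, assuming a version of file that supports the -b and --mime-encoding options (current Linux distributions generally ship one). The sample file is created here purely for illustration and is not one of the files shown above.

```shell
#!/bin/sh
# Create a small temporary sample file containing a UTF-8 encoded "é"
# (bytes 0xC3 0xA9, written as octal escapes). This file is hypothetical.
sample=$(mktemp)
printf 'id,name\n1,caf\303\251\n' > "$sample"

# -b omits the filename from the output;
# --mime-encoding prints only the detected character set.
encoding=$(file -b --mime-encoding "$sample")
echo "$encoding"

# Clean up the temporary file.
rm -f "$sample"
```

Keep in mind that file guesses the encoding by inspecting the contents, so treat the result as a strong hint rather than a guarantee, particularly for short files or files that contain only ASCII characters.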
Hope this helps!