Because we ran into yet another hidden encoding issue in our test data, here is some information about the byte order mark (BOM) and why it is worth knowing about for anyone who works with a computer beyond Excel and Word.
Before you read up on it, here is the tool to have so you can spot such a problem easily. Most editors hide the BOM, so you might scratch your head over why some data fails with strange error messages. Get xxd and you will see your files with different eyes:
$ <strong>xxd</strong> /tmp/2018-01.csv<br>
0000000: <strong>efbb bf</strong>23 4375 7374 6f6d 6572 204e 756d  ...#Customer Num<br>
0000010: 6265 722c 5265 6164 204f 6e2c 506f 7765  ber,Read On,Powe<br>
0000020: 7220 4d65 7465 7220 5265 6164 696e 670d  r Meter Reading.<br>
0000030: 0a30 3030 312c 3230 3138 2d30 352d 3032  .0001,2018-05-02<br>
0000040: 2c35 3030 0d0a                           ,500..
The first three marked bytes, ef bb bf, are the magic: they are the BOM encoded as UTF-8. Now head over to Wikipedia to read more about the BOM: https://en.wikipedia.org/wiki/Byte_order_mark
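
If you would rather check for the BOM in code than in a hex dump, here is a minimal Python sketch. The helper names `has_utf8_bom` and `decode_without_bom` are made up for this post, and the sample bytes are typed in by hand to match the dump above. It relies only on the standard library: `codecs.BOM_UTF8` and the `utf-8-sig` codec, which drops a leading BOM when decoding.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_without_bom(data: bytes) -> str:
    """Decode UTF-8 bytes; 'utf-8-sig' drops a leading BOM if there is one."""
    return data.decode("utf-8-sig")


# Hand-typed sample mirroring the dump: BOM, header line, one data row.
sample = (
    b"\xef\xbb\xbf#Customer Number,Read On,Power Meter Reading\r\n"
    b"0001,2018-05-02,500\r\n"
)

print(has_utf8_bom(sample))                   # is the BOM there?
print(repr(sample.decode("utf-8")[:10]))      # plain utf-8 keeps it as U+FEFF
print(repr(decode_without_bom(sample)[:9]))   # utf-8-sig removes it
```

The plain `utf-8` codec keeps the BOM as an invisible U+FEFF character in front of the first column name, which is one way a parser can end up choking on a header that looks perfectly fine in an editor.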