Norman Köhring

The Magic 0xC2

I built a web application with file upload functionality. Some Vue.js in the front and a CouchDB in the back. Everything should be pretty simple and straightforward.

But…

When I uploaded image files, they somehow got mangled. The uploaded file was bigger than the original, and the new "file format" was not readable by any means. I got intrigued. What was happening to the files? The changes seemed random at first but were reproducible, so I created a few test files to see what exactly changes and when.

My first file looked like this:

0123456789
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz

To my surprise, the file stayed the same! My curiosity grew. In the meantime I found a very intriguing pattern in the hexdump of the uploads: C3 BF C3. It was everywhere. In another file, I found similar patterns with C2. So I wrote my next test file. This time a binary file:

00 01 02 03 04 05 06 07  08 09 10 11 12 13 14 15 |................|
16 17 18 19 20 21 22 23  24 25 26 27 28 29 30 31 |.... !"#$%&'()01|
32 33 34 35 36 37 38 39  40 41 42 43 44 45 46 47 |23456789@ABCDEFG|
48 49 50 51 52 53 54 55  56 57 58 59 60 61 62 63 |HIPQRSTUVWXY`abc|
64 65 66 67 68 69 70 71  72 73 74 75 76 77 78 79 |defghipqrstuvwxy|
80 81 82 83 84 85 86 87  88 89 90 91 92 93 94 95 |................|
96 97 98 99 a0 a1 a2 a3  a4 a5 a6 a7 a8 a9 aa ab |................|
ac ad ae af b0 b1 b2 b3  b4 b5 b6 b7 b8 b9 ba bb |................|
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|

EDIT: As you probably already noticed, I counted up as if in base 10, although the values are base 16. So I skipped A-F until reaching A0. This might look weird, but it didn't affect the test.
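For anyone who wants to repeat the experiment without the counting slip, here is a minimal sketch in Python that builds a payload containing every byte value exactly once and prints it as a hexdump. The helper names `make_test_payload` and `hexdump_rows` are made up for this illustration, and the row layout only approximates the dumps shown in this post.

```python
def make_test_payload():
    """Return 256 bytes: every value from 0x00 to 0xFF exactly once."""
    return bytes(range(256))


def hexdump_rows(data, width=16):
    """Format bytes as rows of hex values plus a printable-ASCII column."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as hexdump tools do.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        rows.append(f"{hex_part} |{ascii_part}|")
    return rows


payload = make_test_payload()
rows = hexdump_rows(payload)
for row in rows:
    print(row)
```

Writing `payload` to a file and uploading it should show which byte values survive the round trip and which do not.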

The result after uploading was

00 01 02 03 04 05 06 07  08 09 10 11 12 13 14 15  |................|
16 17 18 19 20 21 22 23  24 25 26 27 28 29 30 31  |.... !"#$%&'()01|
32 33 34 35 36 37 38 39  40 41 42 43 44 45 46 47  |23456789@ABCDEFG|
48 49 50 51 52 53 54 55  56 57 58 59 60 61 62 63  |HIPQRSTUVWXY`abc|
64 65 66 67 68 69 70 71  72 73 74 75 76 77 78 79  |defghipqrstuvwxy|
c2 80 c2 81 c2 82 c2 83  c2 84 c2 85 c2 86 c2 87  |................|
c2 88 c2 89 c2 90 c2 91  c2 92 c2 93 c2 94 c2 95  |................|
c2 96 c2 97 c2 98 c2 99  c2 a0 c2 a1 c2 a2 c2 a3  |................|
c2 a4 c2 a5 c2 a6 c2 a7  c2 a8 c2 a9 c2 aa c2 ab  |................|
c2 ac c2 ad c2 ae c2 af  c2 b0 c2 b1 c2 b2 c2 b3  |................|
c2 b4 c2 b5 c2 b6 c2 b7  c2 b8 c2 b9 c2 ba c2 bb  |................|
00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

There it was again: The magic 0xC2!

So all bytes with a value higher than 0x79 got followed by a 0xC2. 0x79 is the ASCII code for y. At least that is what I thought. It is actually the other way around: all bytes with a value of 0x80 or higher got prefixed with a 0xC2! That is when the scales fell from my eyes: UTF-8 encoding!

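In UTF-8, every code point above U+007F is at least two bytes long, while everything up to U+007F stays a single byte. That is why the plain ASCII test file survived unchanged. Code points U+0080 through U+00BF are encoded as 0xC2 followed by the original value, ending at 0xC2 0xBF (which is the inverted question mark ¿). The next code point, U+00C0, is encoded as 0xC3 0x80. So what happened is that on the way to the server, each byte of the file was treated as a character and the file got encoded to UTF-8 ¯\_(ツ)_/¯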
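The effect can be reproduced in a few lines of Python. This is only a sketch of the mechanism, not the code path of the actual application: it assumes each byte was interpreted as the code point with the same value (which is what a Latin-1 decode does) and then encoded as UTF-8. The function names `mangle` and `unmangle` are hypothetical.

```python
def mangle(data):
    """Treat each byte as a code point (Latin-1), then encode as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def unmangle(data):
    """Reverse the damage, assuming it happened exactly once."""
    return data.decode("utf-8").encode("latin-1")


original = bytes([0x79, 0x80, 0xBF, 0xC0, 0xFF])
mangled = mangle(original)

print(mangled.hex(" "))               # 79 c2 80 c2 bf c3 80 c3 bf
print(len(original), len(mangled))    # 5 9
print(unmangle(mangled) == original)  # True
```

Every byte of 0x80 or above grows to two bytes, which matches the uploaded files being bigger than the originals. Under the same assumption, 0xFF becomes C3 BF, so the C3 BF C3 pattern would appear wherever an 0xFF is followed by another byte of 0xC0 or higher.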

EDIT: Corrected some mistakes after some comments on Hacker News
