If you install the text editor Notepad++, you will notice that a new item appears in the context menu when you right-click a file: “Open in Notepad++”. This item appears for all files, even ones that you would usually just double-click to run and never think of opening in a text editor. But then you think . . . what does it look like inside? And furthermore, how does it work, and can I edit it?

Text Is A Beautiful Thing

One of the most awe-inspiring computer epiphanies you can have is when you realise that everything can be looked at as text (we’re not going to go into the philosophical minutiae of text versus language). Every file on your computer can be opened in a text editor (which makes the humble text editor a very powerful thing). It can even be edited, and this can produce an infinitude of possible outcomes, most of them errors.

The set of symbols itself is beautiful. The symbols that you are familiar with on your computer, and that are consistent across all fonts, are based on industry standards. ‘ASCII’ is the basic set, with 128 characters (one of those computery power-of-two numbers that are very easy for computers to work with). It was developed in the 1960s and grew out of the codes used for telegraphs. Later 8-bit extensions doubled this to 256 characters. Here is a list of the printable ASCII characters, followed by the extra characters from one of those 8-bit extensions, Windows-1252 (this kind of thing is just cool to look at, I like it):

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ € ‚ƒ„…†‡ˆ‰Š‹Œ Ž ‘’“”•–—˜™š›œ žŸ ¡¢£¤¥¦§¨©ª«¬ ®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
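If you want to generate a list like this yourself, here is a minimal Python sketch. It assumes the first 94 symbols above are ASCII codes 33 to 126 and that the accented letters at the end come from the Windows-1252 code page; the function names are made up for illustration.

```python
def printable_ascii():
    """Return the visible ASCII characters, codes 33 ('!') to 126 ('~')."""
    return "".join(chr(code) for code in range(33, 127))


def upper_half_cp1252():
    """Decode the byte values 0xA1 to 0xFF using the Windows-1252 code page."""
    return bytes(range(0xA1, 0x100)).decode("cp1252")


print(printable_ascii())
```

Calling upper_half_cp1252() gives the run from ¡ to ÿ that ends the list above.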

Work on the Unicode standard began in the late 1980s, and it is now the standard that modern web browsers and applications use. It currently has over 128,000 characters from a vast range of scripts, both modern and historical. It has room for far more and is ever growing. In fact, it has become sort of an underground hipsterish bragging right to have contributed a character to Unicode, and you can actually do this if you submit a well-reasoned proposal to the Unicode Consortium. Most of these sorts of proposals are for emojis, which is an exciting field in the world of text characters, I’m sure.

It’s impractical to list all of the Unicode characters here (even Wikipedia can’t fit them on a single page). So here are just some weird ones.

⍔⌬?⍟☃⸏⸎?╔Ẵℱ?ก้้้้้้้้้้้้้้้้้้้้ก็็็็็็็็็็็็็็็็็็็็กิิิิิิิิิิิิิิิิิิิิก้้้้้้้้้้้้้้้้้้้้ก็็็็็็็็็็็็็็็็็็็็

No font supports all the Unicode characters (yet; Google is working on it), and there are Wingdings fonts, which display different symbols for the same characters and hence afford you even more symbols, even though the breadth of Unicode is starting to make Wingdings look like a 90s thing.
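Every Unicode character has a number (its ‘code point’) and an official name, and Python’s built-in unicodedata module can look both up. A small sketch, where describe is just a name I picked for the helper:

```python
import unicodedata


def describe(character):
    """Return the code point (as U+XXXX) and the official Unicode name."""
    code_point = "U+{:04X}".format(ord(character))
    name = unicodedata.name(character, "<no name>")
    return code_point, name


print(describe("\u2603"))  # the snowman: ('U+2603', 'SNOWMAN')
```

Paste any of the weird characters above into describe() to find out what they are officially called.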

What Different Files Look Like On The Inside

This is going to be an adventure into different files that you use every day on your computer and what they look like on the inside. There will be a lot of weird characters, and some of them won’t even display properly on your computer, but that’s OK; it’s actually cool in a different way.

To open these files, drag them into Notepad, Notepad++ or any other text editor (not a word processor).

Let’s open a .jpeg image first up:

.jpeg

rainbow-jpeg

This is what it looks like inside:

ÿØÿà JFIF  ` ` ÿá RExif MM *     >Q   Q   Q   ICC Profile ÿâXICC_PROFILE  HLino mntrRGB XYZ Î   1 acspMSFT IEC sRGB öÖ  Ó-HP cprt P 3desc „ lwtpt ð bkpt  rXYZ  gXYZ , bXYZ @ dmnd T pdmdd Ä ˆvued L †view Ô $lumi ø meas 

(There are a lot of blank spots where the file contains values that have no visible symbol.)

JPEG is the most popular image filetype because it takes advantage of limitations in human visual perception to save space. JPEG images are far smaller than BMP (bitmap) images, which are usually stored uncompressed: the file is a lot larger, but none of the colour detail is thrown away, not even the detail that humans can’t see anyway. JPEG is ‘lossy’, which means that if you convert an image to JPEG it will lose information. All compressed files look similar to this when you open them up. The file isn’t really text at all: it is a sequence of bytes, and the text editor is doing its best to show each one as a character. Many byte values have no printable symbol, and even Unicode, which has room for 1,111,998 possible characters, has assigned symbols to only a fraction of them. That is why no font can display a JPEG file without a lot of blanks and odd character codes everywhere. The readable scraps, such as JFIF, Exif and ICC_PROFILE, are labels for the metadata stored at the start of the file.
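What the editor shows is its attempt at drawing raw bytes as characters. Here is a minimal Python sketch of that idea, assuming the editor treats each byte as one Latin-1 character (real editors may guess a different encoding). The sample bytes are typed in by hand to resemble the start of a JFIF-style JPEG; they are not taken from the rainbow image.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with these three bytes


def looks_like_jpeg(data):
    """Check whether the data begins with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def as_editor_text(data):
    """Show bytes one character per byte, as a Latin-1 editor would."""
    return data.decode("latin-1")


# Hypothetical sample: the first few bytes of a JFIF-style JPEG.
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"

print(looks_like_jpeg(sample))  # True
print(sample[:4].hex(" "))      # ff d8 ff e0
```

Calling as_editor_text(sample[:4]) turns those four bytes into the same ÿØÿà that opens the dump above.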

Compressed files use code, like everything on your computer, but it is a very succinct code meant for machines, which makes it difficult and intensely frustrating for humans to read, despite the thrill of seeing a snowman character (☃) turn up in the middle of it.

Some people edit JPEG code (with varying degrees of trial and error) to produce glitches for artistic effect. This has given rise to a whole new field of art called ‘Glitch Art’, which can be quite striking:

jpeg-glitch-art

.bmp

For comparison, let’s view the far simpler, uncompressed Bitmap format. Bitmap files are uncommon outside of the local filesystems of creative professionals because they are far larger (and hence slower to load) than lossy counterparts that look indistinguishable. But it’s interesting to look inside to see how a very small, very simple image file works:

bitmap-pattern

(This is not a BMP but a JPEG image btw due to WordPress’s security restrictions but the following code was from the BMP version)

BMö 6 (     Ä Ä !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ !ÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ ÿÿ Ü ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿÜ ÿ

There are still various equations, graphs and flowcharts regarding how bitmaps are encoded (this article is a bit of a mindf*ck), but you can see that, in general, the different colours are represented by different strings of characters. White is the maximum value of every colour channel, so it shows up as a run of ÿ characters (ÿ is what the byte value 255 looks like as text).
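To make the header a little less mysterious, here is a rough Python sketch that builds a tiny uncompressed 24-bit bitmap in memory and reads its header back. The function names and the 2×2 white image are my own assumptions for illustration, not the file shown above. It does suggest what some of the readable characters at the start of the dump are: ‘BM’ is the file signature, and ‘6’ and ‘(’ are the characters for the byte values 54 and 40, which match the usual pixel-data offset and header size.

```python
import struct


def make_bmp(width, height, bgr):
    """Build an uncompressed 24-bit BMP filled with one (blue, green, red) colour."""
    row = bytes(bgr) * width
    row += b"\x00" * (-len(row) % 4)  # rows are padded to a multiple of 4 bytes
    pixels = row * height
    # 40-byte info header: size, width, height, planes, bits per pixel,
    # compression (0 = none), image size, resolution x/y, palette fields.
    info = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                       len(pixels), 0, 0, 0, 0)
    # 14-byte file header: signature, file size, two reserved fields, pixel offset.
    offset = 14 + len(info)
    header = struct.pack("<2sIHHI", b"BM", offset + len(pixels), 0, 0, offset)
    return header + info + pixels


def read_bmp_header(data):
    """Pull the most interesting fields back out of the two headers."""
    magic, file_size, _, _, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    width, height = struct.unpack_from("<ii", data, 18)
    (bits_per_pixel,) = struct.unpack_from("<H", data, 28)
    return {"magic": magic, "file_size": file_size, "pixel_offset": pixel_offset,
            "width": width, "height": height, "bits_per_pixel": bits_per_pixel}


white = make_bmp(2, 2, (255, 255, 255))
print(read_bmp_header(white))
```

Everything after the pixel offset is colour data, three bytes per pixel, which is why a white image is mostly ÿ when viewed as text.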

If you want to have a go editing image files you should try it out with the previous two images. To be honest it’s either going to do something imperceptibly small or make the file unreadable lol.

.py

Python files are uncompressed, very human-readable code. Python is the go-to first-timer’s programming language for this reason, and also because it is still powerful enough for serious work. Here is an excerpt from a Python file for a version of the game ‘Snake’ that runs in the console (from here):

def eatApple(i):
    global grow, score

    apples.pop(i)
    spawnApple()
    grow += config.food_values['apple']
    score += 1


def moveSnake():
    global grow, lastPos

    last_unchanged = None
    lastPos = (snake[len(snake)-1][0], snake[len(snake)-1][1])
    for i, part in enumerate(snake):
        if i == 0:
            x = part[0] + speed * direction[0]
            y = part[1] + speed * direction[1]
        else:
            x = last_unchanged[0]
            y = last_unchanged[1]

        last_unchanged = (snake[i][0], snake[i][1])
        snake[i] = (x, y)

The structure of how the language works is so clear in Python. The code defines some functions that do things like eat an apple and move the snake. Indented under each definition are the steps that make the function work.

When you eat an apple, the following things happen: the apple is removed, a new apple is spawned, the snake grows and the score increases by one.

The function moveSnake() is a bit more complex, but it’s interesting to look at the if/else structure in this one and how variables are assigned with equals signs. The head moves one step in the current direction, and every other segment moves to where the segment in front of it used to be.
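The movement rule in moveSnake() can be boiled down to a few lines. This is my own simplified sketch, not the code from the game: it assumes the snake is a list of (x, y) tuples with the head first, and move_snake is a made-up name.

```python
def move_snake(snake, direction, speed=1):
    """Return the snake's segments after one step.

    The head moves by `speed` in `direction`; each other segment takes the
    old position of the segment in front of it, so the old tail drops off.
    """
    head_x, head_y = snake[0]
    new_head = (head_x + speed * direction[0], head_y + speed * direction[1])
    return [new_head] + snake[:-1]


snake = [(2, 0), (1, 0), (0, 0)]
print(move_snake(snake, (1, 0)))  # [(3, 0), (2, 0), (1, 0)]
```

The original does the same job with a loop and a last_unchanged variable, updating the list in place instead of building a new one.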

.exe

Executable files are files of code that has been ‘compiled’ into a form that is easily runnable (executable) on Windows and on MS-DOS before it. An exe contains machine instructions generated from the original code, plus some other stuff like icons and other graphics for the user interface.

Here’s the start of the exe file for the Ditto clipboard manager:

MZ   ÿÿ ¸ @  º ´ Í!¸LÍ!This program cannot be run in DOS mode.

$
Ñ Nv¿ZNv¿ZNv¿Z! ZJv¿ZG<ZOv¿ZÝ8'ZOv¿ZUë!Z@v¿ZUë#ZHv¿Z! #ZJv¿ZÍ~âZLv¿ZNv¾Z_s¿ZG,ZSv¿ZUëZev¿ZUëZÜv¿ZUë%ZOv¿ZUë"ZOv¿ZRichNv¿Z PE L ´ìV à 
è n

To the human eye this looks almost identical to a JPEG file, and that’s because an exe is also a binary file: it holds machine instructions for the processor rather than text for people. A few readable scraps survive, like the ‘MZ’ and ‘PE’ markers and the message shown if you try to run the program under DOS. Packaging everything into one easily double-clickable file also makes the program convenient to distribute and quick to load.

Compiling and compressing, to be honest, take a lot of the fun out of this adventure, because so many files are stored this way due to the obvious advantages (including the large financial advantage that people can’t easily steal your code, which is why some software goes further and is deliberately obfuscated or encrypted to make it indecipherable). This was a big topic of debate in the 90s and the software corporations won, meaning that if you open Microsoft Word (the exe) up in Notepad you will be far from looking at the code that makes it work.
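The readable ‘MZ’ and ‘PE’ in the dump above are markers that Windows itself looks for. As a rough sketch of how a program might find them: the four bytes at position 0x3C of the file say where the ‘PE’ signature lives. The function name and the hand-made sample bytes below are assumptions for illustration; the sample is far too small to be a working program.

```python
import struct


def find_pe_signature(data):
    """Return the offset of the 'PE' signature, or None if the data isn't an exe."""
    if data[:2] != b"MZ" or len(data) < 0x40:
        return None
    # A little-endian 32-bit number at 0x3C points to the PE header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
        return pe_offset
    return None


# Hypothetical stub: 'MZ', padding, a pointer to 0x40, then the PE signature.
stub = b"MZ" + bytes(0x3A) + struct.pack("<I", 0x40) + b"PE\x00\x00"
print(find_pe_signature(stub))  # 64
```

In a real exe the gap between the two markers holds the little DOS program that prints “This program cannot be run in DOS mode.”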

.html

HTML is the markup language used to create the webpages of the internet. Unlike Python, HTML is not a programming language, but is just used for creating pages and graphical user interfaces. You will have seen this if you have ever pressed Ctrl-Shift-i on modern browsers to open up the developer console, or if you press Ctrl-u to ‘view page source’, or if you open a .html file in a text editor.

HTML is extremely easy to understand, as you will see in this example of a basic HTML file:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8"/>
        <title>Daven’s coool page! :)</title>
    </head>
    <body>
        Daven’s a cool guy
        and a cool dude
        thanks for coming to my cool website!
    </body>
</html>

This is the structure that all HTML files follow. HTML mostly uses pairs of matching ‘tags’ that can contain other tags and content (a few, like meta, stand alone). If you open this in your web browser it will just show the text in the body section with no styling or other page elements. There are a lot of things you can add, like stylesheets, scripts, tables, images, etc.
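To see the nesting of tags for yourself, Python’s built-in html.parser module can walk through a page and report each tag as it opens and closes. Here is a small sketch that prints an indented outline; the class name, the outline function and the short list of stand-alone tags are my own choices, not anything official.

```python
from html.parser import HTMLParser


class TagOutline(HTMLParser):
    """Collect one indented line per opening tag."""

    # A few tags that never have a closing partner (not a complete list).
    STAND_ALONE = {"meta", "link", "img", "br", "hr", "input"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("  " * self.depth + tag)
        if tag not in self.STAND_ALONE:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in self.STAND_ALONE:
            self.depth -= 1


def outline(html):
    parser = TagOutline()
    parser.feed(html)
    return parser.lines


page = ('<!DOCTYPE html><html><head><meta charset="utf-8"/>'
        '<title>Hi</title></head><body>Hello</body></html>')
print("\n".join(outline(page)))
```

For the page above this prints html, then head and body indented one level, with meta and title indented one level further under head.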

.docx

These files are compressed (a .docx is actually a ZIP archive, which is why ‘PK’ and a filename like _rels/.rels show up), so they look like an EXE or JPEG or any other random binary file:

´”MOÂ@†ï&þ‡f¯¦]ð`Œ¡pP<*‰ÏËv »ÙY¾þ½Ó4@QðÒ¤Ý}ß÷ÙÙÎô+]Dð¨¬IY7é°Œ´™2Ó”½Ÿã{a&…5²5 ô¯¯z㵌Hm0e³Üç(g &ց¡•Üz-½ú)wB~Š)ðÛNçŽKk˜‡Òƒõ{O‹y¢áŠ>×$
dÑc½±ÌJ™p®PR"å“ýH‰7 )«=8Soƒñ½ åÊဍî•JãUÑHøð"4að¥õϬœk:CrÜf§Ís%¡Ñ—nÎ[ ˆTs]$ÍŠÊlùr˜¹ž€'ååAëVëðòµï‰ñ*̆y’þ¸öKÑ—•Oêˆm{„@õ>%ä{Äm7çV„%LÞþbǼ$§þ‹I'Tü—Åh¬[!
àÕ³{6Ges,’Úsä­CbþÇÞN©RSß;ðAA3§öõy“HðìóA9b3Èödój¤÷¿ ÿÿ PK   ! ‘·ó N  _rels/.rels ¢(  

Except, there is a way around this. (This is a random document about asbestos removal btw.) You can save the file as an .xml document, in which case it will not be compressed. XML is a markup language built from tags, like HTML, and hence it looks like this kind of thing:

n</w:t></w:r><w:r><w:t xml:space="preserve"> having asbestos in y</w:t></w:r><w:r w:rsidR="0021411B"><w:t>our home or office! They’r</w:t></w:r><w:r><w:t>e here to help you safely and efficiently remove asbestos in any part of your property.</w:t></w:r></w:p><w:p w:rsidR="00FF6533" w:rsidRDefault="0021411B" w:rsidP="00BC6274"><w:r><w:t>They’re comprised of</w:t></w:r><w:r w:rsidR="00FF6533"><w:t xml:space="preserve"> asbestos removalists who are experienced in utilising advanced technology in removing asbestos cement sheets, vinyl tiles, floor tiles, and asbestos cladding</w:t></w:r><w:r w:rsidR="00F7244A"><w:t>.</w:t></w:r></w:p><w:p w:rsidR="00F7244A" w:rsidRDefault="005962EC" w:rsidP="00BC6274"><w:r><w:t>Since 2010, 1</w:t>
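A .docx file is a ZIP archive with XML files inside, so you can also peek at the XML without re-saving anything. The sketch below builds a tiny stand-in archive in memory (it is not a valid Word document, and the helper names are made up) and reads it back with Python’s zipfile module.

```python
import io
import zipfile


def make_fake_docx():
    """Build a ZIP archive in memory with one XML part, like a very bare .docx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(
            "word/document.xml",
            "<w:document><w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
            "</w:body></w:document>",
        )
    return buffer.getvalue()


def list_parts(data):
    """Return the names of the files stored inside the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def read_part(data, name):
    """Return one stored file, decompressed and decoded as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(name).decode("utf-8")


fake = make_fake_docx()
print(fake[:2])          # b'PK'
print(list_parts(fake))  # ['word/document.xml']
```

With a real document you would pass its filename to zipfile.ZipFile instead of the in-memory bytes, and the body text would usually be found in word/document.xml.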

.txt

TXT files are the simplest filetype because there is no distinction between form and function. Text files open in text editor apps by default, as you know, and what they display is exactly what is inside them. E.g.

Todo:
* get cool
* make money

And that’s a nice way to end our trip through different filetypes. I hope you learnt something and at least thought that looking inside files was fun. From now on, I bet you will find pleasant distraction from work by opening miscellaneous and obscure filetypes up in your text editor and possibly screwing them up.
