When email in a .msg file is not ASCII
September 18th, 2017 by miki

Got myself an exported message from (apparently) Exchange/Outlook in a file with a .msg extension. My initial thought was that this was just a plain ASCII email (I seem to remember having handled .msg files as such at some point), but viewing it as text exposed a load of binary data, and the “file” utility reported its type as “Composite Document File V2 Document, Cannot read section info”.

What is it?

Some searching reveals that .msg is a proprietary, but largely documented, format called “Outlook Item (.msg) File Format” (formally MS-OXMSG; the specification is the MSDN document entitled “[MS-OXMSG]: Outlook Item (.msg) File Format”). The format is built on CFB/CFBF ([MS-CFB]: Compound File Binary File Format).
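To check whether a file really is a CFB container before reaching for any tools, the first bytes can be inspected directly. The following is a minimal sketch, assuming the header layout given in [MS-CFB] (8-byte signature, version fields at offset 24, sector shift at offset 30); the function name describe_cfb_header is made up for illustration and is not part of any library.

```python
import struct

# Signature that opens every Compound File Binary file, per [MS-CFB].
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def describe_cfb_header(header: bytes):
    """Return (major_version, sector_size) for a CFB header, else None.

    Hypothetical helper: it only looks at the first 32 bytes and does not
    validate the rest of the file.
    """
    if len(header) < 32 or header[:8] != CFB_SIGNATURE:
        return None
    # After the signature comes a 16-byte CLSID, then four little-endian
    # 16-bit fields: minor version, major version, byte order, sector shift.
    _minor, major, _byte_order, sector_shift = struct.unpack_from("<HHHH", header, 24)
    return major, 1 << sector_shift
```

Reading the first 32 bytes of test.msg in binary mode and passing them to the function should give a tuple for a genuine .msg file and None for a plain text email.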

Check out what the Library of Congress’ nice Sustainability of Digital Formats project has to say about this format in its evaluation of the format's sustainability (e.g. that Microsoft advises against using it to send information to an unknown receiver, Microsoft source).

Extension

Interpreting it

The availability of open tools for parsing this sort of file should of course be considered before distributing it verbatim. Luckily some options are available; however, they do not seem to be widely deployed on default installs in the open computing world, so some mucking about is necessary.

libgsf

The GNOME project has developed an open source library and some tools for interacting with files of this type; check out the libgsf documentation at developer.gnome.org/gsf/ and the source code at github.com/GNOME/libgsf. Alternatively, install the package named libgsf-bin in Debian/Ubuntu for access to the “gsf” binary (man page), which can inspect and dump contents from a .msg file. This is quite rudimentary if you just want to read the content.

MSGViewer (java)

MSGViewer is a standalone Java application that can display and convert different mail formats, including .msg. On GitHub I found a fork which claims to provide more features than the original.

msgconv (perl)

msgconv (github source) is a Perl script that converts a .msg file into what it dubs standard RFC 822 format (since superseded by RFC 2822 and RFC 5322), written to a file with the extension “.eml” (email). This seems to be compatible (or identical?) with Thunderbird’s “.eml” format; at least Thunderbird reads the converted files, as does Outlook itself. The fileformat.com entry for eml seems to support this.

Install it as the package “libemail-outlook-message-perl” on Debian/Ubuntu; the command it provides is called msgconvert.

$ file test.msg
test.msg: Composite Document File V2 Document, Cannot read section info
$ msgconvert test.msg
$ file test.eml
test.eml: UTF-8 Unicode text, with very long lines, with CRLF line terminators
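Since the converted file is an ordinary RFC 5322 message, it can be read with any standard mail parser. As a rough sketch, here is how it could be summarized with the email package from Python's standard library; the helper name summarize_eml and the choice of fields are my own assumptions, not something msgconv provides.

```python
from email import policy
from email.parser import BytesParser


def summarize_eml(raw: bytes) -> dict:
    """Parse raw .eml bytes and return a few headers, the body and attachment names.

    Hypothetical helper for illustration only.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # Prefer the plain text part, fall back to HTML if that is all there is.
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "from": msg["From"],
        "subject": msg["Subject"],
        "body": body.get_content() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

Feeding it the bytes of test.eml (opened in binary mode) should return the sender, subject, body text and any attachment file names, which is a quick way to confirm the conversion kept what matters.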

Conclusion

I prefer using a standard format for storing data, so I ended up accepting the .eml from msgconv, which I passed on to the other recipients of the file.


© 2020 Mikkel Kirkgaard Nielsen, contents CC BY-SA 4.0