MSG Is Bad For You!

Essay by Ralph Losey

No, I'm not talking about Chinese food; I'm talking about an Outlook file format. Since Outlook email is at the heart of many, if not most, electronic document productions today, it is essential to understand some of the different file formats this software uses, especially MSG and PST. Otherwise you can easily fall into an expensive and confusing MSG e-discovery trap. So, just as you want to avoid MSG in your food, you want to avoid it in your e-discovery production too. This essay will explain why.

Email Metadata and Production

On individual computers, Outlook can store emails in two different formats with two different file name extensions: MSG and PST. MSG stands for what you would think, 'Message'. It is the file extension used to identify a single email message. The PST extension is a Microsoft specialty that stands for 'Personal Storage Table.' It is used to identify all of the emails (with attachments) stored by one particular user. This is how almost everyone maintains and uses their personal Outlook email program. They keep their email in various folders, which all together make up one PST file. Indeed, this is the default procedure, although a user can (assuming no administrative restrictions) separate their emails, and scatter them all over their computer as many separate, unrelated emails. In this event, the different extension of MSG is used to store the individual emails.

The situation is different in a corporate or enterprise server environment, something which I did not mention when I first wrote about this. On Microsoft Exchange email servers (the server side of Outlook), the emails of individual users in the server group are all stored together in a single container file, an electronic database file with the file extension EDB. (In Lotus email systems, it's called NSF.) The individual users on the server do not have individual PST containers on the server, but individual PST files can be easily created as a copy from the master EDB file on the server. This can be done at any time by the server administrator, or even by the individual users, unless this feature has been disabled.

This function of Outlook is frequently disabled for individual users because otherwise, users can create their own PST files on their hard drives, or even on their own portable storage devices, and the system administrator will never know about these backup files. This makes it very difficult to locate all emails in a large organization to find information, implement a legal hold, or collect responsive ESI. When there is a proliferation of unknown PST files, it is impossible to know if the EDB file is complete. This is because some emails in a user's section of a master EDB file may have been deleted from the EDB file, but still remain on the previously generated PST file.

In a corporate server environment, to respond to a native file production request or preservation notice, the email of the individual users affected must be copied from the master container EDB file, which holds the email of everyone, into individual PST files for the users whose email might be relevant. The PST files created from the master EDB file must be searched for relevant emails, non-responsive and privileged emails must be deleted, and then a new responsive PST file must be reconstituted for production. (Also, in an environment where users have the ability to create their own PST files at any time, you must ask about and preserve/search through these other PST files as well.)

Craig Ball, an e-discovery expert with a deep understanding of forensics and technology, correctly points out that this is not a pure native production; that would require production of the original EDB container file. In his excellent article, Re-Burn of the Native, found at page 73 of Musings on Electronic Discovery, Craig calls it a 'Quasi Native Production' and explains:

Chockablock [yes, he talks like that] as it is with non-responsive material, there are compelling reasons not to produce 'the' source PST. But there's no reason to refuse to produce responsive e-mails and attachments in the form of a PST file, so long as it's clearly identified as a reconstituted file containing selected messages and the contents fairly reflect the responsive content and relevant metadata of the original. Absent a need for computer forensic analysis or exceptional circumstances, a properly constructed quasi-native production of e-mail is an entirely sufficient substitute for the native container file.

Craig goes on to say that the production does not have to be made in a PST file to be 'Quasi Native'; it could also be produced as an MSG file. Although that is certainly true, as MSG is also native to Outlook, in my opinion the PST format is typically preferred. To understand why, it is helpful to use a paper file comparison.

Outlook, by default, keeps an individual user's emails all together in a filing cabinet type structure. Received emails start in the Inbox folder. The user can then create various subfolders to file the emails for later easier access. It is equivalent to providing a filing cabinet to store paper letters, but with a virtually unlimited number of blank folders and cabinet space. Just as in a paper filing system, with Outlook (and most other email software, such as Lotus), you label the folders yourself, and file your emails in the folders you deem appropriate. This should result in some kind of rational record storage system that makes sense to the user and allows them to retrieve old letters/emails more easily. The folders' names and ordering system often provide useful insights into the user's thinking, and sometimes help to explain the meaning of a particular document. For instance, if a user created a folder called 'Important', their decision to place a particular document in that file tells you something about the document itself, or at least about the user's attitude toward that document. So when you take a single email out of the Outlook folder, it is equivalent to removing it from a paper file folder and keeping it loose on your desk (or floor).

Parties today frequently specify the production of files in their Native format so that all metadata will be preserved. Indeed, most commentators agree that Native file production under the new rules, specifically Rule 34(b)(ii), is now the default mode of production absent agreement by the parties to the contrary. (There are, by the way, many good reasons to agree to non-native file production, so long as essential metadata is still preserved, pertaining to the advantages of loaded TIFF files and trial preparation software). Moreover, most believe that the primary purpose behind this rule specification is to preserve metadata. Rule 34(b)(ii) states:

(ii) if a request for electronically stored information does not specify the form or forms of production, a responding party must produce the information in a form or forms in which it is ordinarily maintained or in a form or forms that are reasonably usable.

An argument can be made that both types of Microsoft email files, individual MSG files and collective PST files, are 'Native' files, since they are both produced and used by Outlook. In that sense, they are both native to that software. But it is the PST form in which almost everyone ordinarily maintains their individual Outlook emails, not the MSG form, and so, in my opinion, that is the form contemplated by the rule. (I rule out production of the original native file in an enterprise server environment, production of the EDB or NSF files, because they include all emails of all users in the enterprise, and that would almost never be relevant, and would otherwise be unwise, as Craig Ball explains well in his Re-Burn of the Native article.)

Rule 34(b)(ii) also provides for production in a form alternative to the native 'ordinarily maintained' form, by specifying that production can also be made in 'forms that are reasonably usable.' Under the alternate 'reasonably usable' form, flattened image files, or Rich Text Format ('RTF') file production, may arguably suffice. But in my opinion, and Craig agrees, they are only 'reasonably usable' if fully searchable and if paired with attachments. Moreover, if other metadata is needed in a particular case that is not shown in the image file, then this metadata should be preserved in a load file for the image files to be considered reasonably usable.

When parties have agreed to native production, and also to the preservation of metadata, then I suggest the situation is clear: that Outlook files should be produced in PST format, not MSG format. But before I complete the basis for this contention, further explanation of the terms might help. The Sedona Conference Glossary (2005) defines 'native format' as follows:

Native Format: Electronic documents have an associated file structure defined by the original creating application. This file structure is referred to as the 'native format' of the document.

Palgut v. City of Colorado Springs, 2006 WL 3483442 (D. Colo. Nov. 29, 2006) (previously discussed in my blog) cites to the Judge's Guide and defines 'Native format' as:

'Native format' means all documents that are created in digital format (word processing files, spreadsheets, presentations, and E-mail) have a native file format - that is, a format designed specifically for the most efficient use of the information in which this kind of software specializes.

Outlook has designed the PST format for the most efficient use of the information it creates for individual users. That is why it is the default. True, it also has an alternative MSG format, but it is not the most efficient use of the information. The most efficient use is to keep all of the emails together, organized into different folders, the way the information was originally and ordinarily maintained. Further, when you take a single email, remove it from the PST file, and put it into a standalone MSG format, you are stripping it of a key piece of metadata.

When Outlook emails are converted from their original PST format to MSG format, the metadata that shows where the email was located in the custodian's folders is usually lost. It is equivalent to taking a filing cabinet full of paper letters, wherein the correspondence is filed and placed in appropriate drawers, files, folders and sub-folders, and then dumping them out of the drawers and folders, into one big box of mixed-up, disorganized individual letters.

In short, you can see the original Outlook folder structure in a native format production of PST files, but cannot and will not ever know this information in MSG format production. That makes review of the MSG production substantially more difficult and expensive than review of a PST production. Further, MSG production makes it impossible to determine what letters were originally filed together, and hides the folder names created by the custodian to identify these folders. Thus, for instance, if a user created folders labeled 'hot', 'unimportant', and 'bogus', and then produced 100 emails from the unimportant folder, 20 from the bogus, and only 1 from the hot, this would no doubt lead to important deposition questioning.
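The value of that folder information is easy to illustrate. The following is a minimal sketch in Python, assuming the original folder of each produced email is known, as it would be in a PST production. The function name and the sample data are hypothetical; the data simply mirrors the folder example above and is not drawn from any real production.

```python
from collections import Counter


def count_by_folder(produced):
    """Count produced emails per original folder.

    `produced` is an iterable of (email_id, folder_name) pairs
    (a hypothetical layout chosen for this illustration).
    """
    return Counter(folder for _email_id, folder in produced)


# Hypothetical production matching the example in the text.
produced = (
    [(f"u{i}", "unimportant") for i in range(100)]
    + [(f"b{i}", "bogus") for i in range(20)]
    + [("h0", "hot")]
)

# List folders from most to fewest produced emails.
for folder, count in count_by_folder(produced).most_common():
    print(f"{folder}: {count}")
```

When the production arrives as loose MSG files with no folder data, the second element of each pair is simply unavailable, and no such tally can be made.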

So be wary of Outlook production in individual MSG files, which some parties may insist upon as less expensive than PST production. Instead, demand PST format. This is one of many items that savvy e-discovery lawyers will want to discuss in the initial meetings under new Rules 16 and 26.

Public Comment From Craig Ball

Dear Ralph:

Thanks for the kind comments. It's always heartening to know someone is reading what I write. It's even better when they see things the same way!

The points you make about the advantages of .PST formatted production against production in .MSG format are excellent. I couldn't agree more that preserving and producing the folder structure is unquestionably helpful and ofttimes necessary. Like you, I'd rather get e-mail formatted as a PST.

But I don't think that necessarily militates against .MSG when the folder structure is preserved - which can be done externally by something as simple as producing the .MSGs within an identical folder tree or by furnishing the path data for each message in a load file.

Why am I stubbornly defending .MSG when .PST is superior?

It's because there are more off-the-shelf applications that can deal with the .MSG format than the .PST. Since the PST is by far the dominant standard for corporate e-mail, you'd think every tool would cut through a PST like a hot knife through butter. Instead, I find that the compressed and encrypted form of PSTs, along with their largely undocumented internal file structure, is something that trips up many tools. Moreover, until they are compacted, PSTs have a nasty propensity (or a wondrous propensity, depending on one's point of view) to carry double deleted files invisibly within their structure. That's probably not a big issue in the context of reconstituted production formats, but it's the sort of insidious risk that keeps us lawyers up at night.

One further comment: you indicate that a local PST isn't the default in an Exchange environment. Here, I mean on the user's PC hard drive. That's true, but it's almost universally supplanted by a file with the .OST extension that holds synchronized Exchange e-mail in order to support offline access within Outlook. Accordingly, you almost always run into some sort of local e-mail storage file potentially at variance with what you'll find on the server. Additionally - as you well appreciate - the user may create any number of .PST backup files at intervals and, to boot, there is often a need to look for auto-archive container files, locally on the machine or stored within the user's network storage areas or 'shares.'

Processing just the server stored email may be sufficient, but it's not often complete.

Looking forward to your book!

Craig Ball

Public Comment From Venkat Rangan

I agree with your assessment that an MSG file is not ideally suited as a native file representation in Microsoft Outlook and Exchange environments. However, I do want to mention that the alternatives to produce native files are also beset with problems.

Technically, the most 'native' of files in a Microsoft Exchange messaging environment is the EDB file that Exchange stores in its mail servers. This is because this is the file that contains all the mails for all the mailboxes on the Exchange Mailstore, and the place where all emails are actually maintained during the course of conducting business. However, the EDB files contain multiple users' mailboxes, and would contain confidential and privileged matter that should not be produced. To isolate an EDB to only the custodians that are relevant to a case would require their PSTs to be produced. Although a PST is a snapshot of a mailbox within an EDB file, it is technically not a native file that is used during the normal course of business. Still, given that standard business processes and Microsoft's proven technologies are used for creating PST files from EDB files, there is a certain accepted belief that it is as close to native as we can get.

On the content of PST files themselves, I agree that a PST does preserve the complete meta-data of messages, including the folder location and the chronological position within that folder. However, producing an entire PST as a responsive document is also not viable, since that would expose other emails that are potentially confidential and/or privileged. Removing these messages from the original PST would alter the PST, thereby exposing yourself to a spoliation claim. An alternative is to produce another PST designed for production and deliver that to the requesting party. The question then is whether this production step preserves the meta-data of the original message, including the original folder structure that is so vital to establishing certain aspects such as user intent. Also, as one transfers an MSG from the original source PST to a target PST, the MSG's internal content changes. This is because an MSG's insertion into a new PST creates a new EntryID property. One could argue that the essential components of an MSG are preserved, but that is not provable by a Hash of the MSG.

An alternative production approach could be to extract the responsive MSGs from the original PST, store each MSG in a separate file and compute its Hash Value. To preserve the meta-data such as its original folder location, produce a second wrapper file in the form of XML that includes the file location of the MSG, its hash, the name of the original PST file it was extracted from, and the folder location within that PST. In this way, only the responsive MSG files need be produced, and as they are copied and transferred, their internal contents do not change, and the meta-data are preserved external to the MSG file in a corresponding XML file. The EDRM XML export mechanism provides a framework for producing multiple MSG files while preserving the original meta-data. This would seem to address the needs of native production, while maintaining the integrity of each MSG as it existed in the original PST file.
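The hash-plus-wrapper approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element and attribute names (`production`, `message`, `sourcePst`, `folderPath`) are hypothetical and are not the EDRM XML schema, SHA-256 is assumed as the hash, and the link between each extracted MSG file and its source PST and folder is assumed to have been recorded by whatever tool performed the extraction.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(records, manifest_path: Path) -> None:
    """Write an XML wrapper file describing each produced MSG file.

    `records` is an iterable of (msg_path, source_pst, folder_path)
    tuples. This layout and the XML names below are hypothetical.
    """
    root = ET.Element("production")
    for msg_path, source_pst, folder_path in records:
        msg_path = Path(msg_path)
        ET.SubElement(
            root,
            "message",
            file=str(msg_path),
            sha256=sha256_of_file(msg_path),
            sourcePst=source_pst,
            folderPath=folder_path,
        )
    ET.ElementTree(root).write(
        str(manifest_path), encoding="utf-8", xml_declaration=True
    )


def verify_manifest(manifest_path: Path) -> list:
    """Return the files whose current hash no longer matches the manifest."""
    root = ET.parse(str(manifest_path)).getroot()
    return [
        entry.get("file")
        for entry in root.iter("message")
        if sha256_of_file(Path(entry.get("file"))) != entry.get("sha256")
    ]
```

A producing party would call `build_manifest` once at extraction time, and either side could later run `verify_manifest` against the delivered files; an empty list means every MSG file still matches its recorded hash. Keeping the folder location in the wrapper rather than in the MSG itself is what allows the file hash to stay stable while the files are copied.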

