annotate contrib/FILE-FORMAT @ 154:581fab9f1dc7

avoid emitting bogus empty email messages into contacts and calendar files
author Carl Byington <carl@five-ten-sg.com>
date Sat, 14 Mar 2009 15:13:27 -0700
parents c508ee15dfca
children 5c0ce43c7532
c508ee15dfca switch to automake/autoconf
carl
File format for Outlook pst files
=================================

Basically, we work on two indexes. One index associates an ID with each item, and the second index associates a second ID with the original ID. I see no real purpose for this yet.

0x00 - Signature [4 bytes] (0x4E444221)
0xA8 - File Size [4 bytes]
0xC4 - Pointer to Index of all Items in File, associating the first ID [4 bytes]
0xBC - Pointer to Index of controlling Items in File [4 bytes]
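
As a rough illustration, the four header fields above could be read as follows. This is only a sketch: the function name read_header and the dictionary keys are made up for the example, and little-endian byte order is an assumption (it is what turns the leading bytes "!BDN" into 0x4E444221).

```python
import struct

PST_SIGNATURE = 0x4E444221  # signature value from the field list above


def read_header(buf):
    """Return the documented header fields found at the start of a pst file."""
    if len(buf) < 0xC8:
        raise ValueError("buffer too short for the documented header fields")
    # "<I" reads one little-endian unsigned 32-bit integer (byte order assumed).
    signature = struct.unpack_from("<I", buf, 0x00)[0]
    if signature != PST_SIGNATURE:
        raise ValueError("unexpected signature 0x%08X" % signature)
    return {
        "file_size": struct.unpack_from("<I", buf, 0xA8)[0],
        "index1_pointer": struct.unpack_from("<I", buf, 0xC4)[0],  # all items
        "index2_pointer": struct.unpack_from("<I", buf, 0xBC)[0],  # controlling items
    }
```

Reading the first 0xC8 bytes of a file opened in binary mode and passing them to read_header is enough, since the last documented field ends at 0xC4 + 4.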

First All Items Index: - consists of a table of offsets pointing to the table of items.
======================
repeating:
  0x00 - First id in this table [4 bytes]
  0x04 - Unknown [4 bytes]
  0x08 - Offset of table [4 bytes]

until "First id in this table" is zero
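
The loop described above might look like this. It is a sketch under assumptions: the records are taken to be packed back to back (12 bytes each, from the three 4-byte fields), values are read as little-endian, and walk_index is a name invented for the example.

```python
import struct

RECORD_SIZE = 12  # three 4-byte fields, assumed to be packed with no padding


def walk_index(buf, index_offset):
    """Yield (first_id, unknown, table_offset) until a zero first id is met."""
    pos = index_offset
    while pos + RECORD_SIZE <= len(buf):
        first_id, unknown, table_offset = struct.unpack_from("<III", buf, pos)
        if first_id == 0:
            # A zero "First id in this table" ends the index.
            break
        yield first_id, unknown, table_offset
        pos += RECORD_SIZE
```

The same loop should also fit any other index that uses this three-field record and a zero id as terminator.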

Table Of Items: - Pointed to by above records.
===============
repeating:
  0x00 - Id1 of this item [4 bytes]
  0x04 - Offset of this item [4 bytes]
  0x08 - Size of data stored there [2 bytes]
  0x0A - Unknown [2 bytes]

until "Id1 of this item" is zero. When this is reached, you return to the above table and read the next record.
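
A table of items could be read along these lines. Again this is only a sketch: Item and read_item_table are hypothetical names, the 12-byte record size follows from the field sizes listed above, and little-endian byte order is assumed.

```python
import struct
from collections import namedtuple

# Field names are invented for this example; the layout follows the list above.
Item = namedtuple("Item", "id1 offset size unknown")


def read_item_table(buf, table_offset):
    """Collect item records until one with a zero Id1 is found."""
    items = []
    pos = table_offset
    while pos + 12 <= len(buf):
        # 4-byte id1, 4-byte offset, 2-byte size, 2-byte unknown.
        item = Item(*struct.unpack_from("<IIHH", buf, pos))
        if item.id1 == 0:
            break  # end of this table; the caller moves on to the next index record
        items.append(item)
        pos += 12
    return items
```

Each returned item then gives the data as buf[item.offset:item.offset + item.size].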

Second All Items Index: - Contains the descriptors for emails, and other items
=======================
repeating:
  0x00 - First id2 of this table [4 bytes]
  0x04 - Unknown [4 bytes]
  0x08 - Offset of table [4 bytes]

until "First id2 of this table" is zero

Second Table of Items: - Pointed to by above records
======================
repeats 0x1F times
  0x00 - Id2 of this item [4 bytes]
  0x04 - Id1 of the descriptor item [4 bytes]
  0x08 - Id1 of the associated list [4 bytes] (this contains a list of id1 and id2 that are to do with this controlling item)
  0x0C - Id2 of parent [4 bytes]
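
For illustration, one such table could be unpacked as below. The names Descriptor and read_second_table are invented here, and the sketch assumes little-endian values and 16-byte records packed back to back. It reads all 0x1F records as stated above and does not try to guess which ones are unused.

```python
import struct
from collections import namedtuple

RECORD_COUNT = 0x1F  # "repeats 0x1F times"
RECORD_SIZE = 16     # four 4-byte fields

# Hypothetical field names mirroring the list above.
Descriptor = namedtuple("Descriptor", "id2 descriptor_id1 list_id1 parent_id2")


def read_second_table(buf, table_offset):
    """Return the 0x1F descriptor records stored at table_offset."""
    if table_offset + RECORD_COUNT * RECORD_SIZE > len(buf):
        raise ValueError("buffer too short for a full table")
    return [
        Descriptor(*struct.unpack_from("<IIII", buf, table_offset + i * RECORD_SIZE))
        for i in range(RECORD_COUNT)
    ]
```

Following parent_id2 from record to record is one way to rebuild the folder tree from these records.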


Associated List: - pointed to by the above record. Contains associations between id1 and id2 for the items controlled by the record
================
0x00 - Constant [2 bytes] (0x0002)
0x02 - Count [2 bytes] (the number of items that are about to follow)

repeating
  0x00 - Id2 of record [4 bytes]
  0x04 - Id1 of record [4 bytes] - This is an association between the two
  0x08 - Unknown [4 bytes]
until you have reached the "Count"
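
The list layout above could be decoded roughly like this. It is a sketch with assumed little-endian byte order and an invented function name, and it treats the second field of each entry as the id1 that pairs with the id2.

```python
import struct


def read_associated_list(buf, offset=0):
    """Return a list of (id2, id1, unknown) tuples from an associated list."""
    constant, count = struct.unpack_from("<HH", buf, offset)
    if constant != 0x0002:
        raise ValueError("expected constant 0x0002, found 0x%04X" % constant)
    entries = []
    pos = offset + 4  # entries start right after the 4-byte list header
    for _ in range(count):
        id2, id1, unknown = struct.unpack_from("<III", buf, pos)
        entries.append((id2, id1, unknown))
        pos += 12
    return entries
```

A lookup from id2 to id1 is then just dict((id2, id1) for id2, id1, _ in entries).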


Descriptor Item: - Referenced from "Second Table of Items" - contains information about the item (email, contact...)
================
0x00 - Block Offset to Block Index [2 bytes]
0x02 - Constant [2 bytes] (0xBCEC)
0x04 - Index Pos of Section1 [4 bytes]

NOTE: An index pos can be shifted left by 4 bits [ i_pos << 4 ] to get an index offset (i.e. an offset from the start of the block index)
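
The three fields at the start of a descriptor item could be picked apart as shown here. This is a sketch only: the function name is invented, little-endian byte order is assumed, and the shift from the note is deliberately not applied, so the index pos is returned exactly as stored.

```python
import struct

DESCRIPTOR_CONSTANT = 0xBCEC  # constant listed at offset 0x02 above


def read_descriptor_header(block):
    """Return (block_index_offset, section1_index_pos) from a descriptor item."""
    block_index_offset, constant, section1_ipos = struct.unpack_from("<HHI", block, 0)
    if constant != DESCRIPTOR_CONSTANT:
        raise ValueError("expected constant 0xBCEC, found 0x%04X" % constant)
    # The index pos is returned untouched; converting it is left to the caller.
    return block_index_offset, section1_ipos
```

Checking the constant first is a cheap way to notice that the block is not a descriptor item at all, or that it still needs decoding.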


Block Index: - contains offsets to points in the current block
============
0x00 - Count of offsets minus one. [2 bytes] (In effect, each offset must be taken with the following one so that the start and end of the referenced item can be established. Therefore there is one extra to show the end of the last item.)

repeating
  0x00 - Block Offset [2 bytes]
until you have read one more offset than the "Count"
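
Pairing each offset with the one that follows it, as described above, might be written like this. The function name is made up for the example, and the values are assumed to be little-endian 16-bit integers.

```python
import struct


def read_block_index(block, index_offset):
    """Return a list of (start, end) block offsets, one pair per item."""
    count = struct.unpack_from("<H", block, index_offset)[0]
    # There is one more offset than "Count"; the extra one ends the last item.
    offsets = struct.unpack_from("<%dH" % (count + 1), block, index_offset + 2)
    # Each offset is the start of one item and the end of the previous one.
    return list(zip(offsets, offsets[1:]))
```

The n-th item of the block is then block[start:end] for the n-th pair in the returned list.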


Section1: - Referenced from "Descriptor Item" - does not contain much
=========
0x00 - Constant? [4 bytes] (0x0602B5)
0x04 - Index Pos of Descriptor fields [4 bytes]


Descriptor Fields: - Contain the information needed to access the details of the email
==================
repeats
  0x00 - Item type [2 bytes] (subject, from, to ...)
  0x02 - Reference type [2 bytes] (how to interpret the value)
  0x04 - Value [4 bytes]
until the allotted size of the record has been read. (The following Block Offset from the Index has been reached)
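
Reading the 8-byte field records between two block offsets could look like the following sketch. The function name is invented, start and end are assumed to be a pair of neighbouring offsets taken from the Block Index, and little-endian byte order is assumed.

```python
import struct


def read_descriptor_fields(block, start, end):
    """Return (item_type, ref_type, value) for every record in block[start:end]."""
    fields = []
    # Step through whole 8-byte records only; stop at the following block offset.
    for pos in range(start, end - 7, 8):
        item_type, ref_type, value = struct.unpack_from("<HHI", block, pos)
        fields.append((item_type, ref_type, value))
    return fields
```

Looking a field up by its type is then a simple scan of the returned list, for example for item type 0x0037 to find the subject.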

Reference Types: - I don't know if I have interpreted this field correctly. It might have a completely different purpose
===============
0x0002 -
0x0003 - Value following is a value in its own right
0x000B -
0x001E - (STRING) Value following is an Index Position (must be shifted left by 4 bits)
0x0040 - (DATE) Value following is an Index Position (must be shifted left by 4 bits)
0x0048 -
0x0102 - (STRUCTURE) Value following is an Index Position (must be shifted left by 4 bits)
0x1003 -
0x101E - (ARRAY OF STRING)
0x1102 -

Value:
======
When the value is of type Index Position, you can shift the value left by 4 bits to get an offset into the Block Index. Some descriptor types can have Id2 values. This is recognised by using a bitwise AND with the number, i.e. val & 0x0000000F. If the result is 0xF, it is likely to be an Id2 reference.
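
The Id2 test described above comes down to a single mask. A minimal sketch, with a function name invented for the example:

```python
def looks_like_id2(value):
    """Return True when the low four bits of value are all set."""
    # val & 0x0000000F == 0xF marks a likely Id2 reference, per the text above.
    return (value & 0x0000000F) == 0xF
```

As the text says, a match is only "likely" an Id2 reference, so a caller should be prepared for the lookup of that Id2 to fail.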


Descriptor Types: - Types that are in "Descriptor Fields"
=================

All Values are Hex

Note: it appears that some types can have an IPOS value or an ID2 value depending on the size of the field in question. It is safer to check every field than for me to say what they "usually" contain. Absolute values, though, are generally going to be constant.

Type   Ref Type   Value   Desc
----   --------   -----   ----
001A              [REF]   IPM Context. What type of message is this
0037   001E       [REF]   Email Subject. The referenced item is of type "Subject Type"
0039              [REF]   Date. This is likely to be the arrival date
003B              [REF]   Outlook Address of Sender
003F              [REF]   Outlook structure describing the recipient
0040              [REF]   Name of the Outlook recipient structure
0041              [REF]   Outlook structure describing the sender
0042              [REF]   Name of the Outlook sender structure
0043              [REF]   Another structure describing the recipient
0044              [REF]   Name of the second recipient structure
004F              [REF]   Reply-To Outlook Structure
0050              [REF]   Name of the Reply-To structure
0051              [REF]   Outlook Name of recipient
0052              [REF]   Second Outlook name of recipient
0064              [REF]   Sender's Address access method (SMTP, EX)
0065              [REF]   Sender's Address
0070              [REF]   Processed Subject (with Fwd:, Re, ... removed)
0071              [REF]   Date. Another date
0075              [REF]   Recipient Address Access Method (SMTP, EX)
0076              [REF]   Recipient's Address
0077              [REF]   Second Recipient Access Method (SMTP, EX)
0078              [REF]   Second Recipient Address
007D   001E       [REF]   Email Header. This is the header that was attached to the email
0C19              [REF]   Second sender struct
0C1A              [REF]   Name of second sender struct
0C1D              [REF]   Second outlook name of sender
0C1E              [REF]   Second sender access method (SMTP, EX)
0C1F              [REF]   Second Sender Address
0E03              [REF]   CC Address?
0E04              [REF]   SentTo Address
0E06              [REF]   Date.
0E07              [REF]   Flag - contains IsSeen value
0FF9              [REF]   binary record header
1000   001E       [REF]   Plain Text Email Body. Does not exist if the email doesn't have a plain text version
1013   001E       [REF]   HTML Email Body. Does not exist if the email doesn't have an HTML version
1035              [REF]   Message ID
1042              [REF]   In-Reply-To or Parent's Message ID
1046              [REF]   Return Path
3001              [REF]   Folder Name? I have seen this value used for the contacts record as well
3007              [REF]   Date.
3008              [REF]   Date.
300B              [REF]   binary record header
35E0              [REF]   binary record found in first item. Contains the reference to "Top of Personal Folder" item
35E3              [REF]   binary record with a reference to "Deleted Items" item
35E7              [REF]   binary record with a reference to "Search Root" item
3602              [REF]   the number of emails stored in a folder
3603              [REF]   the number of unread emails in a folder
3613              [REF]   the folder content description
8000-                     Contain extra bits of information that have been taken from the email's header. I call them extra lines

Key:
  [REF] = Can be either Index Position, or an Id2 Reference