annotate contrib/FILE-FORMAT @ 407:24871e6cdd69
Stuart C. Naifeh - fix rfc2231 encoding when saving messages to both .eml and msg formats
author: Carl Byington <carl@five-ten-sg.com>
date: Sat, 27 Mar 2021 14:53:01 -0700
parents: f2742d1160a4

File format for Outlook pst files
=================================

Basically, we work on two indexes. One index associates an ID with each item, and the second index associates a second ID with the original ID. I see no real purpose for this yet.

0x00 - Signature [4 bytes] (0x4E444221)
0xA8 - File Size [4 bytes]
0xC4 - Pointer to Index of all Items in File, associating the first ID [4 bytes]
0xBC - Pointer to Index of controlling Items in File [4 bytes]

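The header fields above sit at fixed offsets, so they can be read directly. The following is a minimal Python sketch, assuming the values are stored little-endian (which matches the signature 0x4E444221 appearing on disk as the bytes "!BDN"). The name read_header and the dictionary keys are illustrative only and are not taken from libpst.

```python
import struct

PST_SIGNATURE = 0x4E444221  # the bytes "!BDN" when stored little-endian


def read_header(data: bytes) -> dict:
    """Read the four header fields listed above from the start of a pst file."""

    def u32(offset: int) -> int:
        # Assumption: every field is an unsigned little-endian 32-bit value.
        return struct.unpack_from("<I", data, offset)[0]

    signature = u32(0x00)
    if signature != PST_SIGNATURE:
        raise ValueError(f"bad signature: {signature:#010x}")
    return {
        "file_size": u32(0xA8),
        "index1_pointer": u32(0xC4),  # index of all items (first ID)
        "index2_pointer": u32(0xBC),  # index of controlling items
    }
```

The two pointers returned here are the starting points for the two indexes described in the sections that follow.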
First All Items Index: - consists of a table of offsets pointing to the table of items.
======================
repeating:
0x0 - First id in this table [4 bytes]
0x04 - Unknown [4 bytes]
0x08 - Offset of table [4 bytes]

until "First id in this table" is zero

Table Of Items: - Pointed to by above records.
===============
repeating:
0x0 - Id1 of this item [4 bytes]
0x04 - Offset of this item [4 bytes]
0x08 - Size of data stored there [2 bytes]
0x0A - Unknown [2 bytes]

until "Id1 of this item" is zero. When this is reached, you return to the above table and read the next record

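The two-level walk described above (index records pointing at item tables, each list ending at a zero id) can be sketched as follows. This is only an illustration: it assumes little-endian fields and that "Offset of table" is an absolute offset into the file, neither of which the text states. The function names are hypothetical.

```python
import struct

INDEX_RECORD = struct.Struct("<III")   # first id, unknown, offset of table
ITEM_RECORD = struct.Struct("<IIHH")   # id1, offset, size, unknown


def iter_until_zero(buf: bytes, start: int, record: struct.Struct):
    """Yield fixed-size records from buf until the first field is zero."""
    pos = start
    while pos + record.size <= len(buf):
        fields = record.unpack_from(buf, pos)
        if fields[0] == 0:
            return
        yield fields
        pos += record.size


def walk_first_index(file_data: bytes, index_offset: int):
    """Yield (id1, offset, size) for every item reachable from the index."""
    for _first_id, _unknown, table_offset in iter_until_zero(
        file_data, index_offset, INDEX_RECORD
    ):
        # Assumption: table_offset is an absolute file offset.
        for id1, offset, size, _unk in iter_until_zero(
            file_data, table_offset, ITEM_RECORD
        ):
            yield id1, offset, size
```

Collecting the results into a dictionary keyed by id1 gives the lookup that the later sections rely on when they refer to an item by its Id1.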
Second All Items Index: - Contains the descriptors for emails, and other items
=======================
repeating:
0x0 - First id2 of this table [4 bytes]
0x04 - Unknown [4 bytes]
0x08 - Offset of table [4 bytes]

until "First id2 of this table" is zero

Second Table of Items: - Pointed to by above records
======================
repeats 0x1F times
0x0 - Id2 of this item [4 bytes]
0x04 - Id1 of the descriptor item [4 bytes]
0x08 - Id1 of the associated list [4 bytes] (this contains a list of id1 and id2 that are to do with this controlling item)
0x0C - Id2 of parent [4 bytes]

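Each record in the Second Table of Items is 16 bytes and the table holds 0x1F of them, so it can be read as a fixed-size array. The sketch below assumes little-endian fields and that unused slots have an Id2 of zero (the text does not say how unused slots are marked). The Descriptor type and both function names are made up for this example.

```python
import struct
from typing import Dict, List, NamedTuple


class Descriptor(NamedTuple):
    id2: int
    descriptor_id1: int   # Id1 of the descriptor item
    list_id1: int         # Id1 of the associated list
    parent_id2: int


DESCRIPTOR = struct.Struct("<IIII")
RECORDS_PER_TABLE = 0x1F


def read_second_table(table: bytes) -> List[Descriptor]:
    """Read up to 0x1F descriptor records from one table."""
    records = []
    for i in range(RECORDS_PER_TABLE):
        pos = i * DESCRIPTOR.size
        if pos + DESCRIPTOR.size > len(table):
            break  # truncated input: stop rather than read past the end
        records.append(Descriptor(*DESCRIPTOR.unpack_from(table, pos)))
    return records


def children_by_parent(records: List[Descriptor]) -> Dict[int, List[int]]:
    """Group item Id2 values under the Id2 of their parent."""
    tree: Dict[int, List[int]] = {}
    for rec in records:
        if rec.id2 != 0:  # assumption: a zero Id2 marks an unused slot
            tree.setdefault(rec.parent_id2, []).append(rec.id2)
    return tree
```

The parent field is what allows the folder hierarchy to be rebuilt once every table has been read.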
Associated List: - pointed to by the above record. Contains associations between id1 and id2 for the items controlled by the record
================
0x0 - Constant [2 bytes] (0x0002)
0x02 - Count [2 bytes] (the number of items that are about to follow)

repeating
0x0 - Id2 of record [4 bytes]
0x04 - Id of record [4 bytes] - This is an association between the two
0x08 - Unknown [4 bytes]
until you have reached the "Count"

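As an illustration of the layout above, the following sketch turns an Associated List block into an Id2-to-Id mapping. It assumes little-endian fields and that the 12-byte entries start immediately after the 4-byte header. The function name is hypothetical.

```python
import struct


def read_associated_list(block: bytes) -> dict:
    """Return {id2: id} for every entry in an Associated List block."""
    constant, count = struct.unpack_from("<HH", block, 0)
    if constant != 0x0002:
        raise ValueError(f"unexpected constant: {constant:#06x}")
    mapping = {}
    for i in range(count):
        # Assumption: entries are packed back to back after the header.
        id2, id1, _unknown = struct.unpack_from("<III", block, 4 + i * 12)
        mapping[id2] = id1
    return mapping
```

With this mapping, an Id2 reference found in a descriptor field can be resolved to an Id1 and then looked up in the first index.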
Descriptor Item: - Referenced from "Second Table of Items" - contains information about the item (email, contact...)
================
0x0 - Block Offset to Block Index [2 bytes]
0x02 - Constant [2 bytes] (0xBCEC)
0x04 - Index Pos of Section1 [4 bytes]

NOTE: An index pos can be shifted left by 4 bits [ i_pos << 4 ] to get an index offset (i.e. an offset from the start of the block index)


Block Index: - contains offsets to points in the current block
============
0x0 - Count of offsets minus one. [2 bytes] (In effect, each offset must be taken with the following one so that the start and end of the referenced item can be established. Therefore there is one extra to show the end of the last item.)

repeating
0x0 - Block Offset [2 bytes]
until you have read one more offset than the "Count"

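The pairing of consecutive offsets described above can be sketched like this. The code assumes little-endian fields and that the block offsets are relative to the start of the block. Both function names are illustrative.

```python
import struct


def read_block_index(block: bytes, index_offset: int) -> list:
    """Return (start, end) offset pairs for each item in the block."""
    (count,) = struct.unpack_from("<H", block, index_offset)
    # The stored count is one less than the number of offsets that follow.
    offsets = struct.unpack_from(f"<{count + 1}H", block, index_offset + 2)
    return list(zip(offsets, offsets[1:]))


def read_descriptor_item(block: bytes):
    """Return (Section1 index pos, block index pairs) for a descriptor item."""
    index_offset, constant, section1_ipos = struct.unpack_from("<HHI", block, 0)
    if constant != 0xBCEC:
        raise ValueError(f"unexpected constant: {constant:#06x}")
    return section1_ipos, read_block_index(block, index_offset)
```

Each (start, end) pair gives the extent of one item inside the block, which is why the index carries one offset more than there are items.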
Section1: - Referenced from "Descriptor Item" - contains not much
=========
0x0 - Constant? [4 bytes] (0x0602B5)
0x04 - Index Pos of Descriptor fields [4 bytes]


Descriptor Fields: - Contain the information needed to access the details of the email
==================
repeats
0x0 - Item type [2 bytes] (subject, from, to ...)
0x02 - Reference type [2 bytes] (how to interpret the value)
0x04 - Value [4 bytes]
until the allotted size of the record has been read. (The following Block Offset from the Index has been reached)

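Since every descriptor field is 8 bytes, the record can be read as a run of fixed-size entries between two block offsets. A minimal sketch, assuming little-endian fields and that start and end are a (start, end) pair taken from the Block Index. The DescriptorField type and the function name are made up for this example.

```python
import struct
from typing import List, NamedTuple


class DescriptorField(NamedTuple):
    item_type: int   # subject, from, to ...
    ref_type: int    # how to interpret the value
    value: int


FIELD = struct.Struct("<HHI")


def read_descriptor_fields(block: bytes, start: int, end: int) -> List[DescriptorField]:
    """Read 8-byte descriptor fields from block[start:end]."""
    fields = []
    pos = start
    while pos + FIELD.size <= end:
        fields.append(DescriptorField(*FIELD.unpack_from(block, pos)))
        pos += FIELD.size
    return fields
```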
Reference Types: - I don't know if I have interpreted this field correctly. It might have a completely different purpose
===============
0x0002 -
0x0003 - Value following is a value in its own right
0x000B -
0x001E - (STRING) Value following is an Index Position (must be shifted left by 4 bits)
0x0040 - (DATE) Value following is an Index Position (must be shifted left by 4 bits)
0x0048 -
0x0102 - (STRUCTURE) Value following is an Index Position (must be shifted left by 4 bits)
0x1003 -
0x101E - (ARRAY OF STRING)
0x1102 -

Value:
======
When the value is of type Index Position, you can shift the value left by 4 bits to get an offset into the Block Index. Some descriptor types can have Id2 values. These are recognised with a bitwise AND on the value, i.e. val & 0x0000000F. If the result is 0xF, it is likely to be an Id2 reference.

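The Id2 check described above, together with the reference types that the text gives a meaning for, can be sketched as a small classifier. The category names and the function name are invented for this example, and reference types that the text leaves blank are reported as "unknown" rather than guessed.

```python
# Reference types the text describes as holding an Index Position.
INDEX_POSITION_REF_TYPES = {0x001E, 0x0040, 0x0102}
# Reference type the text describes as holding a value in its own right.
DIRECT_VALUE_REF_TYPES = {0x0003}


def classify_value(ref_type: int, value: int) -> str:
    """Say how a descriptor field value should probably be interpreted."""
    if ref_type in DIRECT_VALUE_REF_TYPES:
        return "direct"
    if ref_type in INDEX_POSITION_REF_TYPES:
        # Low nibble of 0xF: likely an Id2 reference, not an index position.
        if (value & 0x0000000F) == 0xF:
            return "id2"
        return "index_position"
    return "unknown"
```

For example, classify_value(0x001E, 0x2F) reports "id2", while classify_value(0x001E, 0x40) reports "index_position".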
Descriptor Types: - Types that are in "Descriptor Fields"
=================

All Values are Hex

Note: it appears that some types can have an IPOS value or an ID2 value depending on the size of the field in question. It is safer to check every field than for me to say what they "usually" contain. Absolute values though, are generally going to be constant.

Type  Ref Type  Value  Desc
----  --------  -----  ----
001A            [REF]  IPM Context. What type of message is this
0037  001E      [REF]  Email Subject. The referenced item is of type "Subject Type"
0039            [REF]  Date. This is likely to be the arrival date
003B            [REF]  Outlook Address of Sender
003F            [REF]  Outlook structure describing the recipient
0040            [REF]  Name of the Outlook recipient structure
0041            [REF]  Outlook structure describing the sender
0042            [REF]  Name of the Outlook sender structure
0043            [REF]  Another structure describing the recipient
0044            [REF]  Name of the second recipient structure
004F            [REF]  Reply-To Outlook Structure
0050            [REF]  Name of the Reply-To structure
0051            [REF]  Outlook Name of recipient
0052            [REF]  Second Outlook name of recipient
0064            [REF]  Sender's Address access method (SMTP, EX)
0065            [REF]  Sender's Address
0070            [REF]  Processed Subject (with Fwd:, Re, ... removed)
0071            [REF]  Date. Another date
0075            [REF]  Recipient Address Access Method (SMTP, EX)
0076            [REF]  Recipient's Address
0077            [REF]  Second Recipient Access Method (SMTP, EX)
0078            [REF]  Second Recipient Address
007D  001E      [REF]  Email Header. This is the header that was attached to the email
0C19            [REF]  Second sender struct
0C1A            [REF]  Name of second sender struct
0C1D            [REF]  Second outlook name of sender
0C1E            [REF]  Second sender access method (SMTP, EX)
0C1F            [REF]  Second Sender Address
0E03            [REF]  CC Address?
0E04            [REF]  SentTo Address
0E06            [REF]  Date.
0E07            [REF]  Flag - contains IsSeen value
0FF9            [REF]  binary record header
1000  001E      [REF]  Plain Text Email Body. Does not exist if the email doesn't have a plain text version
1013  001E      [REF]  HTML Email Body. Does not exist if the email doesn't have an HTML version
1035            [REF]  Message ID
1042            [REF]  In-Reply-To or Parent's Message ID
1046            [REF]  Return Path
3001            [REF]  Folder Name? I have seen this value used for the contacts record as well
3007            [REF]  Date.
3008            [REF]  Date.
300B            [REF]  binary record header
35E0            [REF]  binary record found in first item. Contains the reference to "Top of Personal Folder" item
35E3            [REF]  binary record with a reference to "Deleted Items" item
35E7            [REF]  binary record with a reference to "Search Root" item
3602            [REF]  the number of emails stored in a folder
3603            [REF]  the number of unread emails in a folder
3613            [REF]  the folder content description
8000-                  Contain extra bits of information that have been taken from the email's header. I call them extra lines

Key:
[REF] = Can be either Index Position, or an Id2 Reference