Mercurial > libpst
comparison xml/libpst.in @ 35:b2f247463b83 stable-0-5-6
better decoding of 7c blocks
author | carl |
---|---|
date | Sun, 15 Jul 2007 14:25:34 -0700 |
parents | 12cac756bc05 |
children | 6fe121a971c9 |
34:07177825c91b | 35:b2f247463b83 |
---|---|
650 are active. The nodeLevel is non-zero for this style of nodes. | 650 are active. The nodeLevel is non-zero for this style of nodes. |
651 The leaf nodes have a different format. The backPointer must | 651 The leaf nodes have a different format. The backPointer must |
652 match the backPointer from the triple that pointed to this node. | 652 match the backPointer from the triple that pointed to this node. |
653 </para> | 653 </para> |
654 <para> | 654 <para> |
655 Each item in this node is a triple of (ID, backPointer, offset) | 655 Each item in this node is a triple of (ID1, backPointer, offset) |
656 where the offset points to the next deeper node in the tree, the | 656 where the offset points to the next deeper node in the tree, the |
657 backPointer value must match the backPointer in that deeper node, | 657 backPointer value must match the backPointer in that deeper node, |
658 and ID is the lowest ID value in the subtree. | 658 and ID1 is the lowest ID1 value in the subtree. |
659 </para> | 659 </para> |
660 </refsect1> | 660 </refsect1> |
661 | 661 |
662 <refsect1 id='pst.file.leaf1.5'> | 662 <refsect1 id='pst.file.leaf1.5'> |
663 <title>Index 1 Leaf Node</title> | 663 <title>Index 1 Leaf Node</title> |
720 The backPointer must match the backPointer from the triple | 720 The backPointer must match the backPointer from the triple |
721 that pointed to this node. | 721 that pointed to this node. |
722 </para> | 722 </para> |
723 <para> | 723 <para> |
724 Each item in this node is a tuple of (ID1, offset, size, unknown) | 724 Each item in this node is a tuple of (ID1, offset, size, unknown) |
725 The two low order bits of the ID1 value seem to be flags. I have | |
726 never seen a case with bit zero set. Bit one indicates that the | |
727 item is <emphasis>not</emphasis> encrypted. Note that references | |
728 to these ID1 values elsewhere may have the low order bit set (and | |
729 I don't know what that means), but when we do the search in this | |
730 tree we need to clear that bit so that we can find the correct item. | |
725 </para> | 731 </para> |
726 </refsect1> | 732 </refsect1> |
727 | 733 |
728 <refsect1 id='pst.file.node2.5'> | 734 <refsect1 id='pst.file.node2.5'> |
729 <title>Index 2 Node</title> | 735 <title>Index 2 Node</title> |
903 0140 0c 00 14 00 7c 00 8c 00 93 00 ab 00 c3 00 db 00 | 909 0140 0c 00 14 00 7c 00 8c 00 93 00 ab 00 c3 00 db 00 |
904 0150 f3 00 0b 01 23 01 3b 01 | 910 0150 f3 00 0b 01 23 01 3b 01 |
905 | 911 |
906 0000 indexOffset [2 bytes] 0x013c in this case | 912 0000 indexOffset [2 bytes] 0x013c in this case |
907 0002 signature [2 bytes] 0xbcec constant | 913 0002 signature [2 bytes] 0xbcec constant |
908 0004 offset [2 bytes] 0x0020 in this case | 914 0004 b5offset [4 bytes] 0x0020 index reference |
909 ]]></literallayout> | 915 ]]></literallayout> |
910 <para> | 916 <para> |
911 Note the signature of 0xbcec. There are other descriptor block | 917 Note the signature of 0xbcec. There are other descriptor block formats |
912 formats with other signatures. | 918 with other signatures. Note the indexOffset of 0x013c - starting at |
913 Note the indexOffset of 0x013c - starting at that position in the | 919 that position in the descriptor block, we have an array of two byte |
914 descriptor block, we have an array of two byte integers. The first | 920 integers. The first integer (0x000b) is a (count-1) of the number of |
915 integer (0x000b) is a (count-1) of the number of overlapping pairs | 921 overlapping pairs following the count. The first pair is (0, 0xc), the |
916 following the count. The first pair is (0, 0xc), the next pair is (0xc, 0x14) | 922 next pair is (0xc, 0x14) and the last (12th) pair is (0x123, 0x13b). |
917 and the last (12th) pair is (0x123, 0x13b). These pairs are (start,end+1) | 923 These pairs are (start,end+1) offsets of items in this block. So we |
918 offsets of items in this block. So we have count+2 integers following | 924 have count+2 integers following the count value. |
919 the count value. | 925 </para> |
920 </para> | 926 <para> |
921 <para> | 927 Note the b5offset of 0x0020, which is a type that I will call an index |
922 Note the offset of 0x0020, which needs to be right shifted by 4 bits | 928 reference. Such index references have at least two different forms, and |
923 to become 0x0002, which is then a byte offset to be added to the above | 929 may point to data either in this block, or in some other block. |
924 indexOffset plus two (to skip the count), so it points to the (0xc, 0x14) | 930 External pointer references have the low order 4 bits all set, and are |
925 pair. Finally, we have the offset and size of the "b5" block located at offset 0xc | 931 ID2 values that can be used to fetch data. This value of 0x0020 is an |
932 internal pointer reference, which needs to be right shifted by 4 bits to | |
933 become 0x0002, which is then a byte offset to be added to the above | |
934 indexOffset plus two (to skip the count), so it points to the (0xc, | |
935 0x14) pair. | |
936 </para> | |
937 <para> | |
938 Finally, we have the offset and size of the "b5" block located at offset 0xc | |
926 with a size of 8 bytes in this descriptor block. The "b5" block has the | 939 with a size of 8 bytes in this descriptor block. The "b5" block has the |
927 following format: | 940 following format: |
928 </para> | 941 </para> |
929 <literallayout class="monospaced"><![CDATA[ | 942 <literallayout class="monospaced"><![CDATA[ |
930 0000 signature [2 bytes] 0x02b5 constant | 943 0000 signature [2 bytes] 0x02b5 constant |
931 0002 unknown [2 bytes] 0x0006 in this case | 944 0002 unknown [2 bytes] 0x0006 in this case |
932 0004 offset [4 bytes] 0x0040 in this case | 945 0004 descoffset [4 bytes] 0x0040 index reference |
933 ]]></literallayout> | 946 ]]></literallayout> |
934 <para> | 947 <para> |
935 Note the "b5" offset of 0x0040, which needs to be right shifted by 4 bits | 948 Note the descoffset of 0x0040, which again is an index reference. In this |
949 case, it is an internal pointer reference, which needs to be right shifted by 4 bits | |
936 to become 0x0004, which is then a byte offset to be added to the above | 950 to become 0x0004, which is then a byte offset to be added to the above |
937 indexOffset plus two (to skip the count), so it points to the (0x14, 0x7c) | 951 indexOffset plus two (to skip the count), so it points to the (0x14, 0x7c) |
938 pair. We now have the offset 0x14 of the descriptor array, composed of 8 byte | 952 pair. We now have the offset 0x14 of the descriptor array, composed of 8 byte |
939 entries. Each descriptor entry has the following format: | 953 entries. Each descriptor entry has the following format: |
940 </para> | 954 </para> |
943 0002 referenceType [2 bytes] | 957 0002 referenceType [2 bytes] |
944 0004 value [4 bytes] | 958 0004 value [4 bytes] |
945 ]]></literallayout> | 959 ]]></literallayout> |
946 <para> | 960 <para> |
947 For some reference types (2, 3, 0xb) the value is used directly. Otherwise, | 961 For some reference types (2, 3, 0xb) the value is used directly. Otherwise, |
948 the value is generally a non-zero offset, to be right shifted by 4 bits and used to fetch | 962 the value is an index reference, which is either an ID2 value, or an |
949 a pair from the index table to find the offset and size of the item in this | 963 offset, to be right shifted by 4 bits and used to fetch a pair from the |
950 descriptor block. However, if (value AND 0xf) == 0xf, then the value is an ID2 index. | 964 index table to find the offset and size of the item in this descriptor block. |
951 </para> | 965 </para> |
952 <para> | 966 <para> |
953 The following reference types are known, but not all of these | 967 The following reference types are known, but not all of these |
954 are implemented in the code yet. | 968 are implemented in the code yet. |
955 </para> | 969 </para> |
1195 </refsect1> | 1209 </refsect1> |
1196 | 1210 |
1197 <refsect1 id='pst.file.desc2.5'> | 1211 <refsect1 id='pst.file.desc2.5'> |
1198 <title>Associated Descriptor Item 0x7cec</title> | 1212 <title>Associated Descriptor Item 0x7cec</title> |
1199 <para> | 1213 <para> |
1200 This style of descriptor block is similar to the BCEC format. | 1214 This style of descriptor block is similar to the 0xbcec format. |
1201 </para> | 1215 </para> |
1202 <literallayout class="monospaced"><![CDATA[ | 1216 <literallayout class="monospaced"><![CDATA[ |
1203 0000 7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00 | 1217 0000 7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00 |
1204 0010 60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00 | 1218 0010 60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00 |
1205 0020 00 00 80 00 00 00 00 00 00 00 03 00 20 0e 0c 00 | 1219 0020 00 00 80 00 00 00 00 00 00 00 03 00 20 0e 0c 00 |
1226 0170 20 42 69 72 74 68 64 61 79 00 06 00 00 00 0c 00 | 1240 0170 20 42 69 72 74 68 64 61 79 00 06 00 00 00 0c 00 |
1227 0180 14 00 ea 00 f0 00 55 01 60 01 79 01 | 1241 0180 14 00 ea 00 f0 00 55 01 60 01 79 01 |
1228 | 1242 |
1229 0000 indexOffset [2 bytes] 0x017a in this case | 1243 0000 indexOffset [2 bytes] 0x017a in this case |
1230 0002 signature [2 bytes] 0x7cec constant | 1244 0002 signature [2 bytes] 0x7cec constant |
1231 0004 offset [2 bytes] 0x0040 in this case | 1245 0004 7coffset [4 bytes] 0x0040 index reference |
1232 ]]></literallayout> | 1246 ]]></literallayout> |
1233 <para> | 1247 <para> |
1234 Note the signature of 0x7cec. There are other descriptor block | 1248 Note the signature of 0x7cec. There are other descriptor block |
1235 formats with other signatures. | 1249 formats with other signatures. |
1236 Note the indexOffset of 0x017a - starting at that position in the | 1250 Note the indexOffset of 0x017a - starting at that position in the |
1240 and the last (7th) pair is (0x160, 0x179). These pairs are (start,end+1) | 1254 and the last (7th) pair is (0x160, 0x179). These pairs are (start,end+1) |
1241 offsets of items in this block. So we have count+2 integers following | 1255 offsets of items in this block. So we have count+2 integers following |
1242 the count value. | 1256 the count value. |
1243 </para> | 1257 </para> |
1244 <para> | 1258 <para> |
1245 Note the offset of 0x0040, which needs to be right shifted by 4 bits | 1259 Note the 7coffset of 0x0040, which is an index reference. In this case, |
1260 it is an internal reference pointer, which needs to be right shifted by 4 bits | |
1246 to become 0x0004, which is then a byte offset to be added to the above | 1261 to become 0x0004, which is then a byte offset to be added to the above |
1247 indexOffset plus two (to skip the count), so it points to the (0x14, 0xea) | 1262 indexOffset plus two (to skip the count), so it points to the (0x14, 0xea) |
1248 pair. We have the offset and size of the "7c" block located at offset 0x14 | 1263 pair. We have the offset and size of the "7c" block located at offset 0x14 |
1249 with a size of 214 bytes in this case. The "7c" block starts with | 1264 with a size of 214 bytes in this case. The "7c" block starts with |
1250 a header with the following format: | 1265 a header with the following format: |
1254 0001 itemCount [1 bytes] 0x18 in this case | 1269 0001 itemCount [1 bytes] 0x18 in this case |
1255 0002 unknown [2 bytes] 0x0060 in this case | 1270 0002 unknown [2 bytes] 0x0060 in this case |
1256 0004 unknown [2 bytes] 0x0060 in this case | 1271 0004 unknown [2 bytes] 0x0060 in this case |
1257 0006 unknown [2 bytes] 0x0062 in this case | 1272 0006 unknown [2 bytes] 0x0062 in this case |
1258 0008 recordSize [2 bytes] 0x0065 in this case | 1273 0008 recordSize [2 bytes] 0x0065 in this case |
1259 000a b5Offset [2 bytes] 0x0020 in this case | 1274 000a b5Offset [4 bytes] 0x0020 index reference |
1260 000c unknown [2 bytes] 0x0000 in this case | 1275 000e index2Offset [4 bytes] 0x0080 index reference |
1261 000e index2Offset [2 bytes] 0x0080 in this case | |
1262 0010 unknown [2 bytes] 0x0000 in this case | 1276 0010 unknown [2 bytes] 0x0000 in this case |
1263 0012 unknown [2 bytes] 0x0000 in this case | 1277 0012 unknown [2 bytes] 0x0000 in this case |
1264 0014 unknown [2 bytes] 0x0000 in this case | 1278 0014 unknown [2 bytes] 0x0000 in this case |
1265 ]]></literallayout> | 1279 ]]></literallayout> |
1266 <para> | 1280 <para> |
1267 Note the b5Offset of 0x0020, which needs to be right shifted by 4 bits | 1281 Note the b5Offset of 0x0020, which is an index reference. In this case, |
1282 it is an internal reference pointer, which needs to be right shifted by 4 bits | |
1268 to become 0x0002, which is then a byte offset to be added to the above | 1283 to become 0x0002, which is then a byte offset to be added to the above |
1269 indexOffset plus two (to skip the count), so it points to the (0xc, | 1284 indexOffset plus two (to skip the count), so it points to the (0xc, |
1270 0x14) pair. Finally, we have the offset and size of the "b5" block | 1285 0x14) pair. Finally, we have the offset and size of the "b5" block |
1271 located at offset 0xc with a size of 8 bytes in this descriptor block. | 1286 located at offset 0xc with a size of 8 bytes in this descriptor block. |
1272 The "b5" block has the following format: | 1287 The "b5" block has the following format: |
1273 </para> | 1288 </para> |
1274 <literallayout class="monospaced"><![CDATA[ | 1289 <literallayout class="monospaced"><![CDATA[ |
1275 0000 signature [2 bytes] 0x04b5 constant | 1290 0000 signature [2 bytes] 0x04b5 constant |
1276 0002 unknown [2 bytes] 0x0002 in this case | 1291 0002 unknown [2 bytes] 0x0002 in this case |
1277 0004 offset [4 bytes] 0x0060 in this case | 1292 0004 descoffset [4 bytes] 0x0060 index reference |
1278 ]]></literallayout> | 1293 ]]></literallayout> |
1279 <para> | 1294 <para> |
1280 Note the "b5" offset of 0x0060, which needs to be right shifted by 4 | 1295 Note the descoffset of 0x0060, which again is an index reference. In this |
1296 case, it is an internal pointer reference, which needs to be right shifted by 4 | |
1281 bits to become 0x0006, which is then a byte offset to be added to the | 1297 bits to become 0x0006, which is then a byte offset to be added to the |
1282 above indexOffset plus two (to skip the count), so it points to the | 1298 above indexOffset plus two (to skip the count), so it points to the |
1283 (0xea, 0xf0) pair. That gives us (0xf0 - 0xea)/6 = 1, so we have a | 1299 (0xea, 0xf0) pair. That gives us (0xf0 - 0xea)/6 = 1, so we have a |
1284 recordCount of one. The actual data between 0xea and 0xf0 is unknown | 1300 recordCount of one. The actual data between 0xea and 0xf0 is unknown |
1285 and unused here. | 1301 and unused here. |
1286 </para> | 1302 </para> |
1287 <para> | 1303 <para> |
1288 Note the index2Offset above of 0x0080, which needs to be right shifted | 1304 Note the index2Offset above of 0x0080, which again is an index reference. In this |
1305 case, it is an internal pointer reference, which needs to be right shifted | |
1289 by 4 bits to become 0x0008, which is then a byte offset to be added to | 1306 by 4 bits to become 0x0008, which is then a byte offset to be added to |
1290 the above indexOffset plus two (to skip the count), so it points to the | 1307 the above indexOffset plus two (to skip the count), so it points to the |
1291 (0xf0, 0x155) pair. This is an array of tables of four byte integers. | 1308 (0xf0, 0x155) pair. This is an array of tables of four byte integers. |
1292 We will call these the IND2 tables. The size of each of these tables is | 1309 We will call these the IND2 tables. The size of each of these tables is |
1293 specified by the recordSize field of the "7c" header. The number of | 1310 specified by the recordSize field of the "7c" header. The number of |
1300 </para> | 1317 </para> |
1301 <literallayout class="monospaced"><![CDATA[ | 1318 <literallayout class="monospaced"><![CDATA[ |
1302 0000 referenceType [2 bytes] | 1319 0000 referenceType [2 bytes] |
1303 0002 itemType [2 bytes] | 1320 0002 itemType [2 bytes] |
1304 0004 ind2Offset [2 bytes] | 1321 0004 ind2Offset [2 bytes] |
1305 0006 unknown [2 bytes] | 1322 0006 size [1 byte] |
1306 ]]></literallayout> | 1323 0007 unknown [1 byte] |
1307 <para> | 1324 ]]></literallayout> |
1308 The ind2Offset is a byte offset into the current IND2 table of a four | 1325 <para> |
1309 byte integer value. Once we fetch that, we have the same triple (item | 1326 The ind2Offset is a byte offset into the current IND2 table of some value. |
1310 type, reference type, value) as we find in the 0xbcec style descriptor | 1327 If that is a four byte integer value, then once we fetch that, we have |
1311 blocks. These 8 byte descriptors are processed recordCount times, each | 1328 the same triple (item type, reference type, value) as we find in the |
1329 0xbcec style descriptor blocks. If not, then this value is used directly. | |
1330 These 8 byte descriptors are processed recordCount times, each | |
1312 time using the next IND2 table. The item and reference types are as | 1331 time using the next IND2 table. The item and reference types are as |
1313 described above for the 0xbcec format descriptor block. | 1332 described above for the 0xbcec format descriptor block. |
1314 </para> | 1333 </para> |
1315 </refsect1> | 1334 </refsect1> |
1316 | 1335 |
1336 <refsect1 id='pst.file.desc3.5'> | |
1337 <title>Associated Descriptor Item 0x0002</title> | |
1338 <para> | |
1339 This style of descriptor block is almost unknown here. | |
1340 It seems to contain a list of ID1 values. | |
1341 </para> | |
1342 <literallayout class="monospaced"><![CDATA[ | |
1343 0000 01 01 02 00 26 28 00 00 18 77 0c 00 b8 04 00 00 | |
1344 | |
1345 0000 signature [2 bytes] 0x0101 constant | |
1346 0002 count [2 bytes] 0x0002 in this case | |
1347 0004 unknown [4 bytes] 0x002826 in this case | |
1348 repeating | |
1349 0008 id [4 bytes] 0x0c7718 in this case | |
1350 000c id [4 bytes] 0x0004b8 in this case | |
1351 ]]></literallayout> | |
1352 </refsect1> | |
1353 | |
1317 </refentry> | 1354 </refentry> |
1318 </reference> | 1355 </reference> |
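
The revised text describes an "index reference" decoding rule that is easier to check against the 0x7cec hex dump when written out. The following is a minimal Python sketch of that rule, not the libpst implementation. The names `is_external`, `resolve_internal`, and `walk_7c` are hypothetical. The buffer holds only the bytes visible in the dump above (offsets 0x00-0x2f and the index table at 0x17a); the elided middle is zero-filled as an assumption. Little-endian byte order is inferred from the dump.

```python
import struct

def is_external(ref):
    """External index references (ID2 values) have the low order 4 bits all set."""
    return (ref & 0xF) == 0xF

def resolve_internal(block, ref):
    """Return (start, size) of the item an internal index reference points to."""
    if is_external(ref):
        raise ValueError("external reference (ID2 value), not an offset in this block")
    (index_offset,) = struct.unpack_from("<H", block, 0)
    # Skip the two byte count, then use the shifted reference as a byte offset.
    pair_pos = index_offset + 2 + (ref >> 4)
    start, end_plus_1 = struct.unpack_from("<HH", block, pair_pos)
    return start, end_plus_1 - start

def walk_7c(block):
    """Follow the references of a 0x7cec block as described in the text."""
    index_offset, signature, ref_7c = struct.unpack_from("<HHI", block, 0)
    start_7c, size_7c = resolve_internal(block, ref_7c)
    (record_size,) = struct.unpack_from("<H", block, start_7c + 0x08)
    # b5Offset and index2Offset are read as 4 byte values (new revision layout).
    b5_ref, index2_ref = struct.unpack_from("<II", block, start_7c + 0x0A)
    b5_start, b5_size = resolve_internal(block, b5_ref)
    b5_sig, _unknown, desc_ref = struct.unpack_from("<HHI", block, b5_start)
    desc_start, desc_size = resolve_internal(block, desc_ref)
    ind2_start, ind2_size = resolve_internal(block, index2_ref)
    return {
        "signature": signature,
        "7c": (start_7c, size_7c),
        "b5": (b5_start, b5_size),
        "b5_signature": b5_sig,
        "record_count": desc_size // 6,
        "record_size": record_size,
        "ind2": (ind2_start, ind2_size),
    }

# Known bytes from the 0x7cec dump; everything else is zero-filled (assumption).
block = bytearray(0x18C)
block[0x00:0x30] = bytes.fromhex(
    "7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00"
    "60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00"
    "00 00 80 00 00 00 00 00 00 00 03 00 20 0e 0c 00"
)
block[0x17A:0x18C] = bytes.fromhex(
    "06 00 00 00 0c 00 14 00 ea 00 f0 00 55 01 60 01 79 01"
)

summary = walk_7c(block)
```

With this buffer, `summary["7c"]` is `(0x14, 214)` and `summary["record_count"]` is 1, which agrees with the figures in the prose. An external reference such as 0x2f is rejected by `resolve_internal`, since looking up an ID2 value requires data outside this block.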