comparison xml/libpst.in @ 35:b2f247463b83 stable-0-5-6

better decoding of 7c blocks
author carl
date Sun, 15 Jul 2007 14:25:34 -0700
parents 12cac756bc05
children 6fe121a971c9
34:07177825c91b 35:b2f247463b83
650 are active. The nodeLevel is non-zero for this style of nodes. 650 are active. The nodeLevel is non-zero for this style of nodes.
651 The leaf nodes have a different format. The backPointer must 651 The leaf nodes have a different format. The backPointer must
652 match the backPointer from the triple that pointed to this node. 652 match the backPointer from the triple that pointed to this node.
653 </para> 653 </para>
654 <para> 654 <para>
655 Each item in this node is a triple of (ID, backPointer, offset) 655 Each item in this node is a triple of (ID1, backPointer, offset)
656 where the offset points to the next deeper node in the tree, the 656 where the offset points to the next deeper node in the tree, the
657 backPointer value must match the backPointer in that deeper node, 657 backPointer value must match the backPointer in that deeper node,
658 and ID is the lowest ID value in the subtree. 658 and ID1 is the lowest ID1 value in the subtree.
659 </para> 659 </para>
660 </refsect1> 660 </refsect1>
661 661
662 <refsect1 id='pst.file.leaf1.5'> 662 <refsect1 id='pst.file.leaf1.5'>
663 <title>Index 1 Leaf Node</title> 663 <title>Index 1 Leaf Node</title>
720 The backPointer must match the backPointer from the triple 720 The backPointer must match the backPointer from the triple
721 that pointed to this node. 721 that pointed to this node.
722 </para> 722 </para>
723 <para> 723 <para>
724 Each item in this node is a tuple of (ID1, offset, size, unknown) 724 Each item in this node is a tuple of (ID1, offset, size, unknown)
725 The two low order bits of the ID1 value seem to be flags. I have
726 never seen a case with bit zero set. Bit one indicates that the
727 item is <emphasis>not</emphasis> encrypted. Note that references
728 to these ID1 values elsewhere may have the low order bit set (and
729 I don't know what that means), but when we do the search in this
730 tree we need to clear that bit so that we can find the correct item.
725 </para> 731 </para>
726 </refsect1> 732 </refsect1>
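The flag handling described for ID1 values can be sketched as follows. This is only an illustration under the assumptions stated in the text: bit zero is cleared before searching the index 1 tree, and bit one set means the item is not encrypted. The constant and function names are hypothetical and are not taken from the libpst source.

```python
# Hypothetical names; only the bit meanings come from the text above.
ID1_BIT0 = 0x1           # meaning unknown; sometimes set in references to an ID1
ID1_NOT_ENCRYPTED = 0x2  # bit one: the item is NOT encrypted


def id1_search_key(id1_reference):
    """Clear bit zero so the value can be matched against the leaf entries."""
    return id1_reference & ~ID1_BIT0


def id1_is_encrypted(id1):
    """Bit one set means the item is not encrypted, so test for it being clear."""
    return (id1 & ID1_NOT_ENCRYPTED) == 0
```

A reference such as 0x0c7719 would then be searched for as 0x0c7718.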
727 733
728 <refsect1 id='pst.file.node2.5'> 734 <refsect1 id='pst.file.node2.5'>
729 <title>Index 2 Node</title> 735 <title>Index 2 Node</title>
903 0140 0c 00 14 00 7c 00 8c 00 93 00 ab 00 c3 00 db 00 909 0140 0c 00 14 00 7c 00 8c 00 93 00 ab 00 c3 00 db 00
904 0150 f3 00 0b 01 23 01 3b 01 910 0150 f3 00 0b 01 23 01 3b 01
905 911
906 0000 indexOffset [2 bytes] 0x013c in this case 912 0000 indexOffset [2 bytes] 0x013c in this case
907 0002 signature [2 bytes] 0xbcec constant 913 0002 signature [2 bytes] 0xbcec constant
908 0004 offset [2 bytes] 0x0020 in this case 914 0004 b5offset [4 bytes] 0x0020 index reference
909 ]]></literallayout> 915 ]]></literallayout>
910 <para> 916 <para>
911 Note the signature of 0xbcec. There are other descriptor block 917 Note the signature of 0xbcec. There are other descriptor block formats
912 formats with other signatures. 918 with other signatures. Note the indexOffset of 0x013c - starting at
913 Note the indexOffset of 0x013c - starting at that position in the 919 that position in the descriptor block, we have an array of two byte
914 descriptor block, we have an array of two byte integers. The first 920 integers. The first integer (0x000b) is a (count-1) of the number of
915 integer (0x000b) is a (count-1) of the number of overlapping pairs 921 overlapping pairs following the count. The first pair is (0, 0xc), the
916 following the count. The first pair is (0, 0xc), the next pair is (0xc, 0x14) 922 next pair is (0xc, 0x14) and the last (12th) pair is (0x123, 0x13b).
917 and the last (12th) pair is (0x123, 0x13b). These pairs are (start,end+1) 923 These pairs are (start,end+1) offsets of items in this block. So we
918 offsets of items in this block. So we have count+2 integers following 924 have count+2 integers following the count value.
919 the count value. 925 </para>
920 </para> 926 <para>
921 <para> 927 Note the b5offset of 0x0020, which is a type that I will call an index
922 Note the offset of 0x0020, which needs to be right shifted by 4 bits 928 reference. Such index references have at least two different forms, and
923 to become 0x0002, which is then a byte offset to be added to the above 929 may point to data either in this block, or in some other block.
924 indexOffset plus two (to skip the count), so it points to the (0xc, 0x14) 930 External pointer references have the low order 4 bits all set, and are
925 pair. Finally, we have the offset and size of the "b5" block located at offset 0xc 931 ID2 values that can be used to fetch data. This value of 0x0020 is an
932 internal pointer reference, which needs to be right shifted by 4 bits to
933 become 0x0002, which is then a byte offset to be added to the above
934 indexOffset plus two (to skip the count), so it points to the (0xc,
935 0x14) pair.
936 </para>
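The index table and the internal form of an index reference, as described in the new text, can be sketched like this. The function names are hypothetical. The example buffer rebuilds only the 8 byte header and the index table of the 0xbcec dump; the count word and the first offset at 0x013c are inferred from the prose (count 0x000b, first pair (0, 0xc)), and everything else is zero filled.

```python
import struct


def read_index_pairs(block):
    """Return the overlapping (start, end+1) pairs from the index table."""
    (index_offset,) = struct.unpack_from("<H", block, 0)
    (count_minus_one,) = struct.unpack_from("<H", block, index_offset)
    n = count_minus_one + 2  # number of two byte integers after the count
    offsets = struct.unpack_from("<%dH" % n, block, index_offset + 2)
    return list(zip(offsets, offsets[1:]))


def resolve_index_reference(block, reference):
    """Resolve an internal index reference to its (start, end+1) pair.

    Returns None for an external reference (low order 4 bits all set),
    which is an ID2 value and has to be fetched by other means.
    """
    if (reference & 0xF) == 0xF:
        return None
    (index_offset,) = struct.unpack_from("<H", block, 0)
    position = index_offset + 2 + (reference >> 4)  # +2 skips the count
    return struct.unpack_from("<HH", block, position)


# Header and index table of the 0xbcec example; the gap is zero filled.
example = bytearray(0x158)
struct.pack_into("<HHI", example, 0, 0x013C, 0xBCEC, 0x0020)
table = [0x000B, 0x0000, 0x000C, 0x0014, 0x007C, 0x008C, 0x0093, 0x00AB,
         0x00C3, 0x00DB, 0x00F3, 0x010B, 0x0123, 0x013B]
struct.pack_into("<14H", example, 0x013C, *table)
example = bytes(example)

pairs = read_index_pairs(example)
b5_start, b5_end = resolve_index_reference(example, 0x0020)
```

With this data, the b5offset of 0x0020 resolves to the (0xc, 0x14) pair, which is the 8 byte "b5" block.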
937 <para>
938 Finally, we have the offset and size of the "b5" block located at offset 0xc
926 with a size of 8 bytes in this descriptor block. The "b5" block has the 939 with a size of 8 bytes in this descriptor block. The "b5" block has the
927 following format: 940 following format:
928 </para> 941 </para>
929 <literallayout class="monospaced"><![CDATA[ 942 <literallayout class="monospaced"><![CDATA[
930 0000 signature [2 bytes] 0x02b5 constant 943 0000 signature [2 bytes] 0x02b5 constant
931 0002 unknown [2 bytes] 0x0006 in this case 944 0002 unknown [2 bytes] 0x0006 in this case
932 0004 offset [4 bytes] 0x0040 in this case 945 0004 descoffset [4 bytes] 0x0040 index reference
933 ]]></literallayout> 946 ]]></literallayout>
934 <para> 947 <para>
935 Note the "b5" offset of 0x0040, which needs to be right shifted by 4 bits 948 Note the descoffset of 0x0040, which again is an index reference. In this
949 case, it is an internal pointer reference, which needs to be right shifted by 4 bits
936 to become 0x0004, which is then a byte offset to be added to the above 950 to become 0x0004, which is then a byte offset to be added to the above
937 indexOffset plus two (to skip the count), so it points to the (0x14, 0x7c) 951 indexOffset plus two (to skip the count), so it points to the (0x14, 0x7c)
938 pair. We now have the offset 0x14 of the descriptor array, composed of 8 byte 952 pair. We now have the offset 0x14 of the descriptor array, composed of 8 byte
939 entries. Each descriptor entry has the following format: 953 entries. Each descriptor entry has the following format:
940 </para> 954 </para>
943 0002 referenceType [2 bytes] 957 0002 referenceType [2 bytes]
944 0004 value [4 bytes] 958 0004 value [4 bytes]
945 ]]></literallayout> 959 ]]></literallayout>
946 <para> 960 <para>
947 For some reference types (2, 3, 0xb) the value is used directly. Otherwise, 961 For some reference types (2, 3, 0xb) the value is used directly. Otherwise,
948 the value is generally a non-zero offset, to be right shifted by 4 bits and used to fetch 962 the value is an index reference, which is either an ID2 value, or an
949 a pair from the index table to find the offset and size of the item in this 963 offset, to be right shifted by 4 bits and used to fetch a pair from the
950 descriptor block. However, if (value AND 0xf) == 0xf, then the value is an ID2 index. 964 index table to find the offset and size of the item in this descriptor block.
951 </para> 965 </para>
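The decision described above for an 8 byte descriptor entry can be sketched as follows. The first two bytes of each entry are not shown in this excerpt; the sketch assumes they hold the item type, since the document speaks of an (item type, reference type, value) triple. The function name, the returned labels, and the sample entries are all made up for illustration.

```python
import struct

# Reference types whose value is used directly, per the text above.
DIRECT_REFERENCE_TYPES = {0x2, 0x3, 0xB}


def classify_descriptor(entry):
    """Split an 8 byte entry and say how its value should be interpreted."""
    # Assumed layout: item type, reference type, value (all little endian).
    item_type, reference_type, value = struct.unpack("<HHI", entry)
    if reference_type in DIRECT_REFERENCE_TYPES:
        kind = "direct"    # value is the data itself
    elif (value & 0xF) == 0xF:
        kind = "id2"       # external index reference: an ID2 value
    else:
        kind = "internal"  # shift right 4 bits, then look up the index pair
    return item_type, reference_type, kind, value


# Invented sample entries, not taken from any real file.
direct_entry = struct.pack("<HHI", 0x1111, 0x0003, 1234)
internal_entry = struct.pack("<HHI", 0x2222, 0x001F, 0x0060)
id2_entry = struct.pack("<HHI", 0x3333, 0x001F, 0x002F)
```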
952 <para> 966 <para>
953 The following reference types are known, but not all of these 967 The following reference types are known, but not all of these
954 are implemented in the code yet. 968 are implemented in the code yet.
955 </para> 969 </para>
1195 </refsect1> 1209 </refsect1>
1196 1210
1197 <refsect1 id='pst.file.desc2.5'> 1211 <refsect1 id='pst.file.desc2.5'>
1198 <title>Associated Descriptor Item 0x7cec</title> 1212 <title>Associated Descriptor Item 0x7cec</title>
1199 <para> 1213 <para>
1200 This style of descriptor block is similar to the BCEC format. 1214 This style of descriptor block is similar to the 0xbcec format.
1201 </para> 1215 </para>
1202 <literallayout class="monospaced"><![CDATA[ 1216 <literallayout class="monospaced"><![CDATA[
1203 0000 7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00 1217 0000 7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00
1204 0010 60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00 1218 0010 60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00
1205 0020 00 00 80 00 00 00 00 00 00 00 03 00 20 0e 0c 00 1219 0020 00 00 80 00 00 00 00 00 00 00 03 00 20 0e 0c 00
1226 0170 20 42 69 72 74 68 64 61 79 00 06 00 00 00 0c 00 1240 0170 20 42 69 72 74 68 64 61 79 00 06 00 00 00 0c 00
1227 0180 14 00 ea 00 f0 00 55 01 60 01 79 01 1241 0180 14 00 ea 00 f0 00 55 01 60 01 79 01
1228 1242
1229 0000 indexOffset [2 bytes] 0x017a in this case 1243 0000 indexOffset [2 bytes] 0x017a in this case
1230 0002 signature [2 bytes] 0x7cec constant 1244 0002 signature [2 bytes] 0x7cec constant
1231 0004 offset [2 bytes] 0x0040 in this case 1245 0004 7coffset [4 bytes] 0x0040 index reference
1232 ]]></literallayout> 1246 ]]></literallayout>
1233 <para> 1247 <para>
1234 Note the signature of 0x7cec. There are other descriptor block 1248 Note the signature of 0x7cec. There are other descriptor block
1235 formats with other signatures. 1249 formats with other signatures.
1236 Note the indexOffset of 0x017a - starting at that position in the 1250 Note the indexOffset of 0x017a - starting at that position in the
1240 and the last (7th) pair is (0x160, 0x179). These pairs are (start,end+1) 1254 and the last (7th) pair is (0x160, 0x179). These pairs are (start,end+1)
1241 offsets of items in this block. So we have count+2 integers following 1255 offsets of items in this block. So we have count+2 integers following
1242 the count value. 1256 the count value.
1243 </para> 1257 </para>
1244 <para> 1258 <para>
1245 Note the offset of 0x0040, which needs to be right shifted by 4 bits 1259 Note the 7coffset of 0x0040, which is an index reference. In this case,
1260 it is an internal reference pointer, which needs to be right shifted by 4 bits
1246 to become 0x0004, which is then a byte offset to be added to the above 1261 to become 0x0004, which is then a byte offset to be added to the above
1247 indexOffset plus two (to skip the count), so it points to the (0x14, 0xea) 1262 indexOffset plus two (to skip the count), so it points to the (0x14, 0xea)
1248 pair. We have the offset and size of the "7c" block located at offset 0x14 1263 pair. We have the offset and size of the "7c" block located at offset 0x14
1249 with a size of 214 bytes in this case. The "7c" block starts with 1264 with a size of 214 bytes in this case. The "7c" block starts with
1250 a header with the following format: 1265 a header with the following format:
1254 0001 itemCount [1 bytes] 0x18 in this case 1269 0001 itemCount [1 bytes] 0x18 in this case
1255 0002 unknown [2 bytes] 0x0060 in this case 1270 0002 unknown [2 bytes] 0x0060 in this case
1256 0004 unknown [2 bytes] 0x0060 in this case 1271 0004 unknown [2 bytes] 0x0060 in this case
1257 0006 unknown [2 bytes] 0x0062 in this case 1272 0006 unknown [2 bytes] 0x0062 in this case
1258 0008 recordSize [2 bytes] 0x0065 in this case 1273 0008 recordSize [2 bytes] 0x0065 in this case
1259 000a b5Offset [2 bytes] 0x0020 in this case 1274 000a b5Offset [4 bytes] 0x0020 index reference
1260 000c unknown [2 bytes] 0x0000 in this case 1275 000e index2Offset [4 bytes] 0x0080 index reference
1261 000e index2Offset [2 bytes] 0x0080 in this case
1262 0010 unknown [2 bytes] 0x0000 in this case 1276 0010 unknown [2 bytes] 0x0000 in this case
1263 0012 unknown [2 bytes] 0x0000 in this case 1277 0012 unknown [2 bytes] 0x0000 in this case
1264 0014 unknown [2 bytes] 0x0000 in this case 1278 0014 unknown [2 bytes] 0x0000 in this case
1265 ]]></literallayout> 1279 ]]></literallayout>
1266 <para> 1280 <para>
1267 Note the b5Offset of 0x0020, which needs to be right shifted by 4 bits 1281 Note the b5Offset of 0x0020, which is an index reference. In this case,
1282 it is an internal reference pointer, which needs to be right shifted by 4 bits
1268 to become 0x0002, which is then a byte offset to be added to the above 1283 to become 0x0002, which is then a byte offset to be added to the above
1269 indexOffset plus two (to skip the count), so it points to the (0xc, 1284 indexOffset plus two (to skip the count), so it points to the (0xc,
1270 0x14) pair. Finally, we have the offset and size of the "b5" block 1285 0x14) pair. Finally, we have the offset and size of the "b5" block
1271 located at offset 0xc with a size of 8 bytes in this descriptor block. 1286 located at offset 0xc with a size of 8 bytes in this descriptor block.
1272 The "b5" block has the following format: 1287 The "b5" block has the following format:
1273 </para> 1288 </para>
1274 <literallayout class="monospaced"><![CDATA[ 1289 <literallayout class="monospaced"><![CDATA[
1275 0000 signature [2 bytes] 0x04b5 constant 1290 0000 signature [2 bytes] 0x04b5 constant
1276 0002 unknown [2 bytes] 0x0002 in this case 1291 0002 unknown [2 bytes] 0x0002 in this case
1277 0004 offset [4 bytes] 0x0060 in this case 1292 0004 descoffset [4 bytes] 0x0060 index reference
1278 ]]></literallayout> 1293 ]]></literallayout>
1279 <para> 1294 <para>
1280 Note the "b5" offset of 0x0060, which needs to be right shifted by 4 1295 Note the descoffset of 0x0060, which again is an index reference. In this
1296 case, it is an internal pointer reference, which needs to be right shifted by 4
1281 bits to become 0x0006, which is then a byte offset to be added to the 1297 bits to become 0x0006, which is then a byte offset to be added to the
1282 above indexOffset plus two (to skip the count), so it points to the 1298 above indexOffset plus two (to skip the count), so it points to the
1283 (0xea, 0xf0) pair. That gives us (0xf0 - 0xea)/6 = 1, so we have a 1299 (0xea, 0xf0) pair. That gives us (0xf0 - 0xea)/6 = 1, so we have a
1284 recordCount of one. The actual data between 0xea and 0xf0 is unknown 1300 recordCount of one. The actual data between 0xea and 0xf0 is unknown
1285 and unused here. 1301 and unused here.
1286 </para> 1302 </para>
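The chain of lookups for the 0x7cec example (7coffset, then the "7c" header, then the "b5" block, then the recordCount) can be traced with a short sketch. It uses only the bytes visible in the dump: offsets 0x0000 to 0x0029 and the index table at 0x017a. The rest of the block is zero filled. The helper name is hypothetical, the first header byte is assumed to be a one byte 0x7c signature, and the header is read with the new layout (b5Offset and index2Offset as four byte fields).

```python
import struct

block = bytearray(0x18C)
# Bytes 0x0000-0x0029 copied from the dump.
block[0:0x2A] = bytes.fromhex(
    "7a 01 ec 7c 40 00 00 00 00 00 00 00 b5 04 02 00 "
    "60 00 00 00 7c 18 60 00 60 00 62 00 65 00 20 00 "
    "00 00 80 00 00 00 00 00 00 00"
)
# Index table at 0x017a copied from the dump.
block[0x17A:0x18C] = bytes.fromhex(
    "06 00 00 00 0c 00 14 00 ea 00 f0 00 55 01 60 01 79 01"
)
block = bytes(block)


def resolve(data, reference):
    """Internal index reference -> (start, end+1) pair from the index table."""
    (index_offset,) = struct.unpack_from("<H", data, 0)
    return struct.unpack_from("<HH", data, index_offset + 2 + (reference >> 4))


index_offset, signature, seven_c_offset = struct.unpack_from("<HHI", block, 0)
hdr_start, hdr_end = resolve(block, seven_c_offset)

# signature, itemCount, three unknowns, recordSize, b5Offset, index2Offset
(sig7c, item_count, _u1, _u2, _u3,
 record_size, b5_offset, index2_offset) = struct.unpack_from(
    "<BBHHHHII", block, hdr_start)

b5_start, b5_end = resolve(block, b5_offset)
b5_signature, b5_unknown, desc_offset = struct.unpack_from("<HHI", block, b5_start)

rec_start, rec_end = resolve(block, desc_offset)
record_count = (rec_end - rec_start) // 6  # (0xf0 - 0xea) / 6 in the text

ind2_start, ind2_end = resolve(block, index2_offset)
```

In this example the span found through index2Offset, (0xf0, 0x155), is 0x65 bytes long, which equals recordSize times the recordCount of one.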
1287 <para> 1303 <para>
1288 Note the index2Offset above of 0x0080, which needs to be right shifted 1304 Note the index2Offset above of 0x0080, which again is an index reference. In this
1305 case, it is an internal pointer reference, which needs to be right shifted
1289 by 4 bits to become 0x0008, which is then a byte offset to be added to 1306 by 4 bits to become 0x0008, which is then a byte offset to be added to
1290 the above indexOffset plus two (to skip the count), so it points to the 1307 the above indexOffset plus two (to skip the count), so it points to the
1291 (0xf0, 0x155) pair. This is an array of tables of four byte integers. 1308 (0xf0, 0x155) pair. This is an array of tables of four byte integers.
1292 We will call these the IND2 tables. The size of each of these tables is 1309 We will call these the IND2 tables. The size of each of these tables is
1293 specified by the recordSize field of the "7c" header. The number of 1310 specified by the recordSize field of the "7c" header. The number of
1300 </para> 1317 </para>
1301 <literallayout class="monospaced"><![CDATA[ 1318 <literallayout class="monospaced"><![CDATA[
1302 0000 referenceType [2 bytes] 1319 0000 referenceType [2 bytes]
1303 0002 itemType [2 bytes] 1320 0002 itemType [2 bytes]
1304 0004 ind2Offset [2 bytes] 1321 0004 ind2Offset [2 bytes]
1305 0006 unknown [2 bytes] 1322 0006 size [1 byte]
1306 ]]></literallayout> 1323 0007 unknown [1 byte]
1307 <para> 1324 ]]></literallayout>
1308 The ind2Offset is a byte offset into the current IND2 table of a four 1325 <para>
1309 byte integer value. Once we fetch that, we have the same triple (item 1326 The ind2Offset is a byte offset into the current IND2 table of some value.
1310 type, reference type, value) as we find in the 0xbcec style descriptor 1327 If that is a four byte integer value, then once we fetch that, we have
1311 blocks. These 8 byte descriptors are processed recordCount times, each 1328 the same triple (item type, reference type, value) as we find in the
1329 0xbcec style descriptor blocks. If not, then this value is used directly.
1330 These 8 byte descriptors are processed recordCount times, each
1312 time using the next IND2 table. The item and reference types are as 1331 time using the next IND2 table. The item and reference types are as
1313 described above for the 0xbcec format descriptor block. 1332 described above for the 0xbcec format descriptor block.
1314 </para> 1333 </para>
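A rough sketch of how the revised 8 byte descriptors might be applied to the IND2 tables follows. The text does not say how to tell a four byte integer from a value that is used directly; the sketch assumes the one byte size field gives the length of the value and that a size of 4 means a four byte integer. That rule, the function names, and the sample data are assumptions for illustration only.

```python
import struct


def read_ind2_value(descriptor, ind2_table):
    """Fetch the value one descriptor points at inside one IND2 table."""
    reference_type, item_type, ind2_offset, size, _unknown = struct.unpack(
        "<HHHBB", descriptor)
    raw = ind2_table[ind2_offset:ind2_offset + size]
    if size == 4:
        # Assumed: a four byte value is an integer, handled like a 0xbcec value.
        (value,) = struct.unpack("<I", raw)
    else:
        # Assumed: any other size is used directly as raw bytes.
        value = raw
    return item_type, reference_type, value


def walk_ind2_tables(descriptors, ind2_data, record_size, record_count):
    """Process every descriptor once per IND2 table, one table per record."""
    records = []
    for i in range(record_count):
        table = ind2_data[i * record_size:(i + 1) * record_size]
        records.append([read_ind2_value(d, table) for d in descriptors])
    return records


# Invented data: one 6 byte table, one 4 byte field and one 2 byte field.
sample_table = struct.pack("<IH", 0x00000060, 0x1234)
sample_descriptors = [
    struct.pack("<HHHBB", 0x001F, 0x2222, 0, 4, 0),
    struct.pack("<HHHBB", 0x0002, 0x1111, 4, 2, 0),
]
sample_records = walk_ind2_tables(sample_descriptors, sample_table, 6, 1)
```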
1315 </refsect1> 1334 </refsect1>
1316 1335
1336 <refsect1 id='pst.file.desc3.5'>
1337 <title>Associated Descriptor Item 0x0002</title>
1338 <para>
1339 This style of descriptor block is almost unknown here.
1340 It seems to contain a list of ID1 values.
1341 </para>
1342 <literallayout class="monospaced"><![CDATA[
1343 0000 01 01 02 00 26 28 00 00 18 77 0c 00 b8 04 00 00
1344
1345 0000 signature [2 bytes] 0x0101 constant
1346 0002 count [2 bytes] 0x0002 in this case
1347 0004 unknown [4 bytes] 0x002826 in this case
1348 repeating
1349 0008 id [4 bytes] 0x0c7718 in this case
1350 000c id [4 bytes] 0x0004b8 in this case
1351 ]]></literallayout>
1352 </refsect1>
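The layout shown for this block can be read with a few lines of code. This is a minimal sketch that follows the field table literally (two byte signature, two byte count, four byte unknown, then count four byte ids); the function name is hypothetical, and whether the ids really are ID1 values is only the guess stated in the text.

```python
import struct


def parse_id_list_block(block):
    """Return (signature, count, unknown, ids) from a 0x0101 style block."""
    signature, count, unknown = struct.unpack_from("<HHI", block, 0)
    ids = struct.unpack_from("<%dI" % count, block, 8)
    return signature, count, unknown, list(ids)


# The 16 bytes from the dump above.
example = bytes.fromhex("01 01 02 00 26 28 00 00 18 77 0c 00 b8 04 00 00")
signature, count, unknown, ids = parse_id_list_block(example)
```

For the dump above this yields a count of 2 and the ids 0x0c7718 and 0x0004b8.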
1353
1317 </refentry> 1354 </refentry>
1318 </reference> 1355 </reference>