# HG changeset patch # User Carl Byington # Date 1223225792 25200 # Node ID 1fc33da2317574882408d35d2fb9e19d1bce38e3 # Parent 1e4a7610d525a596275522dab7e74d8a1c521a3a fix for orphan children when building descriptor tree, avoid writing uninitialized data to debug log file diff -r 1e4a7610d525 -r 1fc33da23175 ChangeLog --- a/ChangeLog Thu Oct 02 15:29:36 2008 -0700 +++ b/ChangeLog Sun Oct 05 09:56:32 2008 -0700 @@ -1,10 +1,12 @@ -LibPST 0.6.20 (2008-10-02) +LibPST 0.6.20 (2008-10-04) =============================== * add configure option --enable-dii=no to remove dependency on libgd. * many fixes in pst2ldif by Robert Harris. * add -D option to include deleted items, from Justin Greer * fix from Justin Greer to add missing email headers * fix from Justin Greer for my_stristr() + * fix for orphan children when building descriptor tree + * avoid writing uninitialized data to debug log file LibPST 0.6.19 (2008-09-14) =============================== diff -r 1e4a7610d525 -r 1fc33da23175 NEWS --- a/NEWS Thu Oct 02 15:29:36 2008 -0700 +++ b/NEWS Sun Oct 05 09:56:32 2008 -0700 @@ -1,4 +1,4 @@ -0.6.20 2008-10-02 add configure option --enable-dii=no, fixes from Robert Harris for pst2ldif. +0.6.20 2008-10-04 add configure option --enable-dii=no, fixes from Robert Harris for pst2ldif. 0.6.19 2008-09-14 Initial work on a .so shared library from Bharath Acharya. 0.6.18 2008-08-28 Fixes for iconv on Mac from Justin Greer. 0.6.17 2008-08-05 More fixes for 32/64 bit portability on big endian ppc. diff -r 1e4a7610d525 -r 1fc33da23175 libpst.spec.in --- a/libpst.spec.in Thu Oct 02 15:29:36 2008 -0700 +++ b/libpst.spec.in Sun Oct 05 09:56:32 2008 -0700 @@ -47,12 +47,14 @@ %changelog -* Thu Oct 02 2008 Carl Byington - 0.6.20-1 +* Sat Oct 04 2008 Carl Byington - 0.6.20-1 - add configure option --enable-dii=no to remove dependency on libgd. - many fixes in pst2ldif by Robert Harris. - add -D option to include deleted items, from Justin Greer - fix from Justin Greer to add missing email headers - fix from Justin Greer for my_stristr() +- fix for orphan children when building descriptor tree +- avoid writing uninitialized data to debug log file * Sun Sep 14 2008 Carl Byington - 0.6.19-1 - Fix base64 encoding that could create long lines. diff -r 1e4a7610d525 -r 1fc33da23175 regression/regression-tests.bash --- a/regression/regression-tests.bash Thu Oct 02 15:29:36 2008 -0700 +++ b/regression/regression-tests.bash Sun Oct 05 09:56:32 2008 -0700 @@ -28,7 +28,7 @@ val="valgrind --leak-check=full" -val='' +#val='' pushd .. make || exit diff -r 1e4a7610d525 -r 1fc33da23175 src/debug.c --- a/src/debug.c Thu Oct 02 15:29:36 2008 -0700 +++ b/src/debug.c Sun Oct 05 09:56:32 2008 -0700 @@ -98,7 +98,7 @@ fprintf(stderr, "Opening of file %s failed\n", fname); exit(1); } - pst_debug_fwrite(&version, 1, sizeof(char), debug_fp); + pst_debug_fwrite(&version, sizeof(char), 1, debug_fp); } @@ -191,9 +191,8 @@ info_ptr->text = "ERROR Saving\n"; } - if (!item_head) - item_head = info_ptr; - + // add to the linked list of pending items + if (!item_head) item_head = info_ptr; info_ptr->next = NULL; if (item_tail) item_tail->next = info_ptr; item_tail = info_ptr; @@ -300,26 +299,26 @@ end=ptr; if (end > USHRT_MAX) { // bigger than can be stored in a short rec_type = 'L'; - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); - lfile_rec.type = item_ptr->type; - lfile_rec.line = item_ptr->line; + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); + lfile_rec.type = item_ptr->type; + lfile_rec.line = item_ptr->line; lfile_rec.funcname = funcname; lfile_rec.filename = filename; - lfile_rec.text = text; - lfile_rec.end = end; + lfile_rec.text = text; + lfile_rec.end = end; pst_debug_fwrite(&lfile_rec, sizeof(lfile_rec), 1, debug_fp); } else { rec_type = 'M'; - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); - mfile_rec.type = item_ptr->type; - mfile_rec.line = item_ptr->line; + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); + mfile_rec.type = item_ptr->type; + mfile_rec.line = item_ptr->line; mfile_rec.funcname = funcname; mfile_rec.filename = filename; - mfile_rec.text = text; - mfile_rec.end = end; + mfile_rec.text = text; + mfile_rec.end = end; pst_debug_fwrite(&mfile_rec, sizeof(mfile_rec), 1, debug_fp); } - pst_debug_fwrite(buf, 1, ptr, debug_fp); + pst_debug_fwrite(buf, ptr, 1, debug_fp); if (buf) free(buf); buf = NULL; item_head = item_ptr->next; free(item_ptr->function); @@ -348,10 +347,12 @@ int index_size = 3 * sizeof(off_t); off_t index[3]; off_t index_pos, file_pos; - char zero='\0'; + char zero = '\0'; unsigned int end; if (!debug_fp) return; // no file - index[0] = 1; //only one item in this index + index[0] = 1; // only one item in this index + index[1] = 0; // valgrind, avoid writing uninitialized data + index[2] = 0; // "" index_pos = ftello(debug_fp); pst_debug_fwrite(index, index_size, 1, debug_fp); @@ -359,21 +360,23 @@ if (size > USHRT_MAX) { // bigger than can be stored in a short rec_type = 'L'; - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); - lfile_rec.type = item->type; - lfile_rec.line = item->line; + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); + lfile_rec.type = item->type; + lfile_rec.line = item->line; lfile_rec.funcname = 0; lfile_rec.filename = strlen(item->function)+1; - lfile_rec.text = lfile_rec.filename+strlen(item->file)+1; + lfile_rec.text = lfile_rec.filename+strlen(item->file)+1; + lfile_rec.end = 0; // valgrind, avoid writing uninitialized data pst_debug_fwrite(&lfile_rec, sizeof(lfile_rec), 1, debug_fp); } else { rec_type = 'M'; - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); - mfile_rec.type = item->type; - mfile_rec.line = item->line; + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); + mfile_rec.type = item->type; + mfile_rec.line = item->line; mfile_rec.funcname = 0; mfile_rec.filename = strlen(item->function)+1; - mfile_rec.text = mfile_rec.filename+strlen(item->file)+1; + mfile_rec.text = mfile_rec.filename+strlen(item->file)+1; + mfile_rec.end = 0; // valgrind, avoid writing uninitialized data pst_debug_fwrite(&mfile_rec, sizeof(mfile_rec), 1, debug_fp); } file_pos = ftello(debug_fp); @@ -388,11 +391,11 @@ fseeko(debug_fp, index_pos, SEEK_SET); pst_debug_fwrite(index, index_size, 1, debug_fp); if (size > USHRT_MAX) { - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); lfile_rec.end = end; pst_debug_fwrite(&lfile_rec, sizeof(lfile_rec), 1, debug_fp); } else { - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); mfile_rec.end = end; pst_debug_fwrite(&mfile_rec, sizeof(mfile_rec), 1, debug_fp); } @@ -416,7 +419,7 @@ // always use the long rec_type = 'L'; - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); lfile_rec.funcname = 0; lfile_rec.filename = strlen(item->function)+1; lfile_rec.text = lfile_rec.filename+strlen(item->file)+1; @@ -436,7 +439,7 @@ index[2] = ftello(debug_fp); fseeko(debug_fp, index_pos, SEEK_SET); pst_debug_fwrite(index, index_size, 1, debug_fp); - pst_debug_fwrite(&rec_type, 1, sizeof(char), debug_fp); + pst_debug_fwrite(&rec_type, sizeof(char), 1, debug_fp); pst_debug_fwrite(&lfile_rec, sizeof(lfile_rec), 1, debug_fp); fseeko(debug_fp, 0, SEEK_END); } diff -r 1e4a7610d525 -r 1fc33da23175 src/libpst.c --- a/src/libpst.c Thu Oct 02 15:29:36 2008 -0700 +++ b/src/libpst.c Sun Oct 05 09:56:32 2008 -0700 @@ -821,108 +821,88 @@ } -/** this list node type is used for a quick cache - of the descriptor tree nodes (rooted at pf->d_head) - and for a "lost and found" list. - If the parent isn't found yet, put it on the lost and found - list and check it each time you read a new item. -*/ -struct cache_list_node { - pst_desc_ll *ptr; - /** only used for lost and found lists */ - uint64_t parent; - struct cache_list_node *next; - struct cache_list_node *prev; -}; -static struct cache_list_node *cache_head; -static struct cache_list_node *cache_tail; -static struct cache_list_node *lostfound_head; -static int cache_count; +/** + * add a pst descriptor node to a linked list of such nodes. + * + * @param node pointer to the node to be added to the list + * @param head pointer to the list head pointer + * @param tail pointer to the list tail pointer + */ +static void add_descriptor_to_list(pst_desc_ll *node, pst_desc_ll **head, pst_desc_ll **tail); +static void add_descriptor_to_list(pst_desc_ll *node, pst_desc_ll **head, pst_desc_ll **tail) +{ + DEBUG_ENT("add_descriptor_to_list"); + //DEBUG_INDEX(("Added node %#"PRIx64" parent %#"PRIx64" real parent %#"PRIx64" prev %#"PRIx64" next %#"PRIx64"\n", + // node->id, node->parent_id, + // (node->parent ? node->parent->id : (uint64_t)0), + // (node->prev ? node->prev->id : (uint64_t)0), + // (node->next ? node->next->id : (uint64_t)0))); + if (*tail) (*tail)->next = node; + if (!(*head)) *head = node; + node->prev = *tail; + *tail = node; + DEBUG_RET(); +} /** - add the d_ptr descriptor into the global tree -*/ -static void record_descriptor(pst_file *pf, pst_desc_ll *d_ptr, uint64_t parent_id); -static void record_descriptor(pst_file *pf, pst_desc_ll *d_ptr, uint64_t parent_id) { - struct cache_list_node *lostfound_ptr = NULL; - struct cache_list_node *cache_ptr = NULL; - pst_desc_ll *parent = NULL; - - if (parent_id == 0 || parent_id == d_ptr->id) { - // add top level node to the descriptor tree - if (parent_id == 0) { - DEBUG_INDEX(("No Parent\n")); - } else { - DEBUG_INDEX(("Record is its own parent. What is this world coming to?\n")); - } - if (pf->d_tail) pf->d_tail->next = d_ptr; - if (!pf->d_head) pf->d_head = d_ptr; - d_ptr->prev = pf->d_tail; - pf->d_tail = d_ptr; - } else { - DEBUG_INDEX(("Searching for parent\n")); - // check in the cache for the parent - cache_ptr = cache_head; - while (cache_ptr && (cache_ptr->ptr->id != parent_id)) { - cache_ptr = cache_ptr->next; + * add a pst descriptor node into the global tree. + * + * @param pf global pst file pointer + * @param node pointer to the node to be added to the tree + */ +static void record_descriptor(pst_file *pf, pst_desc_ll *node); +static void record_descriptor(pst_file *pf, pst_desc_ll *node) +{ + // find any orphan children of this node, and collect them + pst_desc_ll *n = pf->d_head; + while (n) { + if (n->parent_id == node->id) { + // found a child of this node + DEBUG_INDEX(("Found orphan child %#"PRIx64" of parent %#"PRIx64"\n", n->id, node->id)); + pst_desc_ll *nn = n->next; + pst_desc_ll *pp = n->prev; + node->no_child++; + n->parent = node; + n->prev = NULL; + n->next = NULL; + add_descriptor_to_list(n, &node->child, &node->child_tail); + if (pp) pp->next = nn; else pf->d_head = nn; + if (nn) nn->prev = pp; else pf->d_tail = pp; + n = nn; } - if (!cache_ptr && (parent = pst_getDptr(pf, parent_id)) == NULL) { - // check in the lost/found list - lostfound_ptr = lostfound_head; - while (lostfound_ptr && (lostfound_ptr->ptr->id != parent_id)) { - lostfound_ptr = lostfound_ptr->next; - } - if (!lostfound_ptr) { - DEBUG_WARN(("ERROR -- cannot find parent with id %#"PRIx64". Adding id %#"PRIx64" to lost/found\n", parent_id, d_ptr->id)); - lostfound_ptr = (struct cache_list_node*) xmalloc(sizeof(struct cache_list_node)); - lostfound_ptr->prev = NULL; - lostfound_ptr->next = lostfound_head; - lostfound_ptr->parent = parent_id; - lostfound_ptr->ptr = d_ptr; - lostfound_head = lostfound_ptr; - } else { - parent = lostfound_ptr->ptr; - DEBUG_INDEX(("Found parent (%#"PRIx64") in lost/found\n", parent->id)); - } + else { + n = n->next; } - - if (cache_ptr || parent) { - if (cache_ptr) - // parent is already in the cache - parent = cache_ptr->ptr; - else { - //add the parent to the cache - DEBUG_INDEX(("Cache addition\n")); - cache_ptr = (struct cache_list_node*) xmalloc(sizeof(struct cache_list_node)); - cache_ptr->prev = NULL; - cache_ptr->next = cache_head; - cache_ptr->ptr = parent; - if (cache_head) cache_head->prev = cache_ptr; - if (!cache_tail) cache_tail = cache_ptr; - cache_head = cache_ptr; - cache_count++; - if (cache_count > 100) { - DEBUG_INDEX(("trimming quick cache\n")); - //remove one from the end - cache_ptr = cache_tail; - cache_tail = cache_ptr->prev; - if (cache_tail) cache_tail->next = NULL; - free (cache_ptr); - cache_count--; - } - } - DEBUG_INDEX(("Found a parent\n")); + } + + // now hook this node into the global tree + if (node->parent_id == 0) { + // add top level node to the descriptor tree + //DEBUG_INDEX(("Null parent\n")); + add_descriptor_to_list(node, &pf->d_head, &pf->d_tail); + } + else if (node->parent_id == node->id) { + // add top level node to the descriptor tree + DEBUG_INDEX(("%#"PRIx64" is its own parent. What is this world coming to?\n")); + add_descriptor_to_list(node, &pf->d_head, &pf->d_tail); + } else { + //DEBUG_INDEX(("Searching for parent %#"PRIx64" of %#"PRIx64"\n", node->parent_id, node->id)); + pst_desc_ll *parent = pst_getDptr(pf, node->parent_id); + if (parent) { + //DEBUG_INDEX(("Found parent %#"PRIx64"\n", node->parent_id)); parent->no_child++; - d_ptr->parent = parent; - if (parent->child_tail) parent->child_tail->next = d_ptr; - if (!parent->child) parent->child = d_ptr; - d_ptr->prev = parent->child_tail; - parent->child_tail = d_ptr; + node->parent = parent; + add_descriptor_to_list(node, &parent->child, &parent->child_tail); + } + else { + //DEBUG_INDEX(("No parent %#"PRIx64", have an orphan child %#"PRIx64"\n", node->parent_id, node->id)); + add_descriptor_to_list(node, &pf->d_head, &pf->d_tail); } } } + int pst_build_desc_ptr (pst_file *pf, off_t offset, int32_t depth, uint64_t linku1, uint64_t *high_id, uint64_t start_val, uint64_t end_val) { struct pst_table_ptr_structn table, table2; pst_descn desc_rec; @@ -930,18 +910,6 @@ int32_t x, item_count; uint64_t old = start_val; char *buf = NULL, *bptr; - struct cache_list_node *cache_ptr = NULL; - struct cache_list_node *lostfound_ptr = NULL; - struct cache_list_node *lostfound_shd = NULL; - struct cache_list_node *lostfound_tmp = NULL; - - if (depth == 0) { - // initialize the linked list and lost/found list. - cache_head = NULL; - cache_tail = NULL; - lostfound_head = NULL; - cache_count = 0; - } DEBUG_ENT("pst_build_desc_ptr"); DEBUG_INDEX(("offset %#"PRIx64" depth %i linku1 %#"PRIx64" start %#"PRIx64" end %#"PRIx64"\n", offset, depth, linku1, start_val, end_val)); @@ -1000,8 +968,13 @@ } } // When duplicates found, just update the info.... perhaps this is correct functionality - DEBUG_INDEX(("Searching for existing record\n")); + DEBUG_INDEX(("Searching for existing record %#"PRIx64"\n", desc_rec.d_id)); if (desc_rec.d_id <= *high_id && (d_ptr = pst_getDptr(pf, desc_rec.d_id))) { + // This is probably unreachable code, originally written when the + // tree walking code was broken since it did not know about the node + // count. It always processed all the items in the node, even unused + // items, and that probably made it look like there were duplicate + // entries. uint64_t bigzero = 0; DEBUG_INDEX(("Updating Existing Values\n")); d_ptr->list_index = pst_getID(pf, desc_rec.list_id); @@ -1033,19 +1006,21 @@ else pf->d_tail = d_ptr->prev; - d_ptr->prev = NULL; - d_ptr->next = NULL; - d_ptr->parent = NULL; - record_descriptor(pf, d_ptr, desc_rec.parent_id); // add to the global tree + d_ptr->parent_id = desc_rec.parent_id; + d_ptr->prev = NULL; + d_ptr->next = NULL; + d_ptr->parent = NULL; + record_descriptor(pf, d_ptr); // add to the global tree } } else { if (*high_id < desc_rec.d_id) { DEBUG_INDEX(("Updating New High\n")); *high_id = desc_rec.d_id; } - DEBUG_INDEX(("New Record\n")); + DEBUG_INDEX(("New Record %#"PRIx64" with parent %#x\n", desc_rec.d_id, desc_rec.parent_id)); d_ptr = (pst_desc_ll*) xmalloc(sizeof(pst_desc_ll)); d_ptr->id = desc_rec.d_id; + d_ptr->parent_id = desc_rec.parent_id; d_ptr->list_index = pst_getID(pf, desc_rec.list_id); d_ptr->desc = pst_getID(pf, desc_rec.desc_id); d_ptr->prev = NULL; @@ -1054,32 +1029,9 @@ d_ptr->child = NULL; d_ptr->child_tail = NULL; d_ptr->no_child = 0; - record_descriptor(pf, d_ptr, desc_rec.parent_id); // add to the global tree - - } - // check here to see if d_ptr is the parent of any of the items in the lost / found list - lostfound_ptr = lostfound_head; - lostfound_shd = NULL; - while (lostfound_ptr) { - if (lostfound_ptr->parent == d_ptr->id) { - DEBUG_INDEX(("Found a lost/found child (%#"PRIx64") of the current record. Joining to main structure.\n", lostfound_ptr->ptr->id)); - parent = d_ptr; - d_ptr = lostfound_ptr->ptr; - parent->no_child++; - d_ptr->parent = parent; - if (parent->child_tail) parent->child_tail->next = d_ptr; - if (!parent->child) parent->child = d_ptr; - d_ptr->prev = parent->child_tail; - parent->child_tail = d_ptr; - if (!lostfound_shd) lostfound_head = lostfound_ptr->next; - else lostfound_shd->next = lostfound_ptr->next; - lostfound_tmp = lostfound_ptr->next; - free(lostfound_ptr); - lostfound_ptr = lostfound_tmp; - } else { - lostfound_shd = lostfound_ptr; - lostfound_ptr = lostfound_ptr->next; - } + record_descriptor(pf, d_ptr); // add to the global tree + //DEBUG_INDEX(("dump parent descriptor tree\n")); //!! + //d_ptr = pst_getDptr(pf, (uint64_t)-1); //!! } } } else { @@ -1122,21 +1074,6 @@ (void)pst_build_desc_ptr(pf, table.offset, depth+1, table.u1, high_id, table.start, table2.start); } } - if (depth == 0) { - // free the quick cache - while (cache_head) { - cache_ptr = cache_head->next; - free(cache_head); - cache_head = cache_ptr; - } - // free the lost and found - while (lostfound_head) { - lostfound_ptr = lostfound_head->next; - WARN(("unused lost/found item %#"PRIx64" with parent %#"PRIx64, lostfound_head->parent, lostfound_head->ptr->id)); - free(lostfound_head); - lostfound_head = lostfound_ptr; - } - } if (buf) free(buf); DEBUG_RET(); return 0; @@ -4137,6 +4074,7 @@ pst_desc_ll *ptr = pf->d_head; DEBUG_ENT("pst_getDptr"); while (ptr && (ptr->id != id)) { + //DEBUG_INDEX(("Looking for %#"PRIx64" at node %#"PRIx64" with parent %#"PRIx64"\n", id, ptr->id, ptr->parent_id)); if (ptr->child) { ptr = ptr->child; continue; diff -r 1e4a7610d525 -r 1fc33da23175 src/libpst.h --- a/src/libpst.h Thu Oct 02 15:29:36 2008 -0700 +++ b/src/libpst.h Sun Oct 05 09:56:32 2008 -0700 @@ -188,6 +188,7 @@ typedef struct pst_desc_tree { uint64_t id; + uint64_t parent_id; pst_index_ll * list_index; pst_index_ll * desc; int32_t no_child; diff -r 1e4a7610d525 -r 1fc33da23175 xml/libpst.in --- a/xml/libpst.in Thu Oct 02 15:29:36 2008 -0700 +++ b/xml/libpst.in Sun Oct 05 09:56:32 2008 -0700 @@ -33,7 +33,7 @@ - 2008-09-28 + 2008-10-05 @@ -233,7 +233,7 @@ - 2008-09-28 + 2008-10-05 @@ -336,7 +336,7 @@ - 2008-09-28 + 2008-10-05 @@ -481,6 +481,15 @@ + + TODO + + The binary debug log file is generally much larger than the .pst file, + so we need to switch to ftello/fseeko there also to handle files + larger than 2GB. + + + Copyright @@ -511,7 +520,7 @@ - 2008-09-28 + 2008-10-05 @@ -642,7 +651,7 @@ - 2008-09-28 + 2008-10-05 @@ -776,7 +785,7 @@ - 2008-09-28 + 2008-10-05