Doubly Linked List, MergeSort, getting undefined and unreliable results

234 views Asked by At

I try my best to look up solutions to problems before asking on here, but I've hit a bit of a brick wall. While there is one for Java, it didn't help me enough to understand where I went wrong with my implementation. So without further ado, here's some background, and then the problem.

Background/Reason for attempting this

To keep things short, the reason I'm making this isn't just for learning purposes, and no this not a homework assignment as my Data Structures and Algorithms class is a year from now, it's for practical general use, without worrying too much on optimization. So, hence if it seems a bit inefficient, please feel free to point it out and let me know how I can improve it, but 100% optimization isn't my true goal. I've also created a simple logging tool to help me keep track of all the elements in the linked list to help debug the algorithm, and even though it does what I intended, I still can't figure out how to fix it.

Problem

It seems that some runs work perfectly, I can see that it correctly divides and conquers the list, yet other times, at random places I see repeating places. For instance, here is a good run from the Linked List. Then if I run it again, I either get similar good results or bad ones like this

As can be seen from the bad, I seem to have messed up somewhere, and one of the nodes is pointing to a node that, directly or indirectly, points to itself.

What I've Tried.

I've attempted to create a smaller data structure to keep track of the head and tail nodes for each split list, aptly sub_list_t.

typedef struct {
    Node *head;
    Node *tail;
    size_t size;
} sub_list_t;

The reasoning behind this is because, I figured it would make it much easier to interact with, and I even have it's own individual separate functions; also the original Linked List is rather large to be needlessly creating 3 per sort (List_One, List_Two, and Final_List). The original list contains rwlocks to attempt to make them concurrent, so the overhead of creating 3n short-lived locks wouldn't make much sense. Hence, a mini/sub list is easier and better, IMO.

The function that splits and merges each sub_list into a final list, sort_lists, is implemented here.

static sub_list_t *sort_list(sub_list_t *list, Linked_List_Compare compare){
    if(list->size == 1) {
        return list;
    }
    size_t mid = list->size / 2;
    sub_list_t *list_one = sub_list_of(list, 0, mid-1);
    list_one = sort_list(list_one, compare);
    sub_list_t *list_two = sub_list_of(list, mid, list->size - 1);
    list_two = sort_list(list_two, compare);
    sub_list_t *final_list = merge_lists(list_one, list_two, compare);
    free(list_one);
    free(list_two);
    return final_list;
}

The function that splits each sub_list down into more sub_lists, is implemented here...

static sub_list_t *sub_list_of(sub_list_t *list, unsigned int begin, unsigned int end){
    sub_list_t *sub_list = malloc(sizeof(sub_list_t));
    int i = 0;
    size_t size = 0;
    Node *node = list->head;
    while(i < begin) {
        node = node->next;
        i++;
    }
    sub_list->head = node;
    while(i++ <= end){
        node = node->next;
        size++;
    }
    sub_list->tail = node;
    sub_list->size = size;
    return sub_list;
}

I've a feeling the problem is above. I had to change it multiple times, since I'm unsure. It should accurately get the head, but the tail, I'm not certain, although it looks good enough.

static void append_to_list(sub_list_t *list, Node *node){
    if(!list->size){ // If list->size == 0
        list->tail = list->head = node;
        list->size++;
        return;
    }
    // The problem must be here then? This is the only part of code that modifies what the nodes point to.
    list->tail->next = node;
    node->prev = list->tail;
    list->tail = node;
    list->size++;
}

Simple constructor for sub_list is here, only used at beginning when Linked_List creates the sub_list or to create an empty final_list in merge_list, implemented below.

static sub_list_t *sub_list_create(Node *head, Node *tail, size_t size){
    sub_list_t *list = malloc(sizeof(sub_list_t));
    list->head = head;
    list->tail = tail;
    list->size = size;
    return list;
}

Function to append the rest of a list to the final list, which also could be culprit, although I can't find the fault in it.

static void append_list_to_list(sub_list_t *list_src, sub_list_t *list_dst){
    Node *node = NULL;
    int i = 0;
    size_t old_size = list_dst->size;
    for(node = list_src->head; i++ < list_src->size; node = node->next) append_to_list(list_dst, node);
}

Finally, the merge_list, which handles merging and comparisons, which seems to be where the problems originates, as it makes all calls to append_to_list and append_list_to_list...

static sub_list_t *merge_lists(sub_list_t *list_one, sub_list_t *list_two, Linked_List_Compare compare){
    sub_list_t *final_list = sub_list_create(NULL, NULL, 0);
    while(list_one->size > 0 && list_two->size > 0){
        if(compare(list_one->head->item, list_two->head->item) <= 0){
            append_to_list(final_list, list_one->head);
            list_one->head = list_one->head->next;
            list_one->size--;
        } else {
            append_to_list(final_list, list_two->head);
            list_two->head = list_two->head->next;
            list_two->size--;
        }
    }
    if(list_one->size > 0) {
        append_list_to_list(list_one, final_list);
    }
    if(list_two->size > 0) {
        append_list_to_list(list_two, final_list);
    }
    return final_list;
}

If I'm doing something wrong, let me know. Also, if this is considered a bad way to ask a question, also please let me know. I'll work on a runnable example in ideone right away.

Edit: Here. Mostly replicated results, infinite looping, nodes pointing indirectly to itself, etc.

Edit2: I managed to implement an insertion sort rather easily, by just swapping out the data held by the nodes rather than where the nodes point to... within 2 minutes... yet I've been trying to do the merge sort for about 2 days to no avail. Yet I still don't know how it precisely even worked.

1

There are 1 answers

3
rcgldr On BEST ANSWER

A bottom up merge sort is more appropriate for sorting a linked list. An efficient method for doing this is to use an array of pointers to the first nodes of lists, where the number of nodes pointed to by array[i] is 2^i (2 to the power i), or array[i] is an empty list.

The double linked list can be treated as a single linked list during the merge sort and then the previous links fixed once the list is sorted.

Nodes are removed from the original list one at a time and merged into the array, then the lists in the array are merged to form a single sorted list. Since it seems you need this code soon, here is example C code:

#define NUMLISTS 32                     /* number of lists */

typedef struct NODE_{
struct NODE_ * next;
int data;
}NODE;

NODE * MergeLists(NODE *, NODE *);

NODE * SortList(NODE *pList)
{
NODE * aList[NUMLISTS];                 /* array of lists */
NODE * pNode;
NODE * pNext;
int i;
    if(pList == NULL)                   /* check for empty list */
        return NULL;
    for(i = 0; i < NUMLISTS; i++)       /* zero array */
        aList[i] = NULL;
    pNode = pList;                      /* merge nodes into aList[] */
    while(pNode != NULL){
        pNext = pNode->next;
        pNode->next = NULL;
        for(i = 0; (i < NUMLISTS) && (aList[i] != NULL); i++){
            pNode = MergeLists(aList[i], pNode);
            aList[i] = NULL;
        }
        if(i == NUMLISTS)
            i--;
        aList[i] = pNode;
        pNode = pNext;
    }
    pNode = NULL;                       /* merge array into one list */
    for(i = 0; i < NUMLISTS; i++)
        pNode = MergeLists(aList[i], pNode);
    return pNode;
}

NODE * MergeLists(NODE *pSrc1, NODE *pSrc2)
{
NODE *pDst = NULL;                      /* destination head ptr */
NODE **ppDst = &pDst;                   /* ptr to head or prev->next */
    while(1){
        if(pSrc1 == NULL){
            *ppDst = pSrc2;
            break;
        }
        if(pSrc2 == NULL){
            *ppDst = pSrc1;
            break;
        }
        if(pSrc2->data < pSrc1->data){  /* if src2 < src1 */
            *ppDst = pSrc2;
            pSrc2 = *(ppDst = &(pSrc2->next));
            continue;
        } else {                        /* src1 <= src2 */
            *ppDst = pSrc1;
            pSrc1 = *(ppDst = &(pSrc1->next));
            continue;
        }
    }
    return pDst;
}