c++ - How to binary search a vector of structs and insert at the appropriate index

Tags: c++ vector binary-search insert-update

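What I want to do:

I want a sorted table of texts (all of equal size) that shows the number of occurrences of each text. The table is kept sorted as texts are inserted. When a word is inserted, the table is checked for an existing entry; if one is found, its ref_count is incremented. If the word is new, its ref_count is set to 1 and it is inserted into the vector at the correct index so that the table stays sorted.

What I did:

I created a struct and a vector of structs, defined as below. I then use the binary search shown to determine the appropriate index to pass to std::vector::insert().

My problem: my binary search implementation does not return the correct index position.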

#define WORD_LENGTH 6

typedef struct RC_table {
    char word[WORD_LENGTH+1];//+1 for '\0'
    unsigned int ref_count;
}RC_table;

std::vector<RC_table>RC;

void update_RC(char *word_to_insert)
{
    int index = 0; 
    bool found=binary_search_RC(word_to_insert, &index);
    if (found == TRUE) {
        //increment reference count
        RC[index].ref_count++;
    }
    else {
        //insert new word at index 
        RC_table new_entry;
        memcpy(new_entry.word, word_to_insert, WORD_LENGTH);
        new_entry.word[WORD_LENGTH] = '\0';
        new_entry.ref_count = 1;

        if(index==0)
            RC.insert(RC.begin(),new_entry);
        else if(index==RC.size()-1)
            RC.insert(RC.end(),new_entry);
        else {
            RC.insert(RC.begin() + index +1, new_entry);
        }
    }
}
bool binary_search_RC(char *word, int *index) {
    int left = 0;
    int right = RC.size() -1;
    int middle = (left + right) / 2;
    bool found = false;
    while (left<=right) {
        middle = (left + right) / 2;
        *index = middle;
        if (strncmp(word, RC[middle].word, WORD_LENGTH) < 0) {
            right = middle - 1;
        }

        else if(strncmp(word, RC[middle].word, WORD_LENGTH) > 0) {
            left = middle + 1;
        }
        else {
            found = true;
            break;
        }
    }
    *index = middle;
    return found;
}

Edit: I tried using lower_bound(). It still does not give the expected output (i.e. a sorted table).

typedef struct RC_table {
    char word[WORD_LENGTH+1];//+1 for '\0'
    unsigned int ref_count;
    bool operator<(const RC_table&r){
         return word<r.word;
    }
}RC_table;

Insertion into the table is done with:

auto itr=lower_bound(RC.begin(),RC.end(),new_entry);
RC.insert(itr,new_entry);

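Best answer

The biggest problem with your algorithm is that you try to fence both sides of the partition with elements you consider searchable. That is not the way to do it: in the worst-case not-found scenario the partition ends up (and will end up) mired in repeated integer halving, and the final middle no longer tells you where to insert. The math is considerably easier when you do this instead:

  • The left end of the partition refers to the first element and is considered a candidate.
  • The right end of the partition refers to the one-past-the-end element and is not considered a candidate.

The result is a simpler algorithm that is easier to understand and easier to maintain.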

bool binary_search_RC(const char *word, int *index)
{
    *index = 0;

    int left = 0;
    int right = static_cast<int>(RC.size());
    bool found = false;
    while (!found && left < right)
    {
        int middle = *index = left + (right-left) / 2;
        int res = strncmp(word, RC[middle].word, WORD_LENGTH);

        if (res < 0)
            right = middle;

        else if (res > 0)
            *index = left = middle+1;

        else
            found = true;
    }
    return found;
}

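Putting it into practice: a simple little test harness that generates random three-character strings from a three-letter alphabet. That should produce plenty of unique insertions and, eventually, plenty of finds. At the end we print the whole table, which should come out sorted if everything works.

Code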

#include <cstring>   // strncmp, strncpy
#include <iostream>
#include <vector>
#include <random>

#define WORD_LENGTH 6

typedef struct RC_table {
    char word[WORD_LENGTH+1];//+1 for '\0'
    unsigned int ref_count;
} RC_table;

std::vector<RC_table>RC;

bool binary_search_RC(const char *word, int *index)
{
    *index = 0;

    int left = 0;
    int right = static_cast<int>(RC.size());
    bool found = false;
    while (!found && left < right)
    {
        int middle = *index = left + (right-left) / 2;
        int res = strncmp(word, RC[middle].word, WORD_LENGTH);

        if (res < 0)
            right = middle;

        else if (res > 0)
            *index = left = middle+1;

        else
            found = true;
    }
    return found;
}

void update_RC(const char *word_to_insert)
{
    int index = 0;
    bool found = binary_search_RC(word_to_insert, &index);

    if (found)
    {
        ++RC[index].ref_count;
        std::cout << "Found entry for " << word_to_insert;
        std::cout << " (" << RC[index].ref_count << ")\n";
    }
    else {
        std::cout << "Adding entry for " << word_to_insert << '\n';

        //insert new word at index
        RC_table new_entry;
        strncpy(new_entry.word, word_to_insert, WORD_LENGTH);
        new_entry.word[WORD_LENGTH] = 0;
        new_entry.ref_count = 1;

        // insert at begin() + index covers the front, the middle and the
        // end (index == RC.size()) alike, so no special cases are needed
        RC.insert(RC.begin() + index, new_entry);
    }
}



int main()
{
    // generate some random values and start adding them. a few dozen
    //  with a severely limited alphabet should suffice.
    const char alphabet[] = "abc";
    std::mt19937 rng{ std::random_device{}() };
    std::uniform_int_distribution<std::size_t> dist(0, sizeof alphabet - 2);

    for (int i=0; i<50; ++i)
    {
        char word[WORD_LENGTH+1] = {};
        for (int j=0; j<3; ++j)
            word[j] = alphabet[ dist(rng) ];
        update_RC(word);
    }

    // print the table
    for (auto const& x : RC)
        std::cout << x.word << " : " << x.ref_count << '\n';
}

Output (varies, obviously)

Adding entry for cab
Adding entry for cac
Adding entry for bcc
Adding entry for bbb
Adding entry for cbc
Adding entry for abb
Found entry for cab (2)
Adding entry for aba
Adding entry for cca
Adding entry for acc
Found entry for aba (2)
Found entry for bcc (2)
Adding entry for cbb
Found entry for cac (2)
Found entry for cac (3)
Adding entry for aaa
Found entry for acc (2)
Adding entry for bbc
Adding entry for baa
Adding entry for acb
Found entry for aaa (2)
Found entry for cca (2)
Found entry for baa (2)
Found entry for cbb (2)
Adding entry for aac
Found entry for cac (4)
Adding entry for aca
Adding entry for ccc
Found entry for bbc (2)
Adding entry for bba
Adding entry for bac
Adding entry for aab
Found entry for bac (2)
Found entry for aca (2)
Found entry for bcc (3)
Adding entry for caa
Found entry for aaa (3)
Found entry for bbc (3)
Found entry for caa (2)
Found entry for abb (2)
Found entry for baa (3)
Found entry for acc (3)
Found entry for bba (2)
Found entry for bbb (2)
Found entry for cbc (2)
Found entry for aaa (4)
Found entry for baa (4)
Adding entry for cba
Found entry for bac (3)
Found entry for bbc (4)
aaa : 4
aab : 1
aac : 1
aba : 2
abb : 2
aca : 2
acb : 1
acc : 3
baa : 4
bac : 3
bba : 2
bbb : 2
bbc : 4
bcc : 3
caa : 2
cab : 2
cac : 4
cba : 1
cbb : 2
cbc : 2
cca : 2
ccc : 1

I didn't bother to do the math, but if you add up these reference counts you should find they total 50, the number of insertions we performed.

Best of luck.

Regarding c++ - How to binary search a vector of structs and insert at the appropriate index, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58887992/
