C 哈希表大小问题

标签 c hash

首先我要说的是,我了解非常基本的 C 语言。我已经使用 tutorialspoint 中的代码实现了一个哈希表。网站。

我的程序正在从文本文件中读取正整数,将数字及其在文件中出现的次数分别作为键和值。实际上它工作得很好,但我面临一个大问题:我需要事先知道哈希表的大小。我注意到,当我跳过负数时,它会奇怪地弄乱表格的结构。

这是我的代码(如果使用大小为 20):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stdbool.h>

#define SIZE 20

struct DataItem {
  int key;
  int data;
};

struct DataItem* hashArray[SIZE];
struct DataItem* item;

int hashCode(int key) {
  return key % SIZE;
}

struct DataItem *search(int key) {
  int hashIndex = hashCode(key);

  while(hashArray[hashIndex] != NULL) {

    if(hashArray[hashIndex]->key == key)
    return hashArray[hashIndex];

    ++hashIndex;
    hashIndex %= SIZE;
  }

  return NULL;
}

void insert(int key,int data) {
  struct DataItem *item = (struct DataItem*) malloc(sizeof(struct DataItem));
  item->data = data;
  item->key = key;
  int hashIndex = hashCode(key);

  while(hashArray[hashIndex] != NULL && hashArray[hashIndex]->key != -1) {
    ++hashIndex;
    hashIndex %= SIZE;
  }

  hashArray[hashIndex] = item;
}

void display() {
  int key_index;
  for(key_index = 0; key_index<SIZE; key_index++) {
    if(hashArray[key_index] != NULL) {
      printf("%d has occurred %d time(s)\n", hashArray[key_index]->key, hashArray[key_index]->data);
    }
  }
  printf("\n");
}

void sort(size_of_struct) {
  int i, tempkey, tempdata;

  if(size_of_struct == 1) {
    return;
  }

  for(i=0; i<(size_of_struct-1); i++) {
    if(hashArray[i] != NULL) {
      if(hashArray[i]->key > hashArray[i+1]->key) {
        tempkey = hashArray[i]->key;
        hashArray[i]->key = hashArray[i+1]->key;
        hashArray[i+1]->key = tempkey;

        tempdata = hashArray[i]->data;
        hashArray[i]->data = hashArray[i+1]->data;
        hashArray[i+1]->data = tempdata;
      }
    }
  }
  sort(size_of_struct-1);
}

int main() {
  FILE *myFile;
  myFile = fopen("test.txt", "r");

  int *numberArray = NULL;
  int i, count, number, current_num, temp;
  count = number = temp = 0;

  numberArray = malloc(sizeof(int)*500);

  while(fscanf(myFile, "%d", &number) > 0) {
    if(number >= 0 && number <= 100) {
        numberArray[count] = number;
        count++;
    }
  }

  for(i=0; i < count; i++)
  {
    item = search(numberArray[i]);
    if(item == NULL) {
      insert(numberArray[i], 1);
    } else {
      item = search(numberArray[i]);
      item->data += 1;
    }
    free(numberArray);
    numberArray = malloc(sizeof(int)*500);
  }

  display();
  sort(SIZE);
}

如果我运行此代码,我会得到以下结果:

80 has occurred 1 time(s)
100 has occurred 1 time(s)
2 has occurred 1 time(s)
3 has occurred 2 time(s)
84 has occurred 1 time(s)
12 has occurred 1 time(s)
33 has occurred 1 time(s)
99 has occurred 3 time(s)

Segmentation fault: 11

如果我使用的 SIZE 为 8(这是我从文件中使用的数字总数),那么我会得到以下结果:

80 has occurred 1 time(s)
33 has occurred 1 time(s)
2 has occurred 1 time(s)
99 has occurred 3 time(s)
3 has occurred 2 time(s)
12 has occurred 1 time(s)
100 has occurred 1 time(s)
84 has occurred 1 time(s)

2 has occurred 1 time(s)
3 has occurred 2 time(s)
12 has occurred 1 time(s)
33 has occurred 1 time(s)
80 has occurred 1 time(s)
84 has occurred 1 time(s)
99 has occurred 3 time(s)
100 has occurred 1 time(s)

经过进一步调查,我注意到我的显示功能中发生了这种情况:

key_index: 0
80 has occurred 1 time(s)
key_index: 1
100 has occurred 1 time(s)
key_index: 2
2 has occurred 1 time(s)
key_index: 3
3 has occurred 2 time(s)
key_index: 4
84 has occurred 1 time(s)
key_index: 5
key_index: 6
key_index: 7
key_index: 8
key_index: 9
key_index: 10
key_index: 11
key_index: 12
12 has occurred 1 time(s)
key_index: 13
33 has occurred 1 time(s)
key_index: 14
key_index: 15
key_index: 16
key_index: 17
key_index: 18
key_index: 19
99 has occurred 3 time(s)

Segmentation fault: 11

我不完全确定发生了什么。我知道为什么会发生段错误:它试图引用该索引处为空的值。我不知道的是为什么(在显示函数内部)它会跳过 5-11 和 14-18 中的 key_index 值;这是我真正关心的问题。我假设因为我没有使用文件中的所有数字,所以该结构正在尝试填充它,以便始终有 20 个空格。我错了吗?有什么办法可以解决这个问题吗?

我的 test.txt 文件:

99 2 99
3
80 12 -12 33
3 99 100 1234 84

最佳答案

嘿,看起来你那里有一个段错误。该死,追踪这些可能非常困难。如@underscore_d提到,您可以通过调试器运行程序以在代码运行时检查代码。

但是,幸运的是,您提供了(大部分)Minimal, Complete and Verifiable例子给我们看一下。我提供了自己的 test.txt 文件,但能够重现崩溃。现在,这可能不是您看到的崩溃,因为我没有您的 test.txt,但希望以下分析无论如何都能帮助您。

由于您的示例很短,您还可以考虑通过一些分析工具来运行您的代码。我定期通过 AddressSanitizer 运行我的所有代码和 Valgrind帮助避免引入此类错误。在您的情况下,AddressSanitizer 会查明错误的位置。

$ gcc -fsanitize=address -g test.c -o test
$ ./test
80 has occurred 1 time(s)

ASAN:DEADLYSIGNAL
=================================================================
==18300==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x561f6050b39f bp 0x7ffe4d6866b0 sp 0x7ffe4d686690 T0)
    #0 0x561f6050b39e in sort /home/v1bri/test.c:67
    #1 0x561f6050bae2 in main /home/v1bri/test.c:111
    #2 0x7f60938ee3f0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x203f0)
    #3 0x561f6050acb9 in _start (/home/v1bri/test+0xcb9)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/v1bri/test.c:67 in sort
==18300==ABORTING

错误发生在 sort() 函数的第 67 行。果然...

void sort(size_of_struct) {
// ...

  for(i=0; i<(size_of_struct-1); i++) {
    if(hashArray[i] != NULL) {
      if(hashArray[i]->key > hashArray[i+1]->key) {

在取消引用 hashArray[i+1] 处的指针时,您忘记检查 NULL。 Valgrind 会给你更多的错误,但也会找到罪魁祸首。

$ gcc -g test.c -o test
$ valgrind --tool=memcheck ./test
==18764== Memcheck, a memory error detector
==18764== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==18764== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info

<snip>

==18764== Invalid read of size 4
==18764==    at 0x108B0D: sort (test.c:67)
==18764==    by 0x108DA7: main (test.c:111)
==18764==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==18764==
==18764==
==18764== Process terminating with default action of signal 11 (SIGSEGV)
==18764==  Access not within mapped region at address 0x0
==18764==    at 0x108B0D: sort (test.c:67)
==18764==    by 0x108DA7: main (test.c:111)

我知道您提到过您了解“非常基础的 C 语言”,但希望您能看到使用现代调试工具来测试程序错误是多么容易。对于新程序员来说,学习工具可能会让人望而生畏,但这实际上是学习的最佳时机,因为它将为您节省大量时间

关于C 哈希表大小问题,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/47579219/

相关文章:

c - 如何解决电子邮件的 sscanf 问题?

c - 将 unsigned char 解析为 int 指针后出现段错误

c++ - boost::Program_options 如何在值中支持散列字符?

c++ - 将 std::string 散列为 std::size_t 以外的东西

ruby hash 检查值是否存在

python - 按对象或两个 float 索引 python dict

c - 为什么这些构造使用增量前和增量后未定义的行为?

c - "free functions"在C通用数据结构实现中的作用

java - MessageDigest.update(byte[]) 是做什么的?

c - 如何在 Linux 上实现字符串命令输出以及执行同一二进制文件