我有几个关于我需要做的作业的问题。看起来我正在寻找的是获取代码,但是,我试图做的是学习,因为在搜索信息数周后我丢失了。我m really new at
C`。
这是作业:
foo.txt
、 bar.txt
、 foo2.txt
),它们都有不同数量的单词(我需要使用动态内存)。 创建一个程序,要求输入一个词并告诉您该词是否在任何文档中(结果是出现该词的文档的名称)。
例子 :
(我想我需要加载 3 个文件,创建一个哈希表,其中包含文档中每个单词的键值,但还有一些东西可以告诉您单词所在的文档是哪个文档)。
我想我需要实现:
Hash Function
将单词转换为 HashValue
Hash Table
存储 HashValue
每个单词(但我认为我还应该存储文档索引?)。 dynamic allocation
. collisions
同时我将值插入到哈希表中(使用 Quadratic Probing
和 Chaining
)。 我一直在搜索哈希图实现、哈希表、二次探测、字符串哈希函数……但我现在脑子一片困惑,我现在不知道应该从哪里开始。
到目前为止我读过:
Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?
Implementing with quadratic probing
Does C have hash/dictionary data structure?
https://gist.github.com/tonious/1377667
hash function for string
http://www.cs.yale.edu/homes/aspnes/pinewiki/C(2f)HashTables.html?highlight=(CategoryAlgorithmNotes)
https://codereview.stackexchange.com/questions/115843/dictionary-implementation-using-hash-table-in-c
对不起,我的英语提前。
希望您能够帮助我。
谢谢。
第一次编辑
TRIE
数据结构,我正在检查你给我的链接。 思考了一个小时后,我尝试了以下方法:
typedef struct WordMetadata {
char* Word;
int Documents[5];
int DocumentsCount;
} WordMetadata;
void InitTable (WordMetadata **Table) {
Table = (WordMetadata**) malloc (sizeof(WordMetadata) * TABLESIZE);
for (int i = 0; i < TABLESIZE; i++) {
Table[i] = (WordMetadata*) NULL;
}
}
djb2
或我在这里找到的任何一个 http://www.cse.yorku.ca/~oz/hash.html )但现在: int Hash (char *WordParam) {
for (int i = 0; *WordParam != '\0';) {
i += *WordParam++;
}
return (i % TABLESIZE);}
编辑 2
我试图实现一些东西,它不起作用但会看看并告诉我什么是错的(我知道代码是一团糟)
编辑 3
这段代码正在正确编译和运行,但是,没有找到一些词(可能没有索引,我不知道),我正在考虑转移到我在第一条消息中提到的另一个哈希函数。
这是当前的(全功能)代码:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define TABLESIZE 4001
#define LINESIZE 2048
#define DELIMITER " \t"
typedef struct TTable {
char* Word; /* The actual word */
int Documents[5]; /* Documents Index */
int DocumentsCount; /* Number of documents where the word exist */
} TTable;
int Hash (char *Word);
void Index (TTable **HashTable, char* Word, int DocumentIndex);
int Search (TTable **HashTable, char* Word);
int mystrcmp(char *s1, char *s2);
char* Documents[] = {"foo.txt","bar.txt","foo2.txt",NULL};
int main() {
FILE* file;
TTable **HashTable
int DocumentIndex;
char Line[LINESIZE];
char* Word;
char* Tmp;
HashTable = (TTable**) malloc (sizeof(TTable)*TABLESIZE);
for (int i = 0; i < TABLESIZE; i++) {
HashTable[i] = (TTable*) NULL;
}
for (DocumentIndex = 0; Documents[DocumentIndex] != NULL; DocumentIndex++) {
file = fopen(Documents[DocumentIndex],"r");
if (file == NULL) {
fprintf(stderr, "Error%s\n", Documents[DocumentIndex]);
continue;
}
while (fgets (Line,LINESIZE,file) != NULL) {
Line[LINESIZE-1] = '\0';
Tmp = strtok (Line,DELIMITER);
do {
Word = (char*) malloc (strlen(Tmp)+1);
strcpy(Word,Tmp);
Index(HashTable,Word,DocumentIndex);
Tmp = strtok(NULL,DELIMITER);
} while (Tmp != NULL);
}
fclose(file);
}
printf("Enter the word:");
fgets(Line,100,stdin);
Line[strlen(Line)-1]='\0'; //fgets stores newline as well. so removing newline.
int i = Search(HashTable,Line);
if (i != -1) {
for (int j = 0; j < HashTable[i]->DocumentsCount; j++) {
printf("%s\n", Documents[HashTable[i]->Documents[j]]);
if ( j < HashTable[i]->DocumentsCount-1) {
printf(",");
}
}
}
else {
printf("Cant find word\n");
}
for (i = 0; i < TABLESIZE; i++) {
if (HashTable[i] != NULL) {
free(HashTable[i]->Word);
free(HashTable[i]);
}
}
return 0;
}
/* Theorem: If TableSize is prime and ? < 0.5, quadratic
probing will always find an empty slot
*/
int Search (TTable **HashTable, char* Word) {
int Aux = Hash(Word);
int OldPosition,ActualPosition;
ActualPosition = -1;
for (int i = 0; i < TABLESIZE; i++) {
OldPosition = ActualPosition;
ActualPosition = (Aux + i*i) % TABLESIZE;
if (HashTable[ActualPosition] == NULL) {
return -1;
}
if (strcmp(Word,HashTable[ActualPosition]->Word) == 0) {
return ActualPosition;
}
}
return -1; // Word not found
}
void Index (TTable **HashTable, char* Word, int DocumentIndex) {
int Aux; //Hash value
int OldPosition, ActualPosition;
if ((ActualPosition = Search(HashTable,Word)) != -1) {
for (int j = 0; j < HashTable[ActualPosition]->DocumentsCount;j++) {
if(HashTable[ActualPosition]->Documents[j] == DocumentIndex) {
return;
}
}
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount] = DocumentIndex; HashTable[ActualPosition]->DocumentsCount++;
return;
}
ActualPosition = -1;
Aux = Hash(Word);
for (int i = 0; i < TABLESIZE; i++) {
OldPosition = ActualPosition;
ActualPosition = (Aux + i*i) % TABLESIZE;
if (OldPosition == ActualPosition) {
break;
}
if (HashTable[ActualPosition] == NULL) {
HashTable[ActualPosition] = (TTable*)malloc (sizeof(TTable));
HashTable[ActualPosition]->Word = Word;
HashTable[ActualPosition]->Documents[0] = DocumentIndex;
HashTable[ActualPosition]->DocumentsCount = 1;
return;
}
}
printf("No more free space\n");
}
int Hash (char *Word) {
int HashValue;
for (HashValue = 0; *Word != '\0';) {
HashValue += *Word++;
}
return (HashValue % TABLESIZE);
}
最佳答案
我建议你使用 TRIE用于存储内存中所有三个文件中存在的字符串的数据结构,因为哈希会消耗更多空间。
作为第一步,您应该一个一个读取所有三个文件,对于 file_i 中的每个单词,您应该执行以下操作:
完成构建 TRIE 后,检查单词是否存在将是 O(1) 操作。
如果您要使用哈希表,那么:
编辑 2 的代码中的问题:
你这边的进步很大!!!
快速浏览后,我发现了以下问题:
Don't use gets() method, use fgets() instead So replace:
gets(Line);
with the following:
fgets(Line,100,stdin); Line[strlen(Line)-1]='\0'; //fgets stores newline as well. so removing newline.
The line:
if ( j < HashTable[j]->DocumentsCount-1){
is causing segmentation fault. I think you want to access
HashTable[i]
:if ( j < HashTable[i]->DocumentsCount-1){
In the line:
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount];
You were supposed to assign some value. May be this:
HashTable[ActualPosition]->Documents[HashTable[ActualPosition]->DocumentsCount] = DocumentIndex;
Malloc returns void pointer. You should cast it to the appropriate one:
HashTable[ActualPosition] = (TTable*)malloc (sizeof(TTable));
You should also initialize the Documents array with default value while creating a new node in Hash:
for(j=0;j<5;j++)HashTable[ActualPosition]->Documents[j]=-1;
You are removing everything from your
HashTable
after finding the first word given by user. May be you wanted to place that code outside the while loop.Your while loop
while(1)
does not have any terminating condition, You should have one.
祝一切顺利 !!!
关于c - 实现二次探测和链接 - 搜索字典,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/40617536/