c - Understanding stream redirection in a code file - reading from `stdin`

Tags: c

This question is based on the following code: http://nlp.stanford.edu/projects/glove/

The following code behaves as I expect. It prints the contents of a text file and then echoes user input from stdin.

stdin: The standard input stream is the default source of data for applications. In most systems, it is usually directed by default to the keyboard.

Type some text and press Enter, and the text is echoed to the console. Fine, as expected.

// _CRT_SECURE_NO_WARNINGS: silence MSVC's deprecation warning for fopen
#pragma warning(disable : 4996)

#include <stdio.h>
#include <stdlib.h>

int main()
{
    // Char as int, so that EOF can be told apart from valid characters:
    int ch;

    // File pointer:
    FILE *fid;

    // Open the file stream:
    fid = fopen("<Path to simple text file>/text.txt", "r");

    // Check for a valid stream before reading from it or closing it:
    if (fid == NULL)
    {
        printf("Stream Error: File was not opened!\n");
    }
    else
    {
        // Echo every char; fgetc returns EOF at end of file or on error:
        while ((ch = fgetc(fid)) != EOF)
        {
            printf("%c", ch);
        }

        // Close the file:
        fclose(fid);
    }

    // Switch the stream: fid now refers to standard input
    fid = stdin;

    // Echo every char from stdin until EOF:
    while ((ch = fgetc(fid)) != EOF)
    {
        printf("%c", ch);
    }

    return 0;
}

Example source code: http://nlp.stanford.edu/projects/glove/ - specifically, starting at line 301 of the cooccur.c code file

In this code:

fid = fopen(vocab_file,"r");
if(fid == NULL) {fprintf(stderr,"Unable to open vocab file %s.\n",vocab_file); return 1;}
while(fscanf(fid, format, str, &id) != EOF) hashinsert(vocab_hash, str, ++j); // Here id is not used: inserting vocab words into hash table with their frequency rank, j
fclose(fid);
vocab_size = j;
j = 0;
if(verbose > 1) fprintf(stderr, "loaded %lld words.\nBuilding lookup table...", vocab_size);

/* Build auxiliary lookup table used to index into bigram_table */
lookup = (long long *)calloc( vocab_size + 1, sizeof(long long) );
if (lookup == NULL) {
    fprintf(stderr, "Couldn't allocate memory!");
    return 1;
}
lookup[0] = 1;
for(a = 1; a <= vocab_size; a++) {
    if((lookup[a] = max_product / a) < vocab_size) lookup[a] += lookup[a-1];
    else lookup[a] = lookup[a-1] + vocab_size;
}
if(verbose > 1) fprintf(stderr, "table contains %lld elements.\n",lookup[a-1]);

/* Allocate memory for full array which will store all cooccurrence counts for words whose product of frequency ranks is less than max_product */
bigram_table = (real *)calloc( lookup[a-1] , sizeof(real) );
if (bigram_table == NULL) {
    fprintf(stderr, "Couldn't allocate memory!");
    return 1;
}

fid = stdin; // <<<--- STDIN Stream Redirect
sprintf(format,"%%%ds",MAX_STRING_LENGTH);
sprintf(filename,"%s_%04d.bin",file_head, fidcounter);
foverflow = fopen(filename,"w");
if(verbose > 1) fprintf(stderr,"Processing token: 0");

/* For each token in input stream, calculate a weighted cooccurrence sum within window_size */
while (1) {
    if(ind >= overflow_length - window_size) { // If overflow buffer is (almost) full, sort it and write it to temporary file
        qsort(cr, ind, sizeof(CREC), compare_crec);
        write_chunk(cr,ind,foverflow);
        fclose(foverflow);
        fidcounter++;
        sprintf(filename,"%s_%04d.bin",file_head,fidcounter);
        foverflow = fopen(filename,"w");
        ind = 0;
    }
    flag = get_word(str, fid); // <<<--- Reading from the Vocab, not STDIN
    if(feof(fid)) break;
    if(flag == 1) {j = 0; continue;} // Newline, reset line index (j)
    counter++;
    if((counter%100000) == 0) if(verbose > 1) fprintf(stderr,"\033[19G%lld",counter);
    htmp = hashsearch(vocab_hash, str); // <<<--- Using the str that was read in the function: 'get_word'
    if (htmp == NULL) continue; // Skip out-of-vocabulary words
    w2 = htmp->id; // Target word (frequency rank)
    for(k = j - 1; k >= ( (j > window_size) ? j - window_size : 0 ); k--) { // Iterate over all words to the left of target word, but not past beginning of line
        w1 = history[k % window_size]; // Context word (frequency rank)
        if ( w1 < max_product/w2 ) { // Product is small enough to store in a full array
            bigram_table[lookup[w1-1] + w2 - 2] += 1.0/((real)(j-k)); // Weight by inverse of distance between words
            if(symmetric > 0) bigram_table[lookup[w2-1] + w1 - 2] += 1.0/((real)(j-k)); // If symmetric context is used, exchange roles of w2 and w1 (ie look at right context too)
        }
        else { // Product is too big, data is likely to be sparse. Store these entries in a temporary buffer to be sorted, merged (accumulated), and written to file when it gets full.
            cr[ind].word1 = w1;
            cr[ind].word2 = w2;
            cr[ind].val = 1.0/((real)(j-k));
            ind++; // Keep track of how full temporary buffer is
            if(symmetric > 0) { // Symmetric context
                cr[ind].word1 = w2;
                cr[ind].word2 = w1;
                cr[ind].val = 1.0/((real)(j-k));
                ind++;
            }
        }
    }

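I would like to know how a word gets assigned to str in the call flag = get_word(str, fid); after the stream has been changed to stdin, and is then used a few lines later in htmp = hashsearch(vocab_hash, str);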

This code runs millions of iterations over a large corpus, and no user sits there typing in each word by hand.

I would be grateful if someone could explain how this happens after the stream change fid = stdin;

Best answer

Simple for some, but not so for others...

stdin is the default input stream, so it can be accessed directly in code through the stdin variable. So when you see, as I now have a few times:

FILE *fid;
fid = stdin;

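fid now refers to the stdin stream. The assignment itself does not choose where the data comes from: if stdin is not reading from its default source, it has been redirected "somewhere" outside the code. On most machines the default source is the keyboard.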
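To make this concrete, here is a minimal sketch (not taken from GloVe) of a function that reads from whatever FILE * it is handed. The name echo_stream is hypothetical. The point is only that a statement like fid = stdin; copies a pointer, and the reading code works the same for a stream from fopen and for stdin.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy every byte from `in` to `out` and return
   the number of bytes copied. It neither knows nor cares whether `in`
   came from fopen() or is stdin. */
long echo_stream(FILE *in, FILE *out)
{
    int ch;          /* int, not char, so EOF is distinguishable */
    long copied = 0;

    assert(in != NULL && out != NULL);

    while ((ch = fgetc(in)) != EOF) {
        if (fputc(ch, out) == EOF)
            break;   /* stop on a write error */
        copied++;
    }
    return copied;
}
```

Calling echo_stream(stdin, stdout) from main echoes the keyboard when the program is run interactively, and echoes a file when the program is started with its input redirected from that file.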

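On line 301, fid = fopen(vocab_file,"r"); makes the vocab file the data source of the stream returned by the fopen function. The file is read and processed.

On line 304 the stream is closed: fclose(fid);

On line 329, fid = stdin; makes stdin the input stream that fid refers to.

From there on there is no sign of a stream change in the code, yet str is assigned words that come from one of the text files: the get_word function fills str from the corpus...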

The command line is the answer: -overflow-file tempoverflow < corpus.txt > cooccurrences.bin

./cooccur -verbose 2 -symmetric 0 -window-size 10 -vocab-file vocab.txt -memory 8.0 -overflow-file tempoverflow < corpus.txt > cooccurrences.bin
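With a redirect like < corpus.txt in place, any word-reading routine that is given stdin pulls its words from the file. The sketch below illustrates the idea with a simplified, hypothetical reader; read_word and count_words are names invented here and are not GloVe's get_word, which also tracks line breaks.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Simplified, hypothetical word reader (NOT GloVe's get_word):
   skip whitespace, then store up to cap-1 characters of the next word
   in `word`. Returns 1 if a word was read, 0 at end of input. */
int read_word(char *word, size_t cap, FILE *in)
{
    int ch;
    size_t len = 0;

    assert(word != NULL && cap > 0 && in != NULL);

    /* Skip leading whitespace, including newlines. */
    while ((ch = fgetc(in)) != EOF && isspace(ch))
        ;
    if (ch == EOF)
        return 0;

    /* Collect characters until the next whitespace or end of input;
       characters beyond the buffer capacity are read but dropped. */
    do {
        if (len + 1 < cap)
            word[len++] = (char)ch;
    } while ((ch = fgetc(in)) != EOF && !isspace(ch));

    word[len] = '\0';
    return 1;
}

/* Count the whitespace-separated words available on `in`. */
long count_words(FILE *in)
{
    char word[64];
    long n = 0;

    while (read_word(word, sizeof word, in))
        n++;
    return n;
}
```

A main that calls count_words(stdin) would count the words of corpus.txt when started with < corpus.txt, with nobody typing anything.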

From cplusplus.com:

Standard input stream

The standard input stream is the default source of data for applications. In most systems, it is usually directed by default to the keyboard.

stdin can be used as an argument for any function that expects an input stream (FILE*) as one of its parameters, like fgets or fscanf.

Although it is commonly assumed that the source of data for stdin is going to be a keyboard, this may not be the case even in regular console systems, since stdin can generally be redirected on most operating systems at the time of invoking the application. For example, many systems, among them DOS/Windows and most UNIX shells, support the following command syntax:

myapplication < example.txt

to use the content of the file example.txt as the primary source of data for myapplication instead of the console keyboard.

It is also possible to redirect stdin to some other source of data from within a program by using the freopen function.

If stdin is known to not refer to an interactive device, the stream is fully buffered. Otherwise, it is library-dependent whether the stream is line buffered or not buffered by default (see setvbuf).
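The freopen route mentioned in the quote can be sketched as follows. This is only an illustration of redirecting stdin from inside a program; the helper name redirect_stdin is hypothetical, and the GloVe code discussed here relies on the shell redirect instead.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: make stdin read from the file at `path`.
   Returns 1 on success, 0 if the file could not be opened. */
int redirect_stdin(const char *path)
{
    assert(path != NULL);
    /* freopen closes the stream's current source and reopens the same
       FILE object on the named file; it returns NULL on failure. */
    return freopen(path, "r", stdin) != NULL;
}
```

After a successful call, every later read from stdin (fgetc, fscanf, fgets, and so on) takes its data from the named file. After a failed call the stream is closed and should not be read from.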

So there it is: the stdin stream is redirected by the shell on the command line: -overflow-file tempoverflow < corpus.txt

As a result, corpus.txt is the redirected data source of the stdin stream!
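A program can check for itself whether a stream is attached to a terminal or has been redirected. The sketch below assumes a POSIX system, since isatty and fileno are POSIX functions rather than standard C; the helper name is_interactive is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() under strict -std modes */

#include <assert.h>
#include <stdio.h>
#include <unistd.h>              /* POSIX: isatty */

/* Returns 1 if `stream` is connected to a terminal, 0 if it has been
   redirected to a file, pipe, or similar. POSIX-only sketch. */
int is_interactive(FILE *stream)
{
    assert(stream != NULL);
    return isatty(fileno(stream)) ? 1 : 0;
}
```

On such a system, is_interactive(stdin) would be expected to return 1 when the program is started from a terminal with no redirect, and 0 when it is started with < corpus.txt.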

Also worth noting: cooccurrences.bin is the redirected destination of the stdout stream, via line 232: fout = stdout; with the data written on line 270: fwrite(&old, sizeof(CREC), 1, fout);
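The output side works the same way. Below is a minimal sketch of writing binary records to a FILE * that may be stdout. The Record type and its field layout are assumptions made for this illustration, not a copy of GloVe's CREC definition, and write_records is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative record type; the field layout is an assumption made for
   this sketch, not a copy of GloVe's CREC definition. */
typedef struct {
    int word1;
    int word2;
    double val;
} Record;

/* Write `n` records in binary form to `out`; returns the number written.
   `out` may be stdout, in which case a shell redirect such as
   `> cooccurrences.bin` decides which file receives the bytes. */
size_t write_records(const Record *recs, size_t n, FILE *out)
{
    assert(recs != NULL && out != NULL);
    return fwrite(recs, sizeof(Record), n, out);
}
```

Note that on Windows stdout is opened in text mode by default, which translates newline bytes, so binary output through stdout needs extra care there.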

More information: "Standard Input and Output Redirection"

Note: if you want to run this code, remember to build the console application as 64-bit - otherwise it will not be able to allocate the memory!
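One plausible reason is that element counts held in a long long can exceed what a 32-bit size_t can express, so the request passed to calloc is no longer the one intended. The sketch below shows a defensive wrapper; checked_calloc is a hypothetical helper and is not part of the GloVe sources.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate `count` zeroed elements of `elem_size`
   bytes, refusing requests whose total size cannot be represented in
   size_t (easy to hit in a 32-bit build). Returns NULL on failure. */
void *checked_calloc(long long count, size_t elem_size)
{
    assert(elem_size > 0);
    if (count < 0)
        return NULL;
    if ((unsigned long long)count > SIZE_MAX / elem_size)
        return NULL;   /* count * elem_size would overflow size_t */
    return calloc((size_t)count, elem_size);
}
```

With a check like this, an oversized request fails with a clear NULL instead of being silently truncated to a smaller allocation.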

Regarding c - Understanding stream redirection in a code file - reading from `stdin`, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/41907459/
