php - Cannot store document_id

I have a tb_sentence table:

=========================================================================
| id_row | document_id | sentence_id |          sentence_content        |
=========================================================================
|   1    |     1       |    0        |  Introduction to Data Mining.    |
|   2    |     1       |    1        |  Describe how data mining.       |
|   3    |     2       |    0        |  The boss is right.              |
=========================================================================

I want to tokenize sentence_content, so that the tb_tokens table will contain:

==========================================================================
| tokens_id | tokens_word  | tokens_freq | sentence_id  | document_id    |
==========================================================================
|     1     | Introduction |        1    |       0      |       1        |
|     2     | to           |        1    |       0      |       1        |
|     3     | Data         |        1    |       0      |       1        |
|     4     | Mining       |        1    |       0      |       1        |
|     5     | Describe     |        1    |       1      |       1        |
etc...

This is my code:

$sentence_clean = array();
$q1 = mysql_query("SELECT document_id FROM tb_sentence ORDER BY document_id ") or die(mysql_error());
while ($row1 = mysql_fetch_array($q1)) {
    $doc_id[] = $row1['document_id'];
}
$q2 = mysql_query('SELECT sentence_content, sentence_id, document_id FROM tb_sentence ') or die(mysql_error());
while ($row2 = mysql_fetch_array($q2)) {
    $sentence_clean[$row2['document_id']][] = $row2['sentence_content'];
}
foreach ($sentence_clean as $kal) {
    if (trim($kal) === '')
        continue;
    tokenizing($kal);
}

The tokenizing function is:

function tokenizing($sentence) {
    foreach ($sentence as $sentence_id => $sentences) {
        $symbol = array(".", ",", "\\", "-", "\"", "(", ")", "<", ">", "?", ";", ":", "+", "%", "\r", "\t", "\0", "\x0B");
        $spasi = array("\n", "/", "\r");
        $replace = str_replace($spasi, " ", $sentences);
        $cleanSymbol = str_replace($symbol, "", $replace);
        $quote = str_replace("'", "\'", $cleanSymbol);
        $element = explode(" ", trim($quote));
        $elementNCount = array_count_values($element);

        foreach ($elementNCount as $word => $freq) {
            if (ereg("([a-z,A-Z])", $word)) {
                $query = mysql_query(" INSERT INTO tb_tokens VALUES ('','$word','$freq','$sentence_id', '$doc_id')");
            }
        }
    }
}

The problem is that document_id cannot be read, so it is not inserted into the tb_tokens table. How can I access those document_id values? Thanks :)

Edit: every word (the result of tokenizing) has a document_id and a sentence_id. My problem is that I cannot access document_id. How can I get both sentence_id and document_id for each word?

Best answer

I don't think you need this code:

$q1 = mysql_query("SELECT document_id FROM tb_sentence ORDER BY document_id ") or die(mysql_error());
while ($row1 = mysql_fetch_array($q1)) {
    $doc_id[] = $row1['document_id'];
}

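The $doc_id array is never actually used: inside tokenizing(), $doc_id is an undefined local variable, because a PHP function does not see variables from the global scope.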
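To illustrate why a $doc_id array filled at the top level never reaches the tokenizing function, here is a minimal sketch of PHP's variable scoping. The function names without_param and with_param are hypothetical and exist only for this demonstration.

```php
<?php
// Hypothetical demo data: a global array, like the one built from $q1.
$doc_id = array(1, 1, 2);

function without_param() {
    // $doc_id here is a separate, unset local variable, not the global array.
    return isset($doc_id);
}

function with_param($doc_id) {
    // The value is passed in explicitly, so it is available in the body.
    return $doc_id;
}

var_dump(without_param()); // bool(false)
var_dump(with_param(2));   // int(2)
```

Passing the value as a parameter keeps the function independent of global state, which is the approach taken in the corrected code.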

if (trim($kal) === '')
    continue;

$kal is an array, so there is no need to trim it.

$sentence_clean[$row2['document_id']][] = $row2['sentence_content'];

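Because you want to record the sentence_id, the key should be $row2['sentence_id'] instead of [].

(Of course, you have to make sure that the same sentence_id does not appear twice within one document_id; otherwise you should concatenate the contents.)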
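The grouping described here, including the concatenation fallback for duplicate keys, could be sketched as follows. The $rows array is hypothetical sample data shaped like the result of the SELECT on tb_sentence, and group_sentences is an illustrative helper name, not part of the original code.

```php
<?php
// Hypothetical rows, shaped like the SELECT result from tb_sentence.
$rows = array(
    array('document_id' => 1, 'sentence_id' => 0, 'sentence_content' => 'Introduction to Data Mining.'),
    array('document_id' => 1, 'sentence_id' => 1, 'sentence_content' => 'Describe how data mining.'),
    array('document_id' => 2, 'sentence_id' => 0, 'sentence_content' => 'The boss is right.'),
);

// Build document_id => sentence_id => sentence_content.
function group_sentences(array $rows) {
    $grouped = array();
    foreach ($rows as $row) {
        $doc = $row['document_id'];
        $sid = $row['sentence_id'];
        if (isset($grouped[$doc][$sid])) {
            // Duplicate (document_id, sentence_id): append instead of overwriting.
            $grouped[$doc][$sid] .= ' ' . $row['sentence_content'];
        } else {
            $grouped[$doc][$sid] = $row['sentence_content'];
        }
    }
    return $grouped;
}

$sentence_clean = group_sentences($rows);
```

With this shape, the outer foreach key is the document_id and the inner key is the sentence_id, so both are available when each sentence is tokenized.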

Here is my corrected version:

$sentence_clean = array();
$q2 = mysql_query('SELECT sentence_content, sentence_id, document_id FROM tb_sentence ') or die(mysql_error());
while ($row2 = mysql_fetch_array($q2)) {
    $sentence_clean[$row2['document_id']][$row2['sentence_id']] = $row2['sentence_content'];
}

foreach ($sentence_clean as $doc_id => $kal) {
    tokenizing($kal, $doc_id);
}

function tokenizing($sentence, $doc_id) {
    // $sentence maps sentence_id => sentence_content for a single document.
    foreach ($sentence as $sentence_id => $sentences) {
        $symbol = array(".", ",", "\\", "-", "\"", "(", ")", "<", ">", "?", ";", ":", "+", "%", "\r", "\t", "\0", "\x0B");
        $spasi = array("\n", "/", "\r");
        $replace = str_replace($spasi, " ", $sentences);
        $cleanSymbol = str_replace($symbol, "", $replace);
        $quote = str_replace("'", "\'", $cleanSymbol);
        $element = explode(" ", trim($quote));
        $elementNCount = array_count_values($element);

        foreach ($elementNCount as $word => $freq) {
            // Keep only tokens that contain a letter; preg_match() replaces the deprecated ereg().
            if (preg_match('/[a-zA-Z]/', $word)) {
                $query = mysql_query(" INSERT INTO tb_tokens VALUES ('','$word','$freq','$sentence_id', '$doc_id')");
            }
        }
    }
}

I pass document_id into the function as a parameter.
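As a rough illustration of carrying both IDs along with every word, the sketch below separates tokenizing from inserting: it returns one row per distinct word instead of running a query. The helper name tokenize_sentence is hypothetical, and the cleanup is a simplification that assumes a token is any run of letters, digits, or apostrophes, which is not identical to the symbol list in the code above.

```php
<?php
// Hypothetical helper: returns rows (word, freq, sentence_id, document_id)
// for one sentence, without touching the database.
function tokenize_sentence($sentence, $sentence_id, $doc_id) {
    // Assumption: anything that is not a letter, digit, or apostrophe separates tokens.
    $clean = preg_replace("/[^A-Za-z0-9']+/", ' ', $sentence);
    $words = preg_split('/\s+/', trim($clean), -1, PREG_SPLIT_NO_EMPTY);

    $rows = array();
    foreach (array_count_values($words) as $word => $freq) {
        // Skip tokens without any letter, as the original ereg() check did.
        if (preg_match('/[A-Za-z]/', $word)) {
            $rows[] = array((string) $word, $freq, $sentence_id, $doc_id);
        }
    }
    return $rows;
}

$rows = tokenize_sentence('Introduction to Data Mining.', 0, 1);
```

Each returned row could then be bound to a prepared INSERT statement (for example via PDO's prepare and execute), which would also make the manual quote escaping unnecessary.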

Regarding "php - Cannot store document_id", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/11776974/
