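Hello friends, I have a small problem. I need to extract only the words from any given text.
I tried retrieving the words with strtok() and strstr(), and with a few regular expressions, but I could only extract some of the words.
The problem is complicated by the number of characters and symbols that can accompany a word.
Here is a sample text from which the words must be extracted: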
Main article: our 46,000 required, !but (1947-2011) mail@server.com March 8, 2014 Gutenberg's 34-DE 'a' 3,1415 Us: @unknown n go http://google.com or www.google.com and http://www.google.com (r) The 509th "composite" and; C-54 #dog v4.0 ¿as is done? ¿article... agriculture? x ¿cat? now! Hi!! (87 meters).
Sample text, for testing.
The result of extracting the text should be:
Main article our required but March Gutenberg's a go or and The composite and dog as is done article agriculture cat now Hi meters
Sample text for testing
The first function I wrote is meant to make the job easier:
function PreText($text){
    // Turn line breaks into periods, then strip punctuation one character at a time.
    $text = str_replace("\n", ".", $text);
    $text = str_replace("\r", ".", $text);
    $text = str_replace("'", "", $text);
    $text = str_replace("?", "", $text);
    $text = str_replace("¿", "", $text);
    $text = str_replace("(", "", $text);
    $text = str_replace(")", "", $text);
    $text = str_replace('"', "", $text);
    $text = str_replace(';', "", $text);
    $text = str_replace('!', "", $text);
    $text = str_replace('<', "", $text);
    $text = str_replace('>', "", $text);
    $text = str_replace('#', "", $text);
    $text = str_replace(",", "", $text);
    $text = str_replace(".c", "", $text);
    $text = str_replace(".C", "", $text);
    return $text;
}
The split function:
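function SplitWords($text){
    $NewText = ""; // initialize so the .= below does not act on an undefined variable
    $words = explode(" ", $text);
    $ContWords = count($words);
    for ($i = 0; $i < $ContWords; $i++){
        // keep only tokens made entirely of letters
        if (ctype_alpha($words[$i])) {
            $NewText .= $words[$i].", ";
        }
    }
    return $NewText;
}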
The program:
<?php
include_once ('functions.php');
$text = "Main article: our 46,000 ...";
$text = PreText($text);
$text = SplitWords($text);
echo $text;
?>
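Sorry if the code is a bit long. Thanks for your help.
Best answer
If I understand correctly, you want to remove all non-letters from the string. I would use preg_replace: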
$text = "Main article: our 46,000...";
$text = preg_replace("/[^a-zA-Z' ]/","",$text);
This should remove everything that is not a letter, an apostrophe, or a space.
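One thing to keep in mind: a character-class filter like this removes characters, not whole tokens, so mail@server.com collapses to mailservercom instead of disappearing. If whole tokens need to be dropped, a token-based filter may get closer to the expected result. The following is only a rough sketch, not a definitive solution: extractWords() is a hypothetical helper name, and the rules (skip any token containing a digit or @, trim surrounding punctuation, keep what is left only if it is letters with optional inner apostrophes) are assumptions inferred from the sample. On the sample text it still keeps Us, n, r and x, which the expected list omits, so the rules would need tuning.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extractWords(string $text): array
{
    $words = [];
    // Split on runs of whitespace, discarding empty pieces.
    foreach (preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY) as $token) {
        // Assumption: tokens with a digit or "@" (numbers, codes, e-mails, handles) are never words.
        if (preg_match('/[0-9@]/', $token)) {
            continue;
        }
        // Trim leading and trailing characters that are not ASCII letters.
        $token = preg_replace('/^[^a-zA-Z]+|[^a-zA-Z]+$/', '', $token);
        // Keep only letters with optional inner apostrophes; this rejects URLs such as www.google.com.
        if (preg_match("/^[a-zA-Z]+(?:'[a-zA-Z]+)*$/", $token)) {
            $words[] = $token;
        }
    }
    return $words;
}

echo implode(' ', extractWords("Main article: our 46,000 required, !but"));
// prints: Main article our required but
```

Only ASCII letters are handled here; accented words would need a Unicode-aware pattern.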
Regarding "php - Extract words from a text with PHP", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/27518884/