php - Determine the namespace of a MediaWiki page (API question)

Tags: php mediawiki mediawiki-api

For starters, I have to admit my PHP skills are poor (I've been at it for 3 days).

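Over the past three days I have heavily modified an extension that analyzes a page that has just been edited in MediaWiki. Its original purpose was to parse the page for text matching the names of other pages in the database and link that text automatically. However, it only did this in the "Main" namespace (namespace 0). I modified it to resolve links across namespaces using a weighted whitelist of namespaces. (The weighting is first-come, first-served by position in the whitelist: if the whitelist is Development, Rules, then Development takes priority over Rules.)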
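The first-come, first-served weighting can be pictured as a lookup that walks the whitelist in order and stops at the first namespace containing a page with the matching title. A minimal sketch; the function name `resolveNamespace` and the `$titlesByNamespace` map (namespace number => page titles, standing in for the database query) are hypothetical, and the syntax is kept PHP 5.3-compatible to match MediaWiki 1.21:

```php
<?php
// Hypothetical helper: return the first namespace in $whitelist that
// contains a page called $title, or null if none does.
// $titlesByNamespace stands in for the database lookup the extension
// performs (namespace number => array of page titles).
function resolveNamespace($title, array $whitelist, array $titlesByNamespace)
{
    foreach ($whitelist as $ns) {
        // Earlier whitelist entries win, so stop at the first hit.
        if (isset($titlesByNamespace[$ns])
            && in_array($title, $titlesByNamespace[$ns], true)) {
            return $ns;
        }
    }
    return null;
}

$whitelist = array(5000, 5002, 0);
$titlesByNamespace = array(
    5000 => array('Alpha'),
    5002 => array('Alpha', 'Beta'),
    0    => array('Gamma'),
);
echo resolveNamespace('Alpha', $whitelist, $titlesByNamespace), "\n"; // 5000
```

"Alpha" exists in both 5000 and 5002, but 5000 comes first in the whitelist, so it wins.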

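I would like to modify the extension further so that it prioritizes the current page's namespace by pushing it to the front of the whitelist array, and so that it determines the current user's group memberships and adjusts the priority list accordingly.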
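One way the group-based adjustment could work, sketched under assumptions: MediaWiki's `User` object (passed to the `ArticleSave` hook as `$user`) provides `getEffectiveGroups()`, which returns the user's group names as an array of strings. The `$groupPriorities` map from group name to preferred namespaces and the function `applyGroupPriorities` are hypothetical and not part of the extension:

```php
<?php
// Hypothetical helper: move the namespaces favoured by the user's groups
// to the front of the whitelist, keeping the remaining order unchanged.
// $userGroups would come from $user->getEffectiveGroups() inside the hook;
// $groupPriorities (group name => namespace numbers) is an assumed setting.
function applyGroupPriorities(array $whitelist, array $userGroups, array $groupPriorities)
{
    $preferred = array();
    foreach ($userGroups as $group) {
        if (!isset($groupPriorities[$group])) {
            continue;
        }
        foreach ($groupPriorities[$group] as $ns) {
            // Only boost namespaces that are actually whitelisted, once each.
            if (in_array($ns, $whitelist, true) && !in_array($ns, $preferred, true)) {
                $preferred[] = $ns;
            }
        }
    }
    // Namespaces that were not boosted keep their original relative order.
    $rest = array_values(array_diff($whitelist, $preferred));
    return array_merge($preferred, $rest);
}

$whitelist = array(5000, 5002, 5004, 5006, 0);
$groupPriorities = array('rules-editors' => array(5004));
echo implode(', ', applyGroupPriorities($whitelist, array('*', 'user', 'rules-editors'), $groupPriorities)), "\n";
// 5004, 5000, 5002, 5006, 0
```

Inside `onArticleSave` the group list would be `$user->getEffectiveGroups()`; the reordered copy then has to reach `parseContent()` somehow (for example as an extra parameter), since that function currently reads the global whitelist directly.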

  1. Determine the user's group memberships
  2. Determine the current page's namespace. SOLVED: $var=$article->getTitle()->getNamespace();
  3. Push the value to the front of the array. SOLVED: array_unshift($hastack, $needle);
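A caveat on step 3: if the current namespace is already in the whitelist, `array_unshift()` alone leaves it in the array twice, so the extension would query and scan that namespace twice. A minimal sketch that removes the existing entry first; `prioritizeNamespace` is a hypothetical name, and `$currentNs` is assumed to be the integer returned by `$article->getTitle()->getNamespace()`:

```php
<?php
// Hypothetical helper: return a copy of $whitelist with $currentNs first.
// $currentNs is assumed to be the integer from
// $article->getTitle()->getNamespace().
function prioritizeNamespace(array $whitelist, $currentNs)
{
    $key = array_search($currentNs, $whitelist, true);
    if ($key === false) {
        // Not whitelisted: leave the order alone rather than enabling a
        // namespace the wiki admin did not list.
        return $whitelist;
    }
    unset($whitelist[$key]);               // drop the old position
    array_unshift($whitelist, $currentNs); // put it first; keys are renumbered from 0
    return $whitelist;
}

$whitelist = array(5000, 5002, 5004, 5006, 0);
echo implode(', ', prioritizeNamespace($whitelist, 5004)), "\n"; // 5004, 5000, 5002, 5006, 0
```

Leaving non-whitelisted namespaces out is a design choice: the body file looks each namespace up in `$wgExtraNamespaces`, so adding an arbitrary namespace could also fail there.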

I'd be grateful if someone could point me to a site that explains at least the first two of these. (So far the MediaWiki community hasn't been much help.)

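If you provide a code example, please try to keep it dumbed down; I'm new to this {{shrug}} (meaning keep the PHP simple... it's extensible, so there's always a better way, but I'm not familiar with the extra modules).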

The MediaWiki version in question is 1.21 (the current stable release as of July 2013).

Note: for some reason the code is not displayed correctly, but no lines are missing.

The whitelist array is defined as: $wgLinkTitlesNamespaceWhitelist = array(5000, 5002, 5004, 5006,0);
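For reference, this is how the body file turns a whitelist entry into the `Namespace:` prefix of a link, pulled out into a standalone function. `namespacePrefix` is a hypothetical name, and the `$extraNamespaces` values are made-up examples of a `$wgExtraNamespaces` setting. Since `$wgExtraNamespaces` only holds custom namespaces, a built-in one such as 4 (Project) in the whitelist would hit an undefined index in the original code; this sketch returns null instead, and inside MediaWiki `MWNamespace::getCanonicalName()` covers both kinds:

```php
<?php
// Mirrors the prefix logic in the body file: namespace 0 (Main) gets no
// prefix, any other whitelisted namespace gets "Name:" with underscores
// turned into spaces. Returns null for namespaces missing from the map
// instead of triggering an undefined-index notice.
function namespacePrefix($ns, array $extraNamespaces)
{
    if ($ns === 0) {
        return '';
    }
    if (!isset($extraNamespaces[$ns])) {
        return null;
    }
    return str_replace('_', ' ', $extraNamespaces[$ns]) . ':';
}

// Example values only; a real wiki defines these in LocalSettings.php.
$extraNamespaces = array(5000 => 'Development', 5002 => 'Rules_Book');
echo namespacePrefix(5002, $extraNamespaces), "\n"; // Rules Book:
```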

Main file LinkTitles_5000.body.php:

```php
if ( !defined( 'MEDIAWIKI' ) ) {
	die( 'Not an entry point.' );
}

/*
function dump( $var ) {
	error_log( print_r( $var, TRUE ) . "\n", 3, 'php://stderr' );
};
*/

class LinkTitles_5000 {
	static $safeTitle;
	// The preg_replace_callback() callbacks cannot see parseContent()'s
	// local variables, so the values they need are kept here.
	static $namespacePart = '';
	static $subPagePos = false;
	static $subPageFront = '';
	static $subPageBack = '';

	/// Setup function, hooks the extension's functions to MediaWiki events.
	public static function setup() {
		global $wgLinkTitlesParseOnEdit;
		global $wgLinkTitlesParseOnRender;
		global $wgHooks;
		if ( $wgLinkTitlesParseOnEdit ) {
			$wgHooks['ArticleSave'][] = 'LinkTitles_5000::onArticleSave';
		}
		if ( $wgLinkTitlesParseOnRender ) {
			$wgHooks['ArticleAfterFetchContent'][] = 'LinkTitles_5000::onArticleAfterFetchContent';
		}
		$wgHooks['ParserBeforeTidy'][] = 'LinkTitles_5000::removeMagicWord';
	}

	/// This function is hooked to the ArticleSave event.
	/// It will be called whenever a page is about to be saved.
	public static function onArticleSave( &$article, &$user, &$text, &$summary,
			$minor, $watchthis, $sectionanchor, &$flags, &$status ) {
		// To prevent time-consuming parsing of the page whenever
		// it is edited and saved, we only parse it if the flag
		// 'minor edits' is not set.
		return $minor or self::parseContent( $article, $text );
	}

	/// Called when an ArticleAfterFetchContent event occurs; this requires the
	/// $wgLinkTitlesParseOnRender option to be set to 'true'
	public static function onArticleAfterFetchContent( &$article, &$content ) {
		// The ArticleAfterFetchContent event is triggered whenever page content
		// is retrieved from the database, i.e. also for editing etc.
		// Therefore we access the global $action variable to only parse the
		// content when the page is viewed.
		global $action;
		if ( in_array( $action, array( 'view', 'render', 'purge' ) ) ) {
			self::parseContent( $article, $content );
		}
		return true;
	}

	/// This function performs the actual parsing of the content.
	static function parseContent( &$article, &$text ) {
		// If the page contains the magic word '__NOAUTOLINKS__', do not parse
		// the content.
		$mw = MagicWord::get( 'MAG_LINKTITLES_TERMINOLOGY_NOAUTOLINKS' );
		if ( $mw->match( $text ) ) {
			return true;
		}

		// Configuration variables need to be defined here as globals.
		global $wgLinkTitlesPreferShortTitles;
		global $wgLinkTitlesMinimumTitleLength;
		global $wgLinkTitlesParseHeadings;
		global $wgLinkTitlesBlackList;
		global $wgLinkTitlesSkipTemplates;
		global $wgLinkTitlesFirstOnly;
		global $wgLinkTitlesWordStartOnly;
		global $wgLinkTitlesWordEndOnly;
		// global $wgLinkTitlesIgnoreCase;
		global $wgLinkTitlesSmartMode;
		global $wgCapitalLinks;
		global $wgLinkTitlesNamespaceWhitelist;
		global $wgExtraNamespaces;

		$wordStartDelim = $wgLinkTitlesWordStartOnly ? '\b' : '';
		$wordEndDelim = $wgLinkTitlesWordEndOnly ? '\b' : '';
		// $regexModifier = $wgLinkTitlesIgnoreCase ? 'i' : '';

		// To prevent adding self-references, we now
		// extract the current page's title.
		$myTitle = $article->getTitle()->getText();

		$sort_order = $wgLinkTitlesPreferShortTitles ? 'ASC' : 'DESC';
		$limit = $wgLinkTitlesFirstOnly ? 1 : -1;

		if ( $wgLinkTitlesSkipTemplates ) {
			$templatesDelimiter = '{{.+}}';
		} else {
			$templatesDelimiter = '{{[^|]+?}}|{{.+\|';
		}

		// Build a regular expression that will capture existing wiki links ("[[...]]"),
		// wiki headings ("= ... =", "== ... ==" etc.),
		// urls ("http://example.com", "[http://example.com]", "[http://example.com Description]")
		// and email addresses ("mail@example.com").
		// Since there is a user option to skip headings, we make this part of the expression
		// optional. Note that in order to use preg_split(), it is important to have only one
		// capturing subpattern (which precludes the use of conditional subpatterns).
		$delimiter = $wgLinkTitlesParseHeadings ? '' : '=+.+?=+|';
		$urlPattern = '[a-z]+?\:\/\/(?:\S+\.)+\S+(?:\/.*)?';
		$delimiter = '/(' . $delimiter . '\[\[.*?\]\]|' . $templatesDelimiter .
			'|\[' . $urlPattern . '\s.+?\]|' . $urlPattern .
			'(?=\s|$)|(?<=\b)\S+\@(?:\S+\.)+\S+(?=\b))/i';

		$black_list = str_replace( '_', ' ',
			'("' . implode( '", "', $wgLinkTitlesBlackList ) . '")' );

		// Depending on the global setting $wgCapitalLinks, we need
		// different callback functions further down.
		if ( $wgCapitalLinks ) {
			$callBack = "LinkTitles_5000::CallBackCaseInsensitive";
		} else {
			$callBack = "LinkTitles_5000::CallBackCaseSensitive";
		}

		# Added to support $wgLinkTitlesNamespaceWhitelist
		foreach ( $wgLinkTitlesNamespaceWhitelist as $LT_namespace ) {
			# Create the link part reflecting the namespace;
			# if the namespace is Main (0), use an empty string.
			if ( $LT_namespace === 0 ) {
				$LT_namespacePart = "";
			} else {
				$LT_namespacePart = str_replace( '_', ' ', $wgExtraNamespaces[(int)$LT_namespace] );
				$LT_namespacePart = $LT_namespacePart . ":";
			}
			self::$namespacePart = $LT_namespacePart;

			// Build an SQL query and fetch all page titles of the namespace
			// currently being processed, ordered by length (shortest first if
			// $wgLinkTitlesPreferShortTitles is set, otherwise longest first).
			$dbr = wfGetDB( DB_SLAVE );
			# 'page_namespace = 0' became 'page_namespace = ' . $LT_namespace.
			# Database::select() adds $wgDBprefix to the table name itself.
			$res = $dbr->select(
				'page',
				'page_title, page_namespace',
				array(
					'page_namespace = ' . strval( $LT_namespace ),
					'CHAR_LENGTH(page_title) >= ' . $wgLinkTitlesMinimumTitleLength,
					'page_title NOT IN ' . $black_list,
				),
				__METHOD__,
				array( 'ORDER BY' => 'CHAR_LENGTH(page_title) ' . $sort_order )
			);

			// Iterate through the page titles
			foreach ( $res as $row ) {
				// Page titles are stored in the database with spaces
				// replaced by underscores. Therefore we now convert
				// the underscores back to spaces.
				$title = str_replace( '_', ' ', $row->page_title );

				if ( $title != $myTitle ) {
					LinkTitles_5000::$safeTitle = str_replace( '/', '\/', $title );

					# Skip titles with more than one level of sub pages:
					# 0 or 1 "\/" found means the entry is processed,
					# 2 or more means it is skipped.
					if ( substr_count( LinkTitles_5000::$safeTitle, '\/' ) > 1 ) {
						continue;
					}

					# Break a sub page title ("Front/Back") into its parts;
					# only the part after the slash is searched for in the text.
					$LT5000_pos = strpos( $title, '/' );
					if ( $LT5000_pos !== false ) {
						$LT5000_front = substr( $title, 0, $LT5000_pos );
						$LT5000_back = substr( $title, $LT5000_pos + 1 );
						LinkTitles_5000::$safeTitle = $LT5000_back;
					} else {
						$LT5000_back = '';
						$LT5000_front = LinkTitles_5000::$safeTitle;
					}
					self::$subPagePos = $LT5000_pos;
					self::$subPageFront = $LT5000_front;
					self::$subPageBack = $LT5000_back;

					// split the string by [[...]] groups
					// credits to inhan @ StackOverflow for suggesting preg_split
					// see http://stackoverflow.com/questions/10672286
					$arr = preg_split( $delimiter, $text, -1, PREG_SPLIT_DELIM_CAPTURE );

					// Depending on the global configuration setting $wgCapitalLinks,
					// the title has to be searched for either in a strictly case-sensitive
					// way, or in a 'fuzzy' way where the first letter of the title may
					// be either case.
					if ( $wgCapitalLinks ) {
						$searchTerm = '((?i)' . LinkTitles_5000::$safeTitle[0] . '(?-i)' .
							substr( LinkTitles_5000::$safeTitle, 1 ) . ')';
					} else {
						$searchTerm = '(' . LinkTitles_5000::$safeTitle . ')';
					}

					$LT5000_out = "[[" . $LT_namespacePart . $title . "|";
					for ( $i = 0; $i < count( $arr ); $i += 2 ) {
						// even indexes will point to text that is not enclosed by brackets
						$arr[$i] = preg_replace( '/(?<![\:\.\@\/\?\&])' .
							$wordStartDelim . $searchTerm . $wordEndDelim . '/',
							$LT5000_out . '$1]]', $arr[$i], $limit, $count );
						if ( ( $limit >= 0 ) && ( $count > 0 ) ) {
							break;
						}
					}
					$text = implode( '', $arr );

					// If smart mode is turned on, the extension will perform a second
					// pass on the page and add links with aliases where the case does
					// not match.
					if ( $wgLinkTitlesSmartMode ) {
						// split the string by [[...]] groups (see above)
						$arr = preg_split( $delimiter, $text, -1, PREG_SPLIT_DELIM_CAPTURE );

						for ( $i = 0; $i < count( $arr ); $i += 2 ) {
							// even indexes will point to text that is not enclosed by brackets
							$arr[$i] = preg_replace_callback( '/(?<![\:\.\@\/\?\&])' .
								$wordStartDelim . '(' . LinkTitles_5000::$safeTitle . ')' .
								$wordEndDelim . '/i', $callBack, $arr[$i], $limit, $count );
							if ( ( $limit >= 0 ) && ( $count > 0 ) ) {
								break;
							}
						}
						$text = implode( '', $arr );
					}
				} // if $title != $myTitle
			} // foreach $res as $row
		} // foreach $wgLinkTitlesNamespaceWhitelist as $LT_namespace
		return true;
	}

	static function CallBackCaseInsensitive( $matches ) {
		if ( self::$subPagePos !== false ) {
			# a / was found in the title: link to the full sub page
			$LT5000_call_out = self::$namespacePart . self::$subPageFront . '/' . self::$subPageBack;
		} else {
			# no slash in the title
			$LT5000_call_out = self::$namespacePart . $matches[0];
		}
		if ( strcmp( substr( LinkTitles_5000::$safeTitle, 1 ), substr( $matches[0], 1 ) ) == 0 ) {
			return '[[' . $LT5000_call_out . '|]]';
		} else {
			return '[[' . $LT5000_call_out . '|' . $matches[0] . ']]';
		}
	}

	static function CallBackCaseSensitive( $matches ) {
		if ( self::$subPagePos !== false ) {
			# a / was found in the title: link to the full sub page
			$LT5000_call_out = self::$namespacePart . self::$subPageFront . '/' . self::$subPageBack;
		} else {
			# no slash in the title
			$LT5000_call_out = self::$namespacePart . $matches[0];
		}
		if ( strcmp( substr( LinkTitles_5000::$safeTitle, 0 ), substr( $matches[0], 0 ) ) == 0 ) {
			return '[[' . $LT5000_call_out . '|]]';
		} else {
			return '[[' . $LT5000_call_out . '|' . $matches[0] . ']]';
		}
	}

	static function removeMagicWord( &$parser, &$text ) {
		$mw = MagicWord::get( 'MAG_LINKTITLES_TERMINOLOGY_NOAUTOLINKS' );
		$mw->matchAndRemove( $text );
		return true;
	}
}
```
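The core trick in `parseContent()` is `preg_split()` with `PREG_SPLIT_DELIM_CAPTURE`: the captured delimiters (existing `[[...]]` links and the like) land at odd indexes, plain text at even indexes, and only the even indexes are rewritten. A stripped-down sketch of that idea, assuming a simplified delimiter that only protects existing links (the real expression also covers headings, templates, URLs and e-mail addresses). `linkTitle` is a hypothetical name, and it uses `preg_quote()` instead of the extension's manual escaping of `/`:

```php
<?php
// Link every whole-word occurrence of $title in $text to $target,
// without touching text that is already inside a [[...]] link.
function linkTitle($text, $title, $target)
{
    // One capturing group, so existing links are kept in the result
    // at the odd indexes.
    $parts = preg_split('/(\[\[.*?\]\])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b(' . preg_quote($title, '/') . ')\b/';
    for ($i = 0; $i < count($parts); $i += 2) {
        // Even indexes hold text outside existing links.
        $parts[$i] = preg_replace($pattern, '[[' . $target . '|$1]]', $parts[$i]);
    }
    return implode('', $parts);
}

$text = 'See Foo and [[Foo]] again.';
echo linkTitle($text, 'Foo', 'Development:Foo'), "\n";
// See [[Development:Foo|Foo]] and [[Foo]] again.
```

Running it again with a later whitelist namespace (say `Rules:Foo`) changes nothing, because every "Foo" is now inside a link. This is why namespaces earlier in the whitelist win.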

And the module loader, LinkTitles_5000.php:

```php
if ( !defined( 'MEDIAWIKI' ) ) {
	die( 'Not an entry point.' );
}

/*
error_reporting( E_ALL );
ini_set( 'display_errors', 'On' );
ini_set( 'error_log', 'php://stderr' );
$wgMainCacheType = CACHE_NONE;
$wgCacheDirectory = false;
*/

// Configuration variables
$wgLinkTitlesPreferShortTitles = false;
$wgLinkTitlesMinimumTitleLength = 3;
$wgLinkTitlesParseHeadings = false;
$wgLinkTitlesParseOnEdit = true;
$wgLinkTitlesParseOnRender = false;
$wgLinkTitlesSkipTemplates = false;
$wgLinkTitlesBlackList = array();
$wgLinkTitlesFirstOnly = false;
$wgLinkTitlesWordStartOnly = true;
$wgLinkTitlesWordEndOnly = true;
$wgLinkTitlesSmartMode = true;
$wgLinkTitlesNamespaceWhitelist = array();

$wgExtensionCredits['parserhook'][] = array(
	'path' => __FILE__,
	'name' => 'LinkTitles_5000',
	'author' => '[https://www.mediawiki.org/wiki/User:Bovender Daniel Kraus]',
	'url' => 'https://www.mediawiki.org/wiki/Extension:LinkTitles',
	'version' => '2.2.0',
	'descriptionmsg' => 'linktitles-desc'
);

$wgExtensionMessagesFiles['LinkTitles_5000'] = dirname( __FILE__ ) . '/LinkTitles_5000.i18n.php';
$wgExtensionMessagesFiles['LinkTitlesMagic_5000'] = dirname( __FILE__ ) . '/LinkTitles_5000.i18n.magic.php';
$wgAutoloadClasses['LinkTitles_5000'] = dirname( __FILE__ ) . '/LinkTitles_5000.body.php';
$wgExtensionFunctions[] = 'LinkTitles_5000::setup';

// vim: ts=2:sw=2:noet
```
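A LocalSettings.php fragment matching the whitelist above might look like the following. It is only a sketch: the namespace names and `NS_*` constants are made up, and the path assumes the files live in `extensions/LinkTitles_5000/`. Ordering matters here, because the loader assigns `$wgLinkTitlesNamespaceWhitelist = array();`, so the whitelist has to be set after the `require_once`, not before it.

```php
// Custom namespaces use numbers >= 100: an even number for the content
// namespace and the next odd number for its talk namespace.
define( 'NS_DEVELOPMENT', 5000 );
define( 'NS_DEVELOPMENT_TALK', 5001 );
define( 'NS_RULES', 5002 );
define( 'NS_RULES_TALK', 5003 );
$wgExtraNamespaces[NS_DEVELOPMENT] = 'Development';
$wgExtraNamespaces[NS_DEVELOPMENT_TALK] = 'Development_talk';
$wgExtraNamespaces[NS_RULES] = 'Rules';
$wgExtraNamespaces[NS_RULES_TALK] = 'Rules_talk';

// Load the extension first; its loader resets the whitelist to array().
require_once "$IP/extensions/LinkTitles_5000/LinkTitles_5000.php";

// Then override the default. Earlier entries take priority.
$wgLinkTitlesNamespaceWhitelist = array( NS_DEVELOPMENT, NS_RULES, 0 );
```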

Best Answer

I forgot about this question a long time ago, but it turns out MediaWiki has a function that solves this:

$myNamespace = $article->getTitle()->getNamespace();

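Because the article object is passed to any extension hooked into the save process (it is the $article parameter of onArticleSave), this returns the page's namespace as a number.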

This question (php - Determine the namespace of a MediaWiki page) comes from Stack Overflow: https://stackoverflow.com/questions/17478630/
