I am using this code to do a regex substitution with the PCRE2 library:
PCRE2_SIZE outlengthptr=256; //this line
PCRE2_UCHAR* output_buffer; //this line
output_buffer=(PCRE2_UCHAR*)malloc(outlengthptr); //this line
uint32_t rplopts=PCRE2_SUBSTITUTE_GLOBAL;
int ret=pcre2_substitute(
re1234, /*Points to the compiled pattern*/
subject, /*Points to the subject string*/
subject_length, /*Length of the subject string*/
0, /*Offset in the subject at which to start matching*/
rplopts, /*Option bits*/
0, /*Points to a match data block, or is NULL*/
0, /*Points to a match context, or is NULL*/
replace, /*Points to the replacement string*/
replace_length, /*Length of the replacement string*/
output_buffer, /*Points to the output buffer*/
&outlengthptr /*Points to the length of the output buffer*/
);
But I can't work out how to correctly define output_buffer and the pointer to its length (outlengthptr).
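When I give outlengthptr a fixed value, the code works, but the value stays fixed, i.e. it is not changed to the new length of output_buffer. According to the pcre2_substitute() specification, though, it should be updated to the new length of output_buffer: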
The length, startoffset and rlength values are code units, not characters, as is the contents of the variable pointed at by outlengthptr, which is updated to the actual length of the new string.
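The problems are:
- When I set outlengthptr to a fixed value, the final string is truncated at that fixed length.
- If I don't initialize the variable outlengthptr, I get a segfault.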
Here is the function's prototype:
int pcre2_substitute(const pcre2_code *code, PCRE2_SPTR subject, PCRE2_SIZE length, PCRE2_SIZE startoffset, uint32_t options, pcre2_match_data *match_data, pcre2_match_context *mcontext, PCRE2_SPTR replacement, PCRE2_SIZE rlength, PCRE2_UCHAR *outputbuffer, PCRE2_SIZE *outlengthptr);
Best answer
The pcre2api page says the following (emphasis mine):
The function returns the number of replacements that were made. This may be zero if no matches were found, and is never greater than 1 unless PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code is returned. Except for PCRE2_ERROR_NOMATCH (which is never returned), any errors from pcre2_match() or the substring copying functions are passed straight back. PCRE2_ERROR_BADREPLACEMENT is returned for an invalid replacement string (unrecognized sequence following a dollar sign), and PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough.
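So, start with an initial buffer that should hold most results: neither too big nor too small. The right size depends on your application.
For example, you could start with 120% of the input string's length as a heuristic, since that seems a reasonable choice for the most common regex substitution uses.
Then call the function with that buffer, passing the buffer's size through outlengthptr.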
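- If you get a positive result (or zero), you are done.
- If you receive PCRE2_ERROR_NOMEMORY, double the buffer size and try again (repeat this step as many times as needed).
- If you receive a different error code, handle it as a genuine error condition.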
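The steps above can be sketched roughly as follows. This is a minimal illustration, not the library's own recipe. To keep it free of library dependencies, the actual pcre2_substitute() call is hidden behind a hypothetical callback type (substitute_fn), and SKETCH_ERROR_NOMEMORY is a stand-in for PCRE2_ERROR_NOMEMORY. The names substitute_with_retry, initial_capacity and fake_substitute are likewise invented for this example, and an 8-bit code unit width is assumed so that one code unit is one byte. Since the length variable is both input (capacity) and output (result length), the sketch tracks the capacity separately and resets the length before every attempt.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for PCRE2_ERROR_NOMEMORY; real code would compare against the
   constant from <pcre2.h> instead. */
#define SKETCH_ERROR_NOMEMORY (-48)

/* Hypothetical callback wrapping one pcre2_substitute() call.
   On entry *len is the buffer capacity in code units; on success the callee
   stores the result length in *len and returns the replacement count. */
typedef int (*substitute_fn)(void *ctx, unsigned char *out, size_t *len);

/* Heuristic from the answer: roughly 120% of the subject length, plus one
   unit for the trailing zero. Overflow is ignored in this sketch. */
size_t initial_capacity(size_t subject_len)
{
    return subject_len + subject_len / 5 + 1;
}

/* Calls fn with a growing buffer until it stops reporting "no memory".
   On success returns the replacement count and hands the buffer to the
   caller via *out and *out_len (caller frees). On failure returns the
   negative error code and leaves *out as NULL. */
int substitute_with_retry(substitute_fn fn, void *ctx, size_t initial_cap,
                          unsigned char **out, size_t *out_len)
{
    size_t cap = initial_cap ? initial_cap : 16;
    unsigned char *buf = NULL;

    *out = NULL;
    *out_len = 0;

    for (;;) {
        unsigned char *grown = realloc(buf, cap);
        if (grown == NULL) {
            free(buf);
            return SKETCH_ERROR_NOMEMORY;
        }
        buf = grown;

        size_t len = cap;            /* reset on every attempt */
        int rc = fn(ctx, buf, &len);
        if (rc >= 0) {               /* zero or more replacements: done */
            *out = buf;
            *out_len = len;
            return rc;
        }
        if (rc != SKETCH_ERROR_NOMEMORY || cap > SIZE_MAX / 2) {
            free(buf);               /* genuine error, or cannot grow */
            return rc;
        }
        cap *= 2;                    /* buffer too small: double and retry */
    }
}

/* Fake callback for demonstration only: "produces" the string passed in ctx
   and reports SKETCH_ERROR_NOMEMORY when it does not fit. */
int fake_substitute(void *ctx, unsigned char *out, size_t *len)
{
    const char *result = ctx;
    size_t need = strlen(result) + 1;    /* room for the trailing zero */

    if (*len < need)
        return SKETCH_ERROR_NOMEMORY;
    memcpy(out, result, need);
    *len = need - 1;                     /* length excludes the zero */
    return 1;
}
```

In real use, the callback would forward its out and len arguments as the last two parameters of pcre2_substitute(), taking the compiled pattern, subject, replacement and options from ctx, and the initial capacity would come from initial_capacity(subject_length).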
Regarding "c++ - How to initialize the pointer to the output buffer length?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33981841/