I'm trying to highlight a subject string using the $matches array returned by preg_match_all(). Let me start with an example:
preg_match_all("/(.)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
This returns:
Array
(
[0] => Array
(
[0] => Array
(
[0] => a
[1] => 0
)
[1] => Array
(
[0] => a
[1] => 0
)
)
[1] => Array
(
[0] => Array
(
[0] => b
[1] => 1
)
[1] => Array
(
[0] => b
[1] => 1
)
)
[2] => Array
(
[0] => Array
(
[0] => c
[1] => 2
)
[1] => Array
(
[0] => c
[1] => 2
)
)
)
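For readers less familiar with this flag combination: with PREG_SET_ORDER the outer array holds one entry per match, and with PREG_OFFSET_CAPTURE every group inside it becomes a [text, byte offset] pair. A minimal sketch for walking that structure follows; `describe_matches()` is a hypothetical helper made up for illustration, not something from the post.

```php
<?php
// Hypothetical helper (not from the original post): flattens the
// PREG_SET_ORDER | PREG_OFFSET_CAPTURE result into readable lines.
function describe_matches($regex, $subject)
{
    $matches = array();
    preg_match_all($regex, $subject, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

    $lines = array();
    foreach ($matches as $match_index => $groups) {
        foreach ($groups as $group_index => $group) {
            // Each group is a pair: the captured text and its byte offset in $subject
            list($text, $offset) = $group;
            $lines[] = "match $match_index, group $group_index: '$text' at offset $offset";
        }
    }
    return $lines;
}

// Prints six lines, from "match 0, group 0: 'a' at offset 0"
// to "match 2, group 1: 'c' at offset 2"
echo implode("\n", describe_matches('/(.)/', 'abc')), "\n";
```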
What I want to do here is highlight both the overall consumed data and each backreference.
The output should look like this:
<span class="match0">
<span class="match1">a</span>
</span>
<span class="match0">
<span class="match1">b</span>
</span>
<span class="match0">
<span class="match1">c</span>
</span>
Another example:
preg_match_all("/(abc)/", "abc", $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
should return:
<span class="match0"><span class="match1">abc</span></span>
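I hope this is clear.
I want to highlight the overall consumed data and also highlight each backreference.
Thanks in advance. If anything is unclear, please ask.
Note: it must not break the HTML. Both the regex and the input string are unknown to the code and completely dynamic, so the subject string may be HTML, and the matched data may contain HTML-like text and the like.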
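Best answer: This seems to behave correctly for all the examples I've thrown at it so far. Note that I've separated the abstract highlighting part from the HTML-mangling part, so that it can be reused in other situations: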
<?php
/**
 * Runs a regex against a string, and returns a version of that string with matches highlighted:
 * the outermost match is marked with [0]...[/0], the first sub-group with [1]...[/1], etc.
 *
 * @param string $regex Regular expression ready to be passed to preg_match_all
 * @param string $input
 * @return string
 */
function highlight_regex_matches($regex, $input)
{
    $matches = array();
    preg_match_all($regex, $input, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

    // Arrange matches into groups based on their starting and ending offsets
    $matches_by_position = array();
    foreach ( $matches as $sub_matches )
    {
        foreach ( $sub_matches as $match_group => $match_data )
        {
            // A group that did not take part in the match is reported with offset -1; skip it
            if ( $match_data[1] < 0 )
            {
                continue;
            }
            $start_position = $match_data[1];
            $end_position = $start_position + strlen($match_data[0]);
            $matches_by_position[$start_position]['START'][] = $match_group;
            $matches_by_position[$end_position]['END'][] = $match_group;
        }
    }

    // Now proceed through that array, annotating the original string
    // Note that we have to pass through BACKWARDS, or we break the offset information
    $output = $input;
    krsort($matches_by_position);
    foreach ( $matches_by_position as $position => $groups )
    {
        $insertion = '';

        // First, assemble any ENDING groups, nested highest-group first
        // (sort by value: the values are the group numbers)
        if ( !empty($groups['END']) )
        {
            rsort($groups['END']);
            foreach ( $groups['END'] as $ending_group )
            {
                $insertion .= "[/$ending_group]";
            }
        }

        // Then, any STARTING groups, nested lowest-group first
        if ( !empty($groups['START']) )
        {
            sort($groups['START']);
            foreach ( $groups['START'] as $starting_group )
            {
                $insertion .= "[$starting_group]";
            }
        }

        // Insert into output (length 0 means nothing is replaced)
        $output = substr_replace($output, $insertion, $position, 0);
    }

    return $output;
}
/**
 * Given a regex and a string containing unescaped HTML, return a blob of HTML
 * with the original string escaped, and matches highlighted using <span> tags
 *
 * @param string $regex Regular expression ready to be passed to preg_match_all
 * @param string $raw_html
 * @return string HTML ready to display :)
 */
function highlight_regex_as_html($regex, $raw_html)
{
    // Add the (deliberately non-HTML) highlight tokens
    $highlighted = highlight_regex_matches($regex, $raw_html);

    // Escape the HTML from the input (the [n] tokens contain nothing that gets escaped)
    $highlighted = htmlspecialchars($highlighted);

    // Substitute the match tokens with desired HTML
    $highlighted = preg_replace('#\[([0-9]+)\]#', '<span class="match\\1">', $highlighted);
    $highlighted = preg_replace('#\[/([0-9]+)\]#', '</span>', $highlighted);

    return $highlighted;
}
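The comment about passing through the string BACKWARDS may deserve a quick illustration. Each insertion lengthens the string, so every offset to the right of it shifts, while offsets to its left stay valid. Here is a minimal sketch of that effect; `insert_markers()` and the marker offsets are hypothetical and made up for illustration, not part of the code above.

```php
<?php
// Hypothetical helper: $markers maps a byte offset in $input to the text to insert there.
function insert_markers($input, array $markers, $backwards)
{
    if ($backwards) {
        krsort($markers);   // highest offset first
    } else {
        ksort($markers);    // lowest offset first
    }

    $output = $input;
    foreach ($markers as $offset => $text) {
        // Length 0 means "insert at $offset without replacing anything"
        $output = substr_replace($output, $text, $offset, 0);
    }
    return $output;
}

$markers = array(0 => '[0]', 3 => '[/0]');

echo insert_markers('abc', $markers, true), "\n";   // [0]abc[/0]
echo insert_markers('abc', $markers, false), "\n";  // [0][/0]abc (offset 3 is stale after the first insert)
```

Going forwards would also work if you tracked the accumulated shift, but sorting in reverse avoids that bookkeeping.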
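Note: as hakra pointed out to me in chat, if a sub-group in the regex can occur multiple times within one overall match (e.g. '/a(b|c)+/'), preg_match_all will only tell you about the last of those occurrences. So highlight_regex_matches('/a(b|c)+/', 'abc') returns '[0]ab[1]c[/1][/0]', not '[0]a[1]b[/1][1]c[/1][/0]' as you might expect or want. All matching groups outside of that one still work correctly though, so highlight_regex_matches('/a((b|c)+)/', 'abc') gives '[0]a[1]b[2]c[/2][/1][/0]', which is still a pretty good indication of how the regex matched.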
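The limitation in that note can be seen directly in what preg_match_all hands back. This is a small sketch, assuming the quantified patterns '/a(b|c)+/' and '/a((b|c)+)/'; the variable names are made up for illustration.

```php
<?php
// Repeated group: only the final iteration ("c") is reported for group 1.
preg_match_all('/a(b|c)+/', 'abc', $repeated, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

// Wrapping the repetition in an outer group (now group 1) captures the whole run "bc";
// the inner group (now group 2) still holds only the last iteration.
preg_match_all('/a((b|c)+)/', 'abc', $wrapped, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

echo json_encode($repeated[0]), "\n";  // [["abc",0],["c",2]]
echo json_encode($wrapped[0]), "\n";   // [["abc",0],["bc",1],["c",2]]
```

The earlier "b" iteration never appears in either result, which is why the highlighter has no offsets to mark it with.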