preg_match_all syntax problem

I'm having trouble with preg_match syntax.
Within a page I need to find anything like
$first = '/>http:\/\/www.(.*?)\/(.*?)\</';
$second = '/="http:\/\/www.(.*?)\/(.*?)"/';
How could I combine the two?
Something like
$regex = '/(?="|>)http:\/\/www.(.*?)/(.*?)(?"|\<)/';
Sorry, I'm not very good at this.

This looks about right to me:
/(?:="|>)http:\/\/www\.(.*?)\/(.*?)["<]/i
Notice a few minor corrections: Your non-capturing group syntax was a little off (it should be (?:pattern) instead of (?pattern)), and you also needed to escape the . and /.
I'm also not sure the (.*?)\/(.*?) is doing exactly what you think it is; I'd probably just replace that with (.*?) unless you want to require a / character.
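As a rough sketch of how the combined pattern could be used from PHP: the sample markup and the $html and $pairs names below are made up for illustration and are not taken from the question.

```php
<?php
// Hypothetical sample input containing both forms: ="http://www... and >http://www...
$html = '<tag asdf="http://www.some.com/directory"/>' . "\n"
      . '<dadr>http://www.adif.com/dir</dadr>';

// Combined pattern from the answer above: the prefix is either =" or >,
// the terminator is either " or <.
$regex = '/(?:="|>)http:\/\/www\.(.*?)\/(.*?)["<]/i';

// PREG_SET_ORDER yields one array per match: [0] full match, [1] host, [2] path.
preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

$pairs = array();
foreach ($matches as $m) {
    $pairs[] = array($m[1], $m[2]);
    echo "'{$m[1]}' '{$m[2]}'\n";
}
```

Note that this pattern does not check that the opening and closing delimiters belong together, so a URL that starts after =" and ends at < would match as well.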

Here is a funny thought.
Use /(?:(=")|>)http:\/\/www\.(.*?)\/(.*?)(?(1)"|<)/sg in a find-next loop, extracting $2 and $3 on each iteration. This uses a conditional: the closing delimiter must be " if group 1 (=") matched, and < otherwise.
Or, use /(?|(?<==")http:\/\/www\.(.*?)\/(.*?)(?=")|(?<=>)http:\/\/www\.(.*?)\/(.*?)(?=<))/sg in a match all. This uses branch reset, so both alternatives number their groups 1 and 2. The array will accumulate as pairs ($cnt++ % 2).
Depends on what you mean by combining. (The g flag is Perl syntax; in PHP you would drop it and call preg_match_all.)
A Perl test case:
use strict;
use warnings;
my $str = '
<tag asdf="http://www.some.com/directory"/>
<dadr>http://www.adif.com/dir</dadr>
';
while ( $str =~ /(?:(=")|>)http:\/\/www\.(.*?)\/(.*?)(?(1)"|<)/sg )
{
print "'$2' '$3'\n";
}
print "--------------\n";
my @parts = $str =~ /(?|(?<==")http:\/\/www\.(.*?)\/(.*?)(?=")|(?<=>)http:\/\/www\.(.*?)\/(.*?)(?=<))/sg;
my $cnt = 0;
for (@parts)
{
print "'$_' ";
if ($cnt++ % 2) {
print "\n";
}
}
__END__
Output:
'some.com' 'directory'
'adif.com' 'dir'
--------------
'some.com' 'directory'
'adif.com' 'dir'
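Since the question is about PHP, here is a tentative port of the Perl test case to preg_match_all. PCRE supports both the conditional and the branch-reset syntax; the variable names ($conditional, $branchReset, $viaConditional, $viaBranchReset) are my own and only illustrative.

```php
<?php
// Same sample data as the Perl test case.
$str = "\n<tag asdf=\"http://www.some.com/directory\"/>\n"
     . "<dadr>http://www.adif.com/dir</dadr>\n";

// Conditional: if group 1 (=") took part in the match, require a closing ",
// otherwise require <. The wanted values are in groups 2 and 3.
$conditional = '/(?:(=")|>)http:\/\/www\.(.*?)\/(.*?)(?(1)"|<)/s';
preg_match_all($conditional, $str, $sets, PREG_SET_ORDER);
$viaConditional = array();
foreach ($sets as $m) {
    $viaConditional[] = array($m[2], $m[3]);
}

// Branch reset: both alternatives number their groups 1 and 2.
$branchReset = '/(?|(?<==")http:\/\/www\.(.*?)\/(.*?)(?=")|(?<=>)http:\/\/www\.(.*?)\/(.*?)(?=<))/s';
preg_match_all($branchReset, $str, $sets, PREG_SET_ORDER);
$viaBranchReset = array();
foreach ($sets as $m) {
    $viaBranchReset[] = array($m[1], $m[2]);
}

print_r($viaConditional);
print_r($viaBranchReset);
```

With PREG_SET_ORDER each match arrives as its own array, so there is no need for the $cnt++ % 2 pairing used in the Perl version.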

Related

Regex use replaced string replacement

I have strings like this
[Ljava.lang.String;
[Ldummy.class.Here;
[Lanother.unknown.Class;
What regex should I use to replace [L with <span> and ; with []</span>,
and make it look like this:
<span>java.lang.String[]</span>
<span>dummy.class.Here[]</span>
<span>another.unknown.Class[]</span>
What I want is to make the string representation of Java array classes more human friendly.
I've heard about $1 or something like that, but I couldn't find more information because I don't know what it is.

$strings = "[Ljava.lang.String;
[Ldummy.class.Here;
[Lanother.unknown.Class;";
$strings = preg_replace('/\[L([A-Za-z\.]+);/', '<span>$1[]</span>', $strings);
echo $strings;
Output:
$ php foo.php
<span>java.lang.String[]</span>
<span>dummy.class.Here[]</span>
<span>another.unknown.Class[]</span>

If you want to use plain old PHP for this rather than a regex, here is a simple snippet that will do exactly what you need - and you can modify it without having to sort through regex that makes little sense to you:
<?php
$stringArray=array(
'[Ljava.lang.String;',
'[Ldummy.class.Here;',
'[Lanother.unknown.Class;'
);
foreach($stringArray as $val)
{
$output=$val;
//replace the leading "[L" with the opening tag
if(substr($val,0,2)=='[L')
{
$output="<span>".substr($val,2);
}
//replace the trailing ";" with the array brackets and the closing tag
if(substr($output,-1)==';')
{
$output=substr($output,0,-1).'[]</span>';
}
echo $output.'<br>';
}
?>
Output:
<span>java.lang.String[]</span>
<span>dummy.class.Here[]</span>
<span>another.unknown.Class[]</span>

This should do it:
$new_content = preg_replace('#^\[L(.*);\s*$#m', '<span>$1[]</span>', $content);
Demo here: http://sandbox.onlinephpfunctions.com/code/8f0de08b5ba0882db2d98d99cdd961b9aebab074

You can use this:
$result = preg_replace('~\[L([^;]+);~', '<span>$1[]</span>', $txt);
where [^;]+ matches one or more characters that are not a ";"
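To make the $1 part concrete, here is a small sketch (the $m and $result names are only illustrative). The parentheses around [^;]+ form capture group 1, and $1 in the replacement string stands for whatever that group captured.

```php
<?php
$txt = "[Ljava.lang.String;\n[Ldummy.class.Here;\n[Lanother.unknown.Class;";

// ([^;]+) is capture group 1: everything between "[L" and the next ";".
preg_match('~\[L([^;]+);~', $txt, $m);
echo $m[0], "\n"; // the full match, including "[L" and ";"
echo $m[1], "\n"; // group 1 only

// In the replacement string, $1 is replaced by the text group 1 captured.
$result = preg_replace('~\[L([^;]+);~', '<span>$1[]</span>', $txt);
echo $result, "\n";
```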

PHP:preg_replace function

$text = "
<tag>
<html>
HTML
</html>
</tag>
";
I want to replace all the text present inside the tags with htmlspecialchars(). I tried this:
$regex = '/<tag>(.*?)<\/tag>/s';
$code = preg_replace($regex,htmlspecialchars($regex),$text);
But it doesn't work.
I am getting htmlspecialchars of the regex pattern as the output. I want to replace the match with htmlspecialchars of the data that matched the pattern.
What should I do?

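You're replacing the match with the pattern itself: htmlspecialchars($regex) is evaluated before preg_replace ever runs, and no back-references are involved. The old e modifier could evaluate a replacement as code, but it was deprecated in PHP 5.5 and removed in PHP 7, so preg_replace_callback is the way to go:
$code = preg_replace_callback($regex,'replaceCallback',$text);
This passes the matches to the callback and uses its return value as the replacement. The matches always arrive as an array (index 0 is the full match, the captured groups follow), so htmlspecialchars cannot be used as the callback directly. Wrap it in a function of your own instead: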
function replaceCallback($matches)
{
if (is_array($matches))
{
$matches = implode ('', array_slice($matches, 1));//first element is full string
}
return htmlspecialchars($matches);
}
Or, if your PHP version permits it (anonymous functions need PHP 5.3 or later):
$code = preg_replace_callback($regex, function($matches)
{
$return = '';
for ($i=1, $j = count($matches); $i<$j;$i++)
{//loop like this, skips first index, and allows for any number of groups
$return .= htmlspecialchars($matches[$i]);
}
return $return;
}, $text);
Try any of the above until you find something that works. Incidentally, if all you want is to remove <tag> and </tag> and escape everything else, why not go for the much simpler:
echo htmlspecialchars(str_replace(array('<tag>','</tag>'), '', $text));
That's just keeping it simple, and it'll almost certainly be faster, too.
See the quickest, easiest way in action here
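For reference, a self-contained sketch of the preg_replace_callback approach, using the sample text from the question written as a single string literal. It assumes you want only the contents of the first capture group escaped, with the surrounding <tag> wrapper dropped.

```php
<?php
$text = "<tag>\n<html>\nHTML\n</html>\n</tag>";
$regex = '/<tag>(.*?)<\/tag>/s';

// The callback receives the match array: $m[0] is the whole match,
// $m[1] is the text captured between <tag> and </tag>.
$code = preg_replace_callback($regex, function ($m) {
    return htmlspecialchars($m[1]);
}, $text);

echo $code;
```

To keep the wrapper, the callback could return '<tag>' . htmlspecialchars($m[1]) . '</tag>' instead.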

If you want to isolate the actual contents as defined by your pattern, you could use preg_match($regex,$text,$hits);. This gives you an array of hits: $hits[0] contains the whole matched string, and the bits that were between the parentheses in the pattern follow, starting at $hits[1]. You can then manipulate these matches, possibly using htmlspecialchars, and combine them again into $code.
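A minimal sketch of that idea, assuming a single <tag> block; the sample $text here is invented and shorter than the one in the question.

```php
<?php
$text = 'before <tag><b>bold</b></tag> after';
$regex = '/<tag>(.*?)<\/tag>/s';
$code = $text;

if (preg_match($regex, $text, $hits)) {
    // $hits[0] is the whole match, $hits[1] the part inside the parentheses.
    $escaped = htmlspecialchars($hits[1]);
    // Put the escaped contents back in place of the whole match.
    $code = str_replace($hits[0], $escaped, $text);
}

echo $code, "\n";
```

preg_match stops at the first match, so text with several <tag> blocks would need preg_match_all or a callback instead.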

regex question redux regarding definition list

I'm trying to figure out a way to throw out the attributes in this data that do not have any values. Thanks for helping.
My current regex code, thanks to Tomalak, looks like this:
Regex find
([^=|]+)=([^|]+)(?:\||$)
Regex replace
<dt>$1</dt><dd>$2</dd>
Data looks like this
Bristle Material=|Wire Material=Steel|Dia.=4 in|Grit=|Bristle Diam=|Wire Size=0.0095 in|Arbor Diam=|Arbor Thread - TPI or Pitch=1/2 - 3/8 in|No. of Knots=|Face Width=1/2 in|Face Plate Thickness=7/16 in|Trim Length=7/8 in|Stem Diam=|Speed=6000 rpm [Max]|No. of Rows=|Color=|Hub Material=|Structure=|Tool Shape=|Applications=Cleaning rust, scale and dirt, Light Deburring, Edge Blending, Roughening for adhesion, Finish preparation prior to plating or painting|Applicable Materials=|Type=|Used With=Straight Grinders, Bench/Pedestal Grinders, Right Angle Grinders|Packing Type=|Quantity=1 per pack|Wt.=
The end result should look like this
<dt>Wire Material</dt><dd>Steel</dd><dt>Dia.</dt><dd>4 in</dd><dt>Wire Size</dt><dd>0.0095 in</dd>
Not this
Bristle Material=|<dt>Wire Material</dt><dd>Steel</dd><dt>Dia.</dt><dd>4 in</dd>Grit=|Bristle Diam=|<dt>Wire Size</dt><dd>0.0095 in

Here is how you can do it in PHP without using regular expressions:
$parts_list = explode('|', "Bristle Material=|Wire M....");
$parts = "";
foreach( $parts_list as $part ){
$p = explode('=', $part);
if(!empty($p[1])) $parts .= "<dt>$p[0]</dt>\n<dd>$p[1]</dd>\n";
}
echo $parts;
And here is how you can do it with regular expressions:
$parts = preg_replace(
array('/([^=|]*)=(?:\||$)/','/([^=|]*)=([^|]+)(?:\||$)/'),
array('', '<dt>$1</dt><dd>$2</dd>'),
$inputString
);
echo $parts;
Update
This uses a feature of PHP's preg_replace: it accepts an array of patterns and an array of replacement strings, and applies each pair in order to the result of the previous one. The array() call above basically equates to this:
If I can match this: /([^=|]*)=(?:\||$)/ then replace it with an empty string.
If I can match this: /([^=|]*)=([^|]+)(?:\||$)/ then replace it with <dt>$1</dt><dd>$2</dd>
To test it in a Regex editor, you would run the first expression first (/([^=|]*)=(?:\||$)/) then run the second expression on the result of the first expression.
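The two steps can be checked separately, as in the following sketch. The input is a shortened, hypothetical sample in the same key=value|key=value format, and $step1, $step2 and $combined are illustrative names.

```php
<?php
// Shortened, hypothetical sample in the same format as the question's data.
$inputString = 'Bristle Material=|Wire Material=Steel|Dia.=4 in|Grit=|Wt.=';

// Step 1: delete every "key=" that is immediately followed by | or the end of the string.
$step1 = preg_replace('/([^=|]*)=(?:\||$)/', '', $inputString);
echo $step1, "\n";

// Step 2: turn the remaining key=value pairs into <dt>/<dd> markup.
$step2 = preg_replace('/([^=|]*)=([^|]+)(?:\||$)/', '<dt>$1</dt><dd>$2</dd>', $step1);
echo $step2, "\n";

// The array form runs the same two steps in one call.
$combined = preg_replace(
    array('/([^=|]*)=(?:\||$)/', '/([^=|]*)=([^|]+)(?:\||$)/'),
    array('', '<dt>$1</dt><dd>$2</dd>'),
    $inputString
);
echo $combined, "\n";
```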

([^=|]*)=([^|]*)(?:\||$)
To skip the ones without a value, try this:
(?:[^=|]*=|([^=|]*)=([^|]+))(?:\||$)

Looks like you want preg_match_all here rather than preg_replace:
preg_match_all('~([^|]+)=([^|\s][^|]*)~', $str, $matches, PREG_SET_ORDER);
foreach($matches as $match)
echo "<dt>{$match[1]}</dt><dd>{$match[2]}</dd>\n";

Regular Expressions: how to do "option split" replaces

Those regular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]], and if there is an option split, choose the last option, so the output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
This works for part 1 of the task, but before that I think I should do the option split. My best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. Could someone with a few free minutes help me? Thanks!

I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets).
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-capturing group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of them
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace('/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/', '$1', $oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.
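For what it's worth, the loop can also be written with the optional count argument of preg_replace, which reports how many replacements a pass made. This is a sketch under that assumption, not the answerer's own code.

```php
<?php
$text = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$regex = '/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/';

// Repeat the replacement until a pass changes nothing. Each pass unwraps
// the innermost [[...]] groups and keeps only the last option.
do {
    $text = preg_replace($regex, '$1', $text, -1, $count);
} while ($count > 0);

echo $text, "\n";
```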

This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes: keep looping while there are matches, performing the replacement (keeping only match group 1).
The difference from the other regular expressions here is that it allows single brackets in the content without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:link test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
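A possible way to drive this expression from PHP, looping until no replacement is made. The single-quoted string keeps the backslashes as written, and the loop itself is my own sketch rather than part of the answer.

```php
<?php
$text = 'test1:[[link]] test2:[[gold|si[lv]er]] '
      . 'test3:[[out1[[in[si]de]]out2]] test4:this|not';

// Single-quoted, so the backslashes reach PCRE unchanged.
$regex = '/\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]/';

// Group 1 is repeated, so after a match it holds the last option only.
do {
    $text = preg_replace($regex, '$1', $text, -1, $count);
} while ($count > 0);

echo $text, "\n";
```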

Why try to do it all in one go? Remove the [[ and ]] first and then deal with the options; do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.

Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);

Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match);
$match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];

This is C#, using only verbatim (non-escaped) strings, so you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, @"\[\[([^|]+)\|([^\]]+)\]\]", @"[[$2]]");
String step2 = Regex.Replace(step1, @"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);

$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}

PHP Split a string with start and stop value

I have fooled around with regex but can't seem to get it to work. I have a file called includes/header.php, and I am reading the file into one big string so that I can pull out a certain portion of the code to paste into the HTML of my document.
$str = file_get_contents('includes/header.php');
From here I am trying to return only the part of the string that starts with <ul class="home"> and ends with </ul>.
Try as I may, I can't figure out the expression.
Once I trim down the string I can just print it on my page, but I can't figure out the trimming part.

If you need something really hardcore, http://www.php.net/manual/en/book.xmlreader.php.
If you just want to rip out the text that fits that pattern try something like this.
$string = "stuff<ul class=\"home\">alsdkjflaskdvlsakmdf<another></another></ul>stuff";
if( preg_match( '/<ul class="home">(.*)<\/ul>/', $string, $match ) ) {
//do stuff with $match[0]
}

I'm assuming that the difficulty you're having has to do with escaping the regex special characters in the string(s) you're using as a delimiter. If so, try using the preg_quote() function:
$start = preg_quote('<ul class="home">');
$end = preg_quote('</ul>', '/');
preg_match("/" . $start. '.*' . $end . "/", $str, $matching_html_snippets);
The html you want should be in $matching_html_snippets[0]
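A slightly extended sketch of the same idea. Two things here are my additions and not part of the answer above: the lazy .*? so the match stops at the first </ul>, and the s modifier so that . also matches newlines. The $str contents are a made-up stand-in for includes/header.php.

```php
<?php
// Hypothetical stand-in for the contents of includes/header.php.
$str = "<div>\n<ul class=\"home\">\n<li>Home</li>\n</ul>\n<ul><li>Other</li></ul>\n</div>";

// Passing the delimiter as the second argument makes preg_quote escape "/" too.
$start = preg_quote('<ul class="home">', '/');
$end   = preg_quote('</ul>', '/');

// .*? is lazy, so the match ends at the first </ul>;
// the s modifier lets . match newlines as well.
$pattern = '/' . $start . '.*?' . $end . '/s';

if (preg_match($pattern, $str, $matching_html_snippets)) {
    echo $matching_html_snippets[0], "\n";
}
```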

You probably want an XML parser such as the built in one. Here is an example you might want to take a look at.
http://www.php.net/manual/en/function.xml-parse.php#90733
If you want to use regex then something along the lines of
$str = file_get_contents('includes/header.php');
$found = preg_match("<place your pattern here>", $str, $matches);
You probably want the pattern
'/<ul class="home">.*?<\/ul>/s'
preg_match returns 1 when the pattern matches (0 otherwise) and fills $matches, so you can grab the matched text with
$matches[0];
which holds the full match. Then output that.
But I'd be a bit wary: regular expressions do tend to match surprising edge cases, and you need to feed them actual data to find out reliably when they fail. However, if you are just parsing templates it should be OK; just do some tests and see if it all works. If not, I'd still recommend using the PHP XML Parser.
Hope that helps.

If you feel like not using regexes you could use string finding, which I think the PHP manual implies is quicker:
function substrstr($orig, $startText, $endText) {
//get first occurrence of the start string
$start = strpos($orig, $startText);
//get last occurrence of the end string
$end = strrpos($orig, $endText);
if($start === FALSE || $end === FALSE)
return $orig;
//skip past the whole start string, not just its first character
$start += strlen($startText);
$length = $end - $start;
return substr($orig, $start, $length);
}
$substr = substrstr($string, '<ul class="home">', '</ul>');
You'll need to make some adjustments if you want to include the terminating strings in the output, but that should get you started!
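One possible shape for those adjustments is sketched below. The extractBetween helper and its $includeDelimiters flag are hypothetical names of my own, and unlike the function above it stops at the first end string after the start rather than the last one in the text.

```php
<?php
// Hypothetical helper: returns the text between the first $startText and the
// first $endText that follows it, or null if either one is missing.
function extractBetween($orig, $startText, $endText, $includeDelimiters = false)
{
    $start = strpos($orig, $startText);
    if ($start === false) {
        return null;
    }
    // Position just past the start string.
    $contentStart = $start + strlen($startText);
    // Search for the end string only after the start string.
    $end = strpos($orig, $endText, $contentStart);
    if ($end === false) {
        return null;
    }
    if ($includeDelimiters) {
        return substr($orig, $start, $end + strlen($endText) - $start);
    }
    return substr($orig, $contentStart, $end - $contentStart);
}

$string = 'stuff<ul class="home"><li>a</li></ul>more<ul><li>b</li></ul>';
$inner = extractBetween($string, '<ul class="home">', '</ul>');
$outer = extractBetween($string, '<ul class="home">', '</ul>', true);
echo $inner, "\n", $outer, "\n";
```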

Here's a novel way to do it; I make no guarantees about this technique's robustness or performance, other than it does work for the example given:
$prefix = '<ul class="home">';
$suffix = '</ul>';
$result = $prefix . array_shift(explode($suffix, array_pop(explode($prefix, $str)))) . $suffix;
