I have this string:
value="1,'goahead'" your='56' so='"<br />"'
I want a PHP regex that returns a result array like the following:
value="1,'goahead'"
your='56'
so='"<br />"'
I tried this regex :
preg_match_all("#([\d\w_]+)\s*=\s*(\"|')([^'\"]*)(\"|')*#isx")
but it failed to fetch this value: value="1,'goahead'"
I think that's because of the single quotes inside the value. Please help me with an improved pattern.
I'd suggest looking at DOMDocument:
If your input is a complete tag...
<p value="1,'goahead'" your='56' so='"<br />"'>
...then you can do this:
$DOM = new DOMDocument;
$DOM->loadHTML($str);
foreach ($DOM->getElementsByTagName('p')->item(0)->attributes as $attr) {
$attributes[$attr->nodeName] = $attr->nodeValue;
}
This gives you the array you're looking for:
Array
(
[value] => 1,'goahead'
[your] => 56
[so] => "<br />"
)
Working example: http://3v4l.org/TIIZ2
You would be better off with this regex:
/(\w+)\s*=\s*(["'])(.*?)\2/
This will give the attribute name in the first subpattern, the type of quote used in the second, and the attribute value in the third.
Of particular importance are the .*?, which matches lazily (i.e. as little as possible), and the \2, which matches whatever the second subpattern matched (in this case, the quote used). This does not allow for escaping with \" or \', though. That would be a bit more involved.
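As a minimal sketch of how that pattern could be used to build the array from the question (the variable names $str and $attributes are only illustrative):

```php
<?php
// Sample input from the question, kept in a nowdoc so no quotes need escaping.
$str = <<<'TXT'
value="1,'goahead'" your='56' so='"<br />"'
TXT;

// Group 1: attribute name, group 2: opening quote, group 3: value.
// \2 forces the closing quote to be the same character as the opening one.
preg_match_all('/(\w+)\s*=\s*(["\'])(.*?)\2/', $str, $matches, PREG_SET_ORDER);

$attributes = array();
foreach ($matches as $m) {
    $attributes[$m[1]] = $m[3];
}

print_r($attributes);
```

PREG_SET_ORDER is used here only so that each match arrives as one row; the default ordering works just as well with a different loop.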
I'm afraid to ask how you ended up needing this and why. Anyway, this might help you:
if (preg_match('%(value="\d+,(\s+)?\'[a-z]+\'"(\s+)?)?(your=\'\d+\'(\s+)?)?(so=\'"<br(\s+)?\/>"\')?%six', $subject, $matches)) { }
This is my regular expression:
$pattern_new="/<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(?<price>([0-9.]*)).*?)\$(.*?)(\n|\s)*?</";
This is the sample pattern from which I have to do a match:
<td><strong>.zx</strong></td><td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s $399</td><td>zxcddcdcdcdc</td></tr><tr class="dark"><td><strong>.aa.rr</strong></td><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&eae;s $199</td><td>xxxx</td></tr><tr class="bar"><td colspan="3"></td></tr><tr class="bright"><td><strong>.vfd</strong></td><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>duⅇs $199</td><td>xxxxxxxx</td></tr><tr class="dark"><td><strong>.qwe</strong></td><td><span class="offer"><strong>xxx<br></strong>$99 xxxc;o<span class="fineprint_number">2</span>
Here is what I am doing in PHP
$pattern_new="/<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(<price>)*([0-9.]*).*?)\$(.*?)(\n|\s)*?</";
$source = file_get_contents("https://www.abc.com/sources/data.txt");
preg_match_all($pattern_new, $source, $match_newprice, PREG_PATTERN_ORDER);
echo $source;
print_r($match_newprice);
The $match_newprice array comes back empty.
When I use a regex tester like myregextester or solmetra.com, I get a perfect match with no issues at all, but when I use PHP's preg_match_all to do the match it returns an empty array. I increased pcre.backtrack_limit, but the issue is still the same.
I don't seem to understand the problem. Any help would be much appreciated.
I assume you were trying to make a non-capturing group for <price..., but you left out the colon; otherwise, you should take out the question mark. If the price group is optional, try the regex below. A visual regex debugger such as Debuggex can help with patterns like this; I find it extremely helpful.
<td>(\n|\s)*?(<span(\n|\s|.)*?<\/strong>(\n|\s)*?\$(<price>)*([0-9.]*).*?)\$(.*?)(\n|\s)*?<
In the above example, your first match would have the following captures:
0: "<td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s $399<"
1: ""
2: "<span class="offer"><strong>xscre:<br></strong>$299 xxxxx&x;xx<span class="fineprint_number">2</span></span><br>de&ea;s "
3: ">"
4: ""
5: ""
6: "299"
7: "399"
8: ""
Is this what you are looking for?
There is another, PHP-related problem here:
<?php
echo "\$".PHP_EOL;
echo '\$'.PHP_EOL;
Result:
$
\$
... because in double-quoted strings the $ is expected to signify the start of a variable and needs escaping if you mean a bare $, so PHP consumes the backslash before the regex engine ever sees it. Put single quotes around your regex and it will probably be fine (I haven't looked at it in detail, though; you may want to use the /x option and add some formatting whitespace and comments in case you need to debug this half a year from now).
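To make the difference concrete, here is a small sketch with a deliberately simplified pattern (not the one from the question) and a made-up subject string:

```php
<?php
$subject = 'de&ea;s $399';

// Double quotes: PHP turns \$ into a bare $, so PCRE receives /$(\d+)/,
// where $ is the end-of-string anchor and digits can never follow it.
$double = "/\$(\d+)/";

// Single quotes: the backslash survives, so PCRE receives /\$(\d+)/.
$single = '/\$(\d+)/';

$hitsDouble = preg_match($double, $subject, $m1);
$hitsSingle = preg_match($single, $subject, $m2);

var_dump($double);     // shows the pattern PCRE actually gets
var_dump($hitsDouble); // number of matches with the double-quoted pattern
var_dump($hitsSingle, $m2);
```

Online testers receive the pattern exactly as typed, which is presumably why the same expression behaves differently there.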
The good way to do that:
$oProductsHTML = new DOMDocument();
@$oProductsHTML->loadHTML($sHtml);
$oSpanNodes = $oProductsHTML->getElementsByTagName('span');
foreach ($oSpanNodes as $oSpanNode) {
if (preg_match('~\boffer\b~', $oSpanNode->getAttribute('class')) &&
preg_match('~\$\K\d++~', $oSpanNode->nodeValue, $aMatch) )
{
$sPrice = $aMatch[0];
echo '<br/>' . $sPrice;
}
}
$sHtml stands for your string.
And I'm sure you can make it shorter with XPath.
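A possible XPath variant, as a rough sketch only: the markup in $sHtml is a shortened, made-up version of the sample from the question, and the class test is the usual concat/normalize-space idiom rather than anything specific to this page.

```php
<?php
// Shortened stand-in for the real page; an assumption for illustration.
$sHtml = '<table>'
       . '<tr><td><span class="offer"><strong>xscre:<br></strong>$299 xxxxx'
       . '<span class="fineprint_number">2</span></span><br>des $399</td></tr>'
       . '<tr><td><span class="offer"><strong>xscre:<br></strong>$99 xxxxx'
       . '<span class="fineprint_number">2</span></span><br>des $199</td></tr>'
       . '</table>';

$oDoc = new DOMDocument();
@$oDoc->loadHTML($sHtml); // @ silences warnings about sloppy markup

// Select only <span> elements whose class list contains the word "offer".
$oXPath = new DOMXPath($oDoc);
$sQuery = '//span[contains(concat(" ", normalize-space(@class), " "), " offer ")]';

$aPrices = array();
foreach ($oXPath->query($sQuery) as $oSpan) {
    // \K drops the dollar sign from the reported match.
    if (preg_match('~\$\K\d+(?:\.\d+)?~', $oSpan->textContent, $aMatch)) {
        $aPrices[] = $aMatch[0];
    }
}

print_r($aPrices);
```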
The bad way:
$sPattern = '~<span class="offer\b(?>[^>]++|>(?!\$))+>\$\K\d++~';
preg_match_all($sPattern, $sHtml, $aMatches);
print_r ($aMatches[0]);
Note: \d++ can be replaced by \d++(?>\.\d++)? to allow decimal numbers.
Trying to replace a string, but it seems to only match the first occurrence, and if I have another occurrence it doesn't match anything, so I think I need to add some sort of end delimiter?
My code:
$mappings = array(
'fname' => $prospect->forename,
'lname' => $prospect->surname,
'cname' => $prospect->company,
);
foreach($mappings as $key => $mapping) if(empty($mapping)) $mappings[$key] = '$2';
$match = '~{(.*)}(.*?){/.*}$~ise';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
// $source = 'Hello {fname}Default{/fname}';
$text = preg_replace($match, '$mappings["$1"]', $source);
So if I use the $source that's commented, it matches fine, but if I use the one currently in the code above where there's 2 matches, it doesn't match anything and I get an error:
Message: Undefined index: fname}Default{/fname} {lname
Filename: schedule.php(62) : regexp code
So am I right in saying I need to provide an end delimiter or something?
Thanks,
Christian
Apparently your regexp matches fname}Default{/fname} {lname instead of Default.
Use the lazy {(.*?)} instead of the greedy {(.*)}.
{ has a special meaning in regexps, so you should escape it: \\{.
And I recommend using preg_replace_callback instead of the e modifier (you get more flow control and syntax highlighting, and it's impossible to force your program to execute malicious code).
Last mistake you're making is not checking whether the requested index exists. :)
My solution would be:
<?php
class A { // Of course with better class name :)
public $mappings = array(
'fname' => 'Tested'
);
public function callback( $match)
{
if( isset( $this->mappings[$match[1]])){
return $this->mappings[$match[1]];
}
return $match[2];
}
}
$a = new A();
$match = '~\\{([^}]+)\\}(.*?)\\{/\\1\\}~is';
$source = 'Hello {fname}Default{/fname} {lname}Last{/lname}';
echo preg_replace_callback( $match, array($a, 'callback'), $source);
This results in:
[vyktor@grepfruit tmp]$ php stack.php
Hello Tested Last
Your regular expression is anchored to the end of the string, so your closing {/whatever} must be the last thing in your string. Also, since your opening and closing tags are simply .*, there's nothing in there to make sure they match up. What you want is to make sure that your closing tag matches your opening one: using a backreference like {(.+)}(.*?){/\1} will make sure they're the same.
I'm sure there are other gotchas in there. If you have control over the format of the strings you're working with (i.e. you're rolling your own templating language), I'd seriously consider moving to a simpler, easier-to-match format. Since you're not 'saving' the default values, having enclosing tags provides you with no added value but makes the parsing more complicated. Just using $VARNAME would work just as well and be easier to match (\$[A-Z]+), without involving backreferences or having to explicitly state you're using non-greedy matching.
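A minimal sketch of that simpler $VARNAME format, assuming upper-case placeholder names and a made-up $mappings array; unknown placeholders are simply left in place:

```php
<?php
// Hypothetical mapping; in the question these values come from $prospect.
$mappings = array(
    'FNAME' => 'Christian',
    'CNAME' => 'Acme',
);

// Single quotes keep PHP from treating $FNAME etc. as variables.
$source = 'Hello $FNAME $LNAME from $CNAME';

$text = preg_replace_callback(
    '/\$([A-Z]+)/',
    function ($m) use ($mappings) {
        // Check that the key exists before using it; otherwise keep the placeholder.
        return isset($mappings[$m[1]]) ? $mappings[$m[1]] : $m[0];
    },
    $source
);

echo $text, "\n";
```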
I have a question about a regular expression that is giving me grief. I have a list of items that are wrapped in tags. I am trying to extract everything between two particular tags (which occur multiple times). Here is a sample of the list I am parsing:
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
I'm trying to get everything in between "ResumeResultItem_V3" as a blob of text, but I can't seem to get the expression right.
Here is the code I have so far:
$test = "(<ResumeResultItem_V3>)";
$test2 = "(<\/ResumeResultItem_V3>)";
preg_match_all("/" . $test . "(\w+)" . $test2 . "/", $xml, $matches);
foreach ($matches[0] as $match) {
echo $match;
echo "<br /><br />";
}
How can I fix this?
I'm making assumptions about your XML structure, but I really think you need an example using an XML parser, like SimpleXML.
$xml = new SimpleXMLElement( $file );
foreach( $xml->ResumeResultItem_V3 as $ResumeResultItem_V3 )
echo $ResumeResultItem_V3->asXML(); // a (string) cast would return only the element's own text, not its child tags
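Since the sample in the question has no single root element, one way to try the SimpleXML approach is to wrap the fragment first. This is only a sketch: the <root> wrapper and the shortened sample data are assumptions.

```php
<?php
// Shortened copy of the sample data; it has no root element of its own.
$fragment = <<<'XML'
<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>
XML;

// A well-formed XML document needs exactly one root, so add a wrapper.
$xml = new SimpleXMLElement('<root>' . $fragment . '</root>');

$items = array();
foreach ($xml->ResumeResultItem_V3 as $item) {
    // Child elements are read as properties and cast to scalar types.
    $items[] = array(
        'title' => (string) $item->ResumeTitle,
        'pay'   => (int) $item->RecentPay,
    );
}

print_r($items);
```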
You are probably better off with simplexml for extracting the data here.
But to also answer the regex question: \w+ only matches word characters. In this case you want it to match pretty much everything in between the delimiters, which .*? can be used for.
preg_match_all("/$test(.*?)$test2/s", $xml, $matches);
Only works with the /s modifier though.
Ignoring that you probably ought to use an XML parser, and that PHP has one you can use...
The issue is that \w+ matches word characters, not any character. A space and most punctuation aren't word characters, so your match fails. Instead, you need to match "any" character (.) one or more times (+), and because a greedy match could run across several items, you need the ? modifier to make it non-greedy. Your expression should work if you change \w+ to .+? and, because the items span several lines, the any-character match also requires the s modifier, so:
preg_match_all('/' . $test . '(.+?)' . $test2 . '/s', $xml, $matches);
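The effect of the s modifier can be checked with a small sketch like the following (shortened sample data, ~ used as the delimiter so the slash in the closing tag needs no escaping):

```php
<?php
$xml = "<ResumeResultItem_V3>\n<ResumeTitle>Johnson</ResumeTitle>\n</ResumeResultItem_V3>\n"
     . "<ResumeResultItem_V3>\n<ResumeTitle>ResumeforJake</ResumeTitle>\n</ResumeResultItem_V3>";

// With /s the dot also matches newlines, so each multi-line item is captured.
preg_match_all('~<ResumeResultItem_V3>(.+?)</ResumeResultItem_V3>~s', $xml, $matches);
$blobs = array_map('trim', $matches[1]);

// Without /s the dot stops at the first newline and nothing matches.
$withoutS = preg_match_all('~<ResumeResultItem_V3>(.+?)</ResumeResultItem_V3>~', $xml, $ignored);

print_r($blobs);
var_dump($withoutS);
```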
If you can use the output as an array with 1 item for each of the "text blob" matches, try this:
<?php
$text =
"<ResumeResultItem_V3>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
</ResumeResultItem_V3>
<ResumeResultItem_V3>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
</ResumeResultItem_V3>";
$matches = preg_split("/<\/ResumeResultItem_V3>/",preg_replace("/<ResumeResultItem_V3>/","",$text));
print_r($matches);
?>
Results in:
Array
(
[0] =>
<ResumeTitle>Johnson</ResumeTitle>
<RecentEmployer>University of Phoenix</RecentEmployer>
<RecentJobTitle>Advisor</RecentJobTitle>
<RecentPay>40000</RecentPay>
[1] =>
<ResumeTitle>ResumeforJake</ResumeTitle>
<RecentEmployer>APEX</RecentEmployer>
<RecentJobTitle>Consultant</RecentJobTitle>
<RecentPay>66000</RecentPay>
[2] =>
)
Those regular expressions drive me crazy. I'm stuck with this one:
test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not
Task:
Remove all [[ and ]], and if there is an option split (|) choose the last option, so the output should be:
test1:link test2:silver test3:out1insideout2 test4:this|not
I came up with (PHP)
$text = preg_replace("/\\[\\[|\\]\\]/",'',$text); // remove [[ or ]]
This works for part 1 of the task, but before that I think I should do the option split. My best solution:
$text = preg_replace("/\\[\\[(.*\|)(.*?)\\]\\]/",'$2',$text);
Result:
test1:silver test3:[[out1[[inside]]out2]] this|not
I'm stuck. Could someone with a few free minutes help me? Thanks!
I think the easiest way to do this would be multiple passes. Use a regular expression like:
\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]
This will replace option strings to give you the last option from the group. If you run it repeatedly until it no longer matches, you should get the right result (the first pass will replace [[out1[[inside]]out2]] with [[out1insideout2]] and the second will ditch the brackets).
Edit 1: By way of explanation,
\[\[ # Opening [[
(?: # A non-capturing group (we don't want this bit)
[^\[\]] # Non-bracket characters
* # Zero or more of anything but [ or ]
\| # A literal '|' character representing the end of the discarded options
)? # This group is optional: if there is only one option, it won't be present
( # The group we're actually interested in ($1)
[^\[\]] # All the non-bracket characters
+ # Must be at least one
) # End of $1
\]\] # End of the grouping.
Edit 2: Changed expression to ignore ']' as well as '[' (it works a bit better like that).
Edit 3: There is no need to know the number of nested brackets as you can do something like:
$oldtext = "";
$newtext = $text;
while ($newtext != $oldtext)
{
$oldtext = $newtext;
$newtext = preg_replace('/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/', '$1', $oldtext);
}
$text = $newtext;
Basically, this keeps running the regular expression replace until the output is the same as the input.
Note that I don't know PHP, so there are probably syntax errors in the above.
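For reference, the same repeat-until-stable idea can be sketched with the optional count argument of preg_replace, which reports how many replacements were made. The function name resolveBrackets is made up for this example.

```php
<?php
// Hypothetical helper: keeps replacing innermost [[...]] groups until none are left.
function resolveBrackets($text)
{
    // Innermost [[...]] only: no brackets allowed inside, optional "options|" prefix dropped.
    $pattern = '/\[\[(?:[^\[\]]*\|)?([^\[\]]+)\]\]/';

    do {
        // The fifth argument receives the number of replacements made in this pass.
        $text = preg_replace($pattern, '$1', $text, -1, $count);
    } while ($count > 0);

    return $text;
}

$input = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
echo resolveBrackets($input), "\n";
```

Each pass resolves one nesting level, so the loop runs once per level plus a final pass that finds nothing.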
This is impossible to do in one regular expression since you want to keep content in multiple "hierarchies" of the content. It would be possible otherwise, using a recursive regular expression.
Anyways, here's the simplest, most greedy regular expression I can think of. It should only replace if the content matches your exact requirements.
You will need to escape all backslashes when putting it into a string (\ becomes \\.)
\[\[((?:[^][|]+|(?!\[\[|]])[^|])++\|?)*]]
As others have already explained, you use this with multiple passes. Keep looping while there are matches, performing replacement (only keeping match group 1.)
Difference from other regular expressions here is that it will allow you to have single brackets in the content, without breaking:
test1:[[link]] test2:[[gold|si[lv]er]]
test3:[[out1[[in[si]de]]out2]] test4:this|not
becomes
test1:link test2:si[lv]er
test3:out1in[si]deout2 test4:this|not
Why try to do it all in one go? Remove the [[ ]] first and then deal with the options; do it in two lines of code.
When trying to get something going favour clarity and simplicity.
Seems like you have all the pieces.
Why not just simply remove any brackets that are left?
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$str = preg_replace('/\\[\\[(?:[^|\\]]+\\|)+([^\\]]+)\\]\\]/', '$1', $str);
$str = str_replace(array('[', ']'), '', $str);
Well, I didn't stick to just regex, because I'm of a mind that trying to do stuff like this with one big regex leads you to the old joke about "Now you have two problems". However, give something like this a shot:
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$reg = '/(.*?):(.*?)( |$)/';
preg_match_all($reg, $str, $m);
foreach($m[2] as $pos => $match) {
if (strpos($match, '|') !== FALSE && strpos($match, '[[') !== FALSE ) {
$opt = explode('|', $match); $match = $opt[count($opt)-1];
}
$m[2][$pos] = str_replace(array('[', ']'),'', $match );
}
foreach($m[1] as $k=>$v) $result[$k] = $v.':'.$m[2][$k];
This is C# using only verbatim (non-escaped) strings, hence you will have to double the backslashes in other languages.
String input = "test1:[[link]] " +
"test2:[[gold|silver]] " +
"test3:[[out1[[inside]]out2]] " +
"test4:this|not";
String step1 = Regex.Replace(input, @"\[\[([^|]+)\|([^\]]+)\]\]", @"[[$2]]");
String step2 = Regex.Replace(step1, @"\[\[|\]\]", String.Empty);
// Prints "test1:silver test3:out1insideout2 test4:this|not"
Console.WriteLine(step2);
$str = 'test1:[[link]] test2:[[gold|silver]] test3:[[out1[[inside]]out2]] test4:this|not';
$s = preg_split("/\s+/",$str);
foreach ($s as $k=>$v){
$v = preg_replace("/\[\[|\]\]/","",$v);
$j = explode(":",$v);
$j[1]=preg_replace("/.*\|/","",$j[1]);
print implode(":",$j)."\n";
}
So I'm working on a project that will allow users to enter poker hand histories from sites like PokerStars and then display the hand to them.
It seems that regex would be a great tool for this, however I rank my regex knowledge at "slim to none".
So I'm using PHP and looping through this block of text line by line and on lines like this:
Seat 1: fabulous29 (835 in chips)
Seat 2: Nioreh_21 (6465 in chips)
Seat 3: Big Loads (3465 in chips)
Seat 4: Sauchie (2060 in chips)
I want to extract seat number, name, & chip count so the format is
Seat [number]: [letters&numbers&characters] ([number] in chips)
I have NO IDEA where to start or what commands I should even be using to optimize this.
Any advice is greatly appreciated - even if it is just a link to a tutorial on PHP regex or the name of the command(s) I should be using.
I'm not entirely sure what exactly to use for that without trying it, but a great tool I use all the time to validate my RegEx is RegExr which gives a great flash interface for trying out your regex, including real time matching and a library of predefined snippets to use. Definitely a great time saver :)
Something like this might do the trick:
/Seat (\d+): ([^\(]+) \((\d+) in chips\)/
And some basic explanation on how Regex works:
\d = digit.
\<character> = escapes character, if not part of any character class or subexpression. for example:
\t
would render a tab, while \\t would render "\t" (since the backslash is escaped).
+ = one or more of the preceding element.
* = zero or more of the preceding element.
[ ] = bracket expression. Matches any of the characters within the bracket. Also works with ranges (ex. A-Z).
[^ ] = Matches any character that is NOT within the bracket.
( ) = Marked subexpression. The data matched within this can be recalled later.
Anyway, I chose to use
([^\(]+)
since the example provides a name containing spaces (Seat 3 in the example). What this does is match any character up to the point where it encounters an opening parenthesis.
Depending on how the rest of the pattern is written, this can leave you with a blank space at the end of the subexpression (using the data provided in the example). However, this can easily be stripped away using the trim() function in PHP.
If you do not want to match spaces, only alphanumeric characters, you could do something like this:
([A-Za-z0-9_-]+)
Which would match any letter (within A-Z, both upper- & lower-case), number as well as hyphens and underscores.
Or the same variant, with spaces:
([A-Za-z0-9_\s-]+)
Where "\s" matches any whitespace character, including a space.
Hope this helps :)
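Putting the pieces above together, a line-by-line version might look like this sketch. The pattern here deliberately has no literal space before the parenthesis, so the captured name does carry a trailing space and trim() is needed; the array keys are just illustrative names.

```php
<?php
$lines = array(
    'Seat 1: fabulous29 (835 in chips)',
    'Seat 2: Nioreh_21 (6465 in chips)',
    'Seat 3: Big Loads (3465 in chips)',
    'Seat 4: Sauchie (2060 in chips)',
);

$seats = array();
foreach ($lines as $line) {
    // ([^(]+) grabs everything up to the opening parenthesis, spaces included.
    if (preg_match('/Seat (\d+): ([^(]+)\((\d+) in chips\)/', $line, $m)) {
        $seats[] = array(
            'seat'  => (int) $m[1],
            'name'  => trim($m[2]), // strip the trailing space before "("
            'chips' => (int) $m[3],
        );
    }
}

print_r($seats);
```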
Look at the PCRE section in the PHP Manual. Also, http://www.regular-expressions.info/ is a great site for learning regex. Disclaimer: Regex is very addictive once you learn it.
I always use the preg_ set of functions for regex in PHP because the Perl-compatible expressions have much more capability. That extra capability doesn't necessarily come into play here, but they are also supposed to be faster, so why not use them anyway, right?
For an expression, try this:
/Seat (\d+): ([^ ]+) \((\d+)/
You can use preg_match() on each line, storing the results in an array. You can then get at those results and manipulate them as you like.
EDIT:
Btw, you could also run preg_match_all on the entire block of text (instead of looping through line-by-line) and get the results that way, too.
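A sketch of that whole-block variant, assuming the hand history uses \n line endings. The pattern differs from the one above: it is anchored per line with the m modifier and uses a lazy .+? so that names containing spaces are accepted.

```php
<?php
$history = "Seat 1: fabulous29 (835 in chips)\n"
         . "Seat 2: Nioreh_21 (6465 in chips)\n"
         . "Seat 3: Big Loads (3465 in chips)\n"
         . "Seat 4: Sauchie (2060 in chips)";

// /m makes ^ and $ match at every line, so each seat line is matched on its own.
preg_match_all('/^Seat (\d+): (.+?) \((\d+) in chips\)$/m', $history, $rows, PREG_SET_ORDER);

// Index the players by seat number.
$players = array();
foreach ($rows as $row) {
    $players[(int) $row[1]] = array('name' => $row[2], 'chips' => (int) $row[3]);
}

print_r($players);
```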
Check out preg_match.
Probably looking for something like...
<?php
$str = 'Seat 1: fabulous29 (835 in chips)';
preg_match('/Seat (?<seatNo>\d+): (?<name>\w+) \((?<chipCnt>\d+) in chips\)/', $str, $matches);
print_r($matches);
?>
*It's been a while since I did php, so this could be a little or a lot off.*
This may be a very late answer, but I wanted to add one anyway:
Seat\s(\d+):\s([\w\s]+)\s\((\d+).*\)
http://regex101.com/r/cU7yD7/1
Here's what I'm currently using:
preg_match("/(Seat \d+: [A-Za-z0-9 _-]+) \((\d+) in chips\)/",$line)
To process the whole input string at once, use preg_match_all()
preg_match_all('/Seat (\d+): \w+ \((\d+) in chips\)/', $input, $matches);
For your input string, var_dump of $matches will look like this:
array
0 =>
array
0 => string 'Seat 1: fabulous29 (835 in chips)' (length=33)
1 => string 'Seat 2: Nioreh_21 (6465 in chips)' (length=33)
2 => string 'Seat 4: Sauchie (2060 in chips)' (length=31)
1 =>
array
0 => string '1' (length=1)
1 => string '2' (length=1)
2 => string '4' (length=1)
2 =>
array
0 => string '835' (length=3)
1 => string '6465' (length=4)
2 => string '2060' (length=4)
On learning regex: get Mastering Regular Expressions, 3rd Edition. Nothing else comes close to this book if you really want to learn regex. Despite being the definitive guide to regex, the book is very beginner friendly.
Try this code. It works for me
Let's say that you have the following strings:
$string1 = "Seat 1: fabulous29 (835 in chips)";
$string2 = "Seat 2: Nioreh_21 (6465 in chips)";
$string3 = "Seat 3: Big Loads (3465 in chips)";
$string4 = "Seat 4: Sauchie (2060 in chips)";
Add to array
$lines = array($string1,$string2,$string3,$string4);
foreach($lines as $line )
{
$seatArray = explode(":", $line);
$seat = explode(" ",$seatArray[0]);
$seatNumber = $seat[1];
$usernameArray = explode("(",$seatArray[1]);
$username = trim($usernameArray[0]);
$chipArray = explode(" ",$usernameArray[1]);
$chipNumber = $chipArray[0];
echo "<br>"."Seat [".$seatNumber."]: [". $username."] ([".$chipNumber."] in chips)";
}
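You'll have to split the file by line breaks,
then loop through each line and apply the following logic:
// Index 0 of $matches holds the whole match, so the captured groups start at 1.
$seat = 1;
$name = 2;
$chips = 3;
foreach ($file as $string) { // $file is assumed to be an array of lines
if (preg_match('/Seat ([0-9]+): ([A-Za-z_0-9]*) \(([0-9]*) in chips\)/', $string, $matches)) {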
echo "Seat: " . $matches[$seat] . "<br>";
echo "Name: " . $matches[$name] . "<br>";
echo "Chips: " . $matches[$chips] . "<br>";
}
}
I haven't run this code, so you may have to fix some errors...
Seat [number]: [letters&numbers&characters] ([number] in chips)
Your Regex should look something like this
Seat (\d+): ([a-zA-Z0-9]+) \((\d+) in chips\)
The brackets will let you capture the seat number, name and number of chips in groups.