I need to extract number from given strings using php - php

If my number is 432987, the method below can be used:
$string = '<table><tr><td>432987</td></tr></table>';
preg_match_all("(\\d{6})", $string, $match);
var_dump($match[0]);
The code above works for a number of a specific, known length. If I don't know the length of the number, what could the solution be?
Examples of strings from which the number needs to be extracted/matched are below:
Snippet 1:
<table><tr><td>432987</td></tr></table>
Snippet 2:
<div>164PE
09983 PO#432987</div>
Snippet 3:
Order 432987IRC
Snippet 4:
432987
Let me know if more clarification is required.
Above is edited part of the original question.

I originally wasn't going to answer this, but reading Tom Lord's link to the mystical Regex parsing of XML made me reconsider.
Regex CAN be used to parse all examples shown because the XHTML is "fluff" and is entirely unimportant for the finding of the number(s). Yes, some instances of XHTML will potentially contain 6 numeric characters in a row, but that's unlikely at best, and for the perceived scale of this application (ie not complex or massive, judging by the snippets given), it's doubtful that will be an issue.
The resultant output is not at all [X]HTML dependent in any form.
Quote:
Snippet 1:
<table><tr><td>432987</td></tr></table>
Snippet 2:
<div>164PE 09983
PO#432987</div>
Snippet 3:
Order 432987IRC
Snippet 4:
432987
To solve all of these and to return your missing number, 432987 you can simply do this:
$string = "Order 432987IRC"; // or any of the snippets above
preg_match_all("/[0-9]{6}/", $string, $match);
This will match any string of 6 digits without break.
Full Proof:
$string1 = "<table><tr><td>432987</td></tr></table>";
$string2 = "<div>164PE
09983 PO#432987</div>";
$string3 = "Order 432987IRC";
$string4 = "432987";
$string5 = "<html><head><title>Some numbers</title></head>
<body><h2>Oh my word, this is HTML being attacked by Regex!!!</h2>
<p>This must be Doooom! 123456</p>
</body>
</html>";
preg_match_all("/[0-9]{6}/", $string5, $match);
print_r($match);
Alternatively you can use regex number identifier \d and so:
preg_match_all("/\d{6}/", $string5, $match);
Does exactly the same thing.
I have made an assumption you want a 6 digit number, but I suspect if you know what the number is and that the number will be static then it's easier to use PHP string find and replace functions such as str_replace, etc.
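If the length really is unknown, one option is to drop the fixed {6} quantifier and match any run of digits with \d+. The sketch below is only an illustration: the helper name extractDigitRuns is made up here, and it returns every digit run in the string, so for Snippet 2 you would still need some rule (length, position, a prefix such as PO#) to pick the one you want.

```php
<?php
// Hypothetical helper (not from the question): returns every run of
// consecutive digits found in $text, whatever its length.
function extractDigitRuns(string $text): array
{
    preg_match_all('/\d+/', $text, $matches);
    return $matches[0];
}

$snippets = [
    '<table><tr><td>432987</td></tr></table>',
    "<div>164PE\n09983 PO#432987</div>",
    'Order 432987IRC',
    '432987',
];

foreach ($snippets as $snippet) {
    // Snippet 2 yields three runs: 164, 09983 and 432987.
    print_r(extractDigitRuns($snippet));
}
```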
Edit: Some Further reading.

$string = '<table><tr><td>432987</td></tr></table>';
$table = new SimpleXMLElement( $string );
echo $table->tr->td; //432987
You can't parse XML with regex. Using SimpleXMLElement for this case will solve your problem. More information in this post.
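Note that SimpleXMLElement throws an exception on input that is not well-formed XML, which applies to plain strings such as Snippets 3 and 4. A possible middle ground, sketched here under the assumption that you only care about digits in the visible text and not in tag attributes, is to remove the markup with strip_tags first and then match digits. The helper name digitsFromText and the id attribute in the sample are invented for illustration.

```php
<?php
// Hypothetical helper: strip any markup, then collect digit runs from
// the remaining text. Digits inside attributes are discarded with the tag.
function digitsFromText(string $input): array
{
    $text = strip_tags($input);
    preg_match_all('/\d+/', $text, $matches);
    return $matches[0];
}

// The id attribute contains six digits, but only the cell text is matched.
print_r(digitsFromText('<table><tr><td id="cell123456">432987</td></tr></table>'));

// Plain text passes through strip_tags unchanged.
print_r(digitsFromText('Order 432987IRC'));
```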

Related

preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings using a pattern match of;
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.
\b is a word boundary: it asserts that the match starts or ends at the edge of a word, so LAB_FF no longer matches inside LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
Following the comment on the other answer about how to replace the string, this is one way.
preg_match creates an empty entry in the output array for each capture group that fails before the one that matches; in this case one (the first).
Then it's just a matter of substr.
$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;
https://3v4l.org/gRvHv
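For the full input line from the question, the same idea can be done in one pass with preg_replace_callback, which avoids having to recombine two separate results. This is a sketch, not the answerer's code: it assumes the prefix depends only on whether two or four hex characters follow LAB_.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// Try the 4-character form first, then the 2-character form; \b keeps
// the match from stopping in the middle of a longer token.
$output = preg_replace_callback(
    '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/',
    function (array $m): string {
        // Assumption: 4 hex characters map to HAD, 2 map to DAB.
        $prefix = strlen($m[1]) === 4 ? 'HAD' : 'DAB';
        return $prefix . '_' . $m[1];
    },
    $input
);

echo $output; // DAB_FF, HAD_FF12
```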
You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches between 2 and 4. If you change the order of your selections it matches the first it comes to, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_even_amount_of_characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
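The effect of alternation order can be checked directly. The sketch below runs the question's original pattern and a reordered one against the question's input; the variable names are illustrative only.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

// Short alternative first: the engine accepts LAB_FF and never tries
// the 4-character branch, so LAB_FF12 is cut short.
preg_match_all('/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/', $input, $shortFirst);
print_r($shortFirst[0]); // LAB_FF, LAB_FF

// Long alternative first: the 4-character branch is tried before
// falling back to 2 characters.
preg_match_all('/LAB_(?:[0-9A-F]{4}|[0-9A-F]{2})/', $input, $longFirst);
print_r($longFirst[0]); // LAB_FF, LAB_FF12
```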

Replacing part of the string with regex

I have a string like this:
$string ='//upload.wikimedia.org/wikipedia/commons/thumb/6/6b/AkutanZero1.jpg/220px-AkutanZero1.jpg';
But I'm trying to replace a section of it with another:
$string ='//upload.wikimedia.org/wikipedia/commons/thumb/6/6b/AkutanZero1.jpg/123px-AkutanZero1.jpg';
I'm trying to use preg_replace, and I know that the string will always end with /thumb/(a hex value)/(two hex values)/(stuff)/(one or more numbers)-px-(stuff)
Unfortunately I haven't been successful in getting the text replaced and don't know what I'm doing wrong.
It would be easy if I could assume /(one or more numbers)-px existing only once but it could also exist in the /(stuff) part too.
preg_replace('/\/thumb\/[0-9a-f]\/[0-9a-f]{2}\/.+\/([0-9]+)-px-.+$/i', '328', $string);
preg_replace('/(\/thumb\/[0-9a-f]\/[0-9a-f]{2}\/.+\/)([0-9]+)(-px-.+)$/i', $1.'328'.$3, $string);
Based on your single sample input, you don't need any capture groups to get the expected result. Just find the occurrence(s) of digits followed by px- and swap in your preferred value. If this isn't robust enough, please improve your question.
Code: (Demo)
$string='//upload.wikimedia.org/wikipedia/commons/thumb/6/6b/AkutanZero1.jpg/220px-AkutanZero1.jpg';
echo preg_replace('/\d+px-/','123px-',$string);
Output:
//upload.wikimedia.org/wikipedia/commons/thumb/6/6b/AkutanZero1.jpg/123px-AkutanZero1.jpg
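If digits followed by px- can also appear earlier in the path, as the question warns, the pattern can be tied to the last path segment instead. The following is a sketch under the assumption that the size prefix always sits at the start of the final segment; the second URL is a made-up test case.

```php
<?php
// Match "/<digits>px-<rest>" only when <rest> runs to the end of the
// string without another slash, i.e. only in the last path segment.
$pattern = '~/\d+px-([^/]+)$~';

$string = '//upload.wikimedia.org/wikipedia/commons/thumb/6/6b/AkutanZero1.jpg/220px-AkutanZero1.jpg';
$result = preg_replace($pattern, '/123px-$1', $string);
echo $result, "\n";

// Hypothetical input where the file name itself starts with "100px-".
$tricky = '//example.org/thumb/6/6b/100px-Foo.jpg/220px-100px-Foo.jpg';
$trickyResult = preg_replace($pattern, '/123px-$1', $tricky);
echo $trickyResult, "\n";
```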

How to get a number from a html source page?

I'm trying to retrieve the followed by count on my instagram page. I can't seem to get the Regex right and would very much appreciate some help.
Here's what I'm looking for:
y":{"count":
That's the beginning of the string, and I want the 4 numbers after that.
$string = preg_replace("{y"\"count":([0-9]+)\}","",$code);
Someone suggested this ^ but I can't get the formatting right...
You haven't posted your strings, so what the regex should be is a guess... so I'll answer on why your code fails.
preg_replace('"followed_by":{"count":\d')
This is very far from the correct preg_replace usage. You need to give it the replacement string and the string to search on. See http://php.net/manual/en/function.preg-replace.php
Your second usage:
$string = preg_replace(/^y":{"count[0-9]/","",$code);
Is closer, but preg_replace is global, so this is searching your whole file (or it would if not for the anchor) and will replace the found value with nothing. What you really want (I think) is to use preg_match.
$string = preg_match('/y":\{"count(\d{4})/', $code, $match);
$counted = $match[1];
This presumes your regex was kind of correct already.
Per your update:
Demo: https://regex101.com/r/aR2iU2/1
$code = 'y":{"count:1234';
$string = preg_match('/y":\{"count:(\d{4})/', $code, $match);
$counted = $match[1];
echo $counted;
PHP Demo: https://eval.in/489436
I removed the ^, which requires the match to begin at the start of your string, escaped the {, and made the \d match exactly 4 digits. The () is a capture group and stores whatever is found inside of it, in this case the 4 numbers.
Also if this isn't just for learning you should be prepared for this to stop working at some point as the service provider may change the format. The API is a safer route to go.
This regexp should capture value you're looking for in the first group:
\{"count":([0-9]+)\}
Use it with preg_match_all function to easily capture what you want into array (you're using preg_replace which isn't for retrieving data but for... well replacing it).
Your regexp isn't working because you didn't escape the curly brackets. Also, you didn't put a quantifier after the digit class (the plus sign in my example), so it would only capture the first digit anyway.
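Putting the pieces from both answers together, a minimal sketch might look like the following. The $code sample is invented for illustration (the real page source was never posted), and the followed_by key is taken from the asker's first attempt.

```php
<?php
// Hypothetical fragment of page source; the real format may differ.
$code = '{"follows":{"count":56},"followed_by":{"count":1234}}';

$count = null;
// Escaped braces, a + quantifier on the digits, and a capture group.
if (preg_match('/"followed_by":\{"count":(\d+)\}/', $code, $match)) {
    $count = (int) $match[1];
}

var_dump($count); // int(1234)
```

Checking the return value of preg_match before reading $match[1] avoids an undefined-index notice if the page format changes.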

PHP preg_replace, split or match?

I need to parse a string and replace a specific format for tv show names that don't fit my normal format of my media player's queue.
Some examples
Show.Name.2x01.HDTV.x264 should be Show.Name.S02E01.HDTV.x264
Show.Name.10x05.HDTV.XviD should be Show.Name.S10E05.HDTV.XviD
After the show name, there may be 1 or 2 digits before the x, I want the output to always be an S with two digits so add a leading zero if needed. After the x it should always be an E with two digits.
I looked through the manual pages for the preg_replace, split and match functions but couldn't quite figure out what I should do here. I can match the part of the string I want with /\dx\d{2}/ so I was thinking first check if the string has that pattern, then try and figure out how to split the parts out of the match but I didn't get anywhere.
I work best with examples, so if you can point me in the right direction with one that would be great. My only test area right now is a PHP 4 install, so please no PHP 5 specific directions, once I understand whats happening I can probably update it later for PHP 5 if needed :)
A different approach as a solution, using sprintf, for PHP 4.
$text = preg_replace('/([0-9]{1,2})x([0-9]{2})/ie',
'sprintf("S%02dE%02d", $1, $2)', $text);
Note: The e modifier is deprecated as of PHP 5.5 (and was removed in PHP 7.0), so use preg_replace_callback() instead:
$text = preg_replace_callback('/([0-9]{1,2})x([0-9]{2})/',
function($m) {
return sprintf("S%02dE%02d", $m[1], $m[2]);
}, $text);
Output
Show.Name.S02E01.HDTV.x264
Show.Name.S10E05.HDTV.XviD
See working demo
preg_replace is the function you are looking for.
You have to write a regex pattern that picks the correct place.
<?php
// Accept 1 or 2 season digits before the x, then exactly 2 episode digits.
$replaced_data = preg_replace("~([0-9]{1,2})x([0-9]{2})~s", "S$1E$2", $data);
// Pad a single-digit season with a leading zero.
$replaced_data = preg_replace("~S([0-9])E~s", "S0$1E", $replaced_data);
?>
Sorry I could not test it but it should work.
Another way, using the preg_replace_callback() function:
$subject = <<<'LOD'
Show.Name.2x01.HDTV.x264 should be Show.Name.S02E01.HDTV.x264
Show.Name.10x05.HDTV.XviD should be Show.Name.S10E05.HDTV.XviD
LOD;
$pattern = '~([0-9]++)x([0-9]++)~i';
$callback = function ($match) {
return sprintf("S%02sE%02s", $match[1], $match[2]);
};
$result = preg_replace_callback($pattern, $callback, $subject);
print_r($result);
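One caveat with the patterns above is that they will also fire inside longer digit sequences such as a resolution (1920x1080). A possible refinement, sketched here with a hypothetical helper name and a made-up file name, is to require that the digits are not adjacent to other digits:

```php
<?php
// Hypothetical helper: rewrite NxNN / NNxNN episode tags to SxxExx,
// but only when the digits are not part of a longer number.
function normalizeEpisodeTag(string $name): string
{
    return preg_replace_callback(
        '/(?<![0-9])([0-9]{1,2})x([0-9]{2})(?![0-9])/',
        function (array $m): string {
            return sprintf('S%02dE%02d', $m[1], $m[2]);
        },
        $name
    );
}

echo normalizeEpisodeTag('Show.Name.2x01.HDTV.x264'), "\n";
echo normalizeEpisodeTag('Show.Name.10x05.HDTV.XviD'), "\n";
// The resolution is left alone because its digits touch other digits.
echo normalizeEpisodeTag('Show.Name.2x01.1920x1080.HDTV'), "\n";
```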

Matching pricing from html - regex [duplicate]

Possible Duplicate:
Matching Product Prices from an HTML text
I have a string which is usually, but not always, html page source
I want to extract pricing from within the string. I know this is not an exact science and the combination of currency symbol placement etc is endless but anything better than nothing.
example string:
$string = 'the price is <tag>£10.00</tag>';
So, I am starting with the following regex:
$price = preg_match('#(?:\$|\£|\€|\£|\&\#163;)(\d+(?:\.\d+)?)#', $string);
But of course this only returns the first character.
My question is, is there a way keep going through $string until it finds a certain character? e.g. < or a space? and then return what was found which in this case would be: 10.00
Is this a feasible way of doing this or is there a better way?
Here's the above in an example:
http://ideone.com/u8erb
Read the docs for preg_match: it does not return your match, it only returns whether there was a match.
Try this
$string = 'the price is <tag>£10.00</tag>';
$price = preg_match_all('#(?:\$|\£|\€|\£|\&\#163;)(\d+(?:\.\d+)?)#', $string, $matches);
//This will contain your matches
var_dump($matches);
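The amounts themselves end up in the first capture group, so $matches[1] is usually the part you want. A minimal sketch follows; the sample string is extended with extra prices for illustration, and treating &pound; as one of the currency markers is an assumption, not something the question confirms.

```php
<?php
// Sample text with a literal symbol, a dollar amount and an HTML entity.
$string = 'the price is <tag>£10.00</tag>, or $7 and &pound;3.50';

// Currency marker (not captured) followed by the amount (group 1).
$pattern = '/(?:\$|£|€|&pound;|&#163;)(\d+(?:\.\d+)?)/';

preg_match_all($pattern, $string, $matches);

// $matches[0] holds the full matches, $matches[1] only the amounts.
$prices = $matches[1];
print_r($prices);
```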
How about using preg_match_all with (\d+(?:\.\d+)?)(?=<\s*/\s*tag\s*>), since the currency may change? Any solution with regex will depend on a set of assumptions, so it's good to get those down first:
Where should you be looking, are these prices occurring within a given div?
What is the full set of possible values?
Try to make your regex as broad as possible, since a common reason it'll fail in the future is because something minor has changed which you haven't considered. If these prices are occurring in a tag with ids and classes, consider using an XHTML parser instead:
http://php.net/manual/en/book.dom.php
http://simplehtmldom.sourceforge.net/
