Matching pricing from html - regex [duplicate]

I have a string which is usually, but not always, HTML page source.
I want to extract pricing from within the string. I know this is not an exact science and the combinations of currency symbol placement etc. are endless, but anything is better than nothing.
example string:
$string = 'the price is <tag>£10.00</tag>';
So, I am starting with the following regex:
$price = preg_match('#(?:\$|\£|\€|\£|\&\#163;)(\d+(?:\.\d+)?)#', $string);
But of course this only returns the first character.
My question is: is there a way to keep going through $string until it finds a certain character, e.g. < or a space, and then return what was found, which in this case would be 10.00?
Is this a feasible way of doing this or is there a better way?
Here's the above in an example:
http://ideone.com/u8erb

Read the docs for preg_match: it does not return your match. It returns 1 if the pattern matched, 0 if it did not, and false on error. The matched text is written to the optional third argument.
Try this:
$string = 'the price is <tag>£10.00</tag>';
$price = preg_match_all('#(?:\$|\£|\€|\£|\&\#163;)(\d+(?:\.\d+)?)#', $string, $matches);
//This will contain your matches
var_dump($matches);
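If only the first price is needed, preg_match with a third argument is enough. This is a minimal sketch, assuming the input is UTF-8 and that the price directly follows the symbol; the &pound; alternative and the optional whitespace are my additions, not part of the question's pattern.

```php
<?php
$string = 'the price is <tag>£10.00</tag>';

// Currency symbol (or HTML entity), optional whitespace, then the amount.
// The u modifier makes PCRE treat the pattern and subject as UTF-8.
$pattern = '/(?:\$|£|€|&pound;|&#163;)\s*(\d+(?:\.\d+)?)/u';

$price = null;
if (preg_match($pattern, $string, $matches) === 1) {
    // $matches[0] is the whole match ("£10.00"), $matches[1] the first group.
    $price = $matches[1];
}
echo $price, "\n"; // prints 10.00
```

The return value only tells you whether something matched; the captured amount is in $matches[1].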

How about using preg_match_all with (\d+(?:\.\d+)?)(?=<\s*/\s*tag\s*>), since the currency may change? Any solution with regex will depend on a set of assumptions, so it's good to get those down first:
Where should you be looking, are these prices occurring within a given div?
What is the full set of possible values?
Try to make your regex as broad as possible, since a common reason it will fail in the future is that something minor has changed which you hadn't considered. If these prices occur in a tag with ids and classes, consider using an (X)HTML parser instead:
http://php.net/manual/en/book.dom.php
http://simplehtmldom.sourceforge.net/
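A rough sketch of the parser route with PHP's bundled DOM extension. The markup and the class name "price" are made up for illustration; the question does not say how the real pages are structured.

```php
<?php
// Hypothetical markup: the "price" class is an assumption, not from the question.
$html = '<div><span class="price">£10.00</span> <span class="price">$7</span></div>';

$doc = new DOMDocument();
// Suppress libxml warnings about imperfect real-world HTML.
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

$xpath = new DOMXPath($doc);
$prices = [];
foreach ($xpath->query('//span[@class="price"]') as $node) {
    // The parser locates the element; a small regex pulls the number from its text.
    if (preg_match('/\d+(?:\.\d+)?/', $node->textContent, $m)) {
        $prices[] = $m[0];
    }
}
print_r($prices);
```

The parser decides where to look and the regex only has to recognise a number, so a change in the surrounding markup is less likely to break the extraction.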

How to match only the first occurrence of String B after finding String A [duplicate]

I have to do a mass search-and-replace with Regex on a bunch of Wordpress pages.
I have a content block where I need to replace two identical strings with different values. Therefore, I need to use Regex to find and match only the first occurrence of a string.
Everything I've tried so far matches both occurrences. Here's what I have, trying to match only the first occurrence of a string that comes after a certain other string:
(item="Finish")*(8678)
I would like to see only the first occurrence of that string be matched.
https://regexr.com/4di9c
Here's the source string:
[vc_row class="options-tabs-section"][vc_column][vc_tta_tabs active_section="1"][vc_tta_section title="Finish" tab_id="finisha2ec-a4f1"][vc_media_grid element_width="3" item="8678" grid_id="vc_gid:1557243701236-783a77b3-2fb2-8" include="717,716,715,714,713,712,711,709,708,707"][/vc_tta_section][vc_tta_section title="Glass" tab_id="glassa2ec-a4f1"][vc_media_grid element_width="3" item="8656" grid_id="vc_gid:1557243701239-312d0b1e-25cd-9" include="964,972,724,971,969,968,967,966,965"][/vc_tta_section][vc_tta_section title="Handles" tab_id="handlesa2ec-a4f1"][vc_media_grid element_width="3" item="8678" grid_id="vc_gid:1557243701240-a408e67a-7aab-7" include="8661,8667,8660,8664,8665,8663,8662,8668,8666,8669"][/vc_tta_section][/vc_tta_tabs][/vc_column][/vc_row]
EDIT: I realize I'm probably thinking too hard here. Here's what I did that got me the answer I needed... Probably didn't need to do regex. I simply removed the global flag and went step-by-step:
<?php
$input='[vc_row class="options-tabs-section"][vc_column][vc_tta_tabs active_section="1"][vc_tta_section title="Finish" tab_id="finish84e3-bb24"][vc_media_grid element_width="3" item="4566" grid_id="vc_gid:1551131488971-e1d811df-ddaf-8" include="717,716,715,714,713,712,711,709,708,707"][/vc_tta_section][vc_tta_section title="Glass" tab_id="glass84e3-bb24"][vc_media_grid element_width="3" item="174" grid_id="vc_gid:1551131488974-85d653d4-4558-1" include="964,972,724,971,969,968,967,966,965"][/vc_tta_section][vc_tta_section title="Handles" tab_id="handles84e3-bb24"][vc_media_grid element_width="3" item="174" grid_id="vc_gid:1551131488975-8b255d93-ab50-6" include="961,960,959,913,912,911,910,909,908,907"][/vc_tta_section][/vc_tta_tabs][/vc_column][/vc_row]';
$output=preg_replace('/961,960,959,913,912,911,910,909,908,907/', '8661,8667,8660,8664,8665,8663,8662,8668,8666,8669', $input );
$output=preg_replace('/4566/', '8678', $output);
// Limit of 1: replace only the first "174" (Glass); without it both would become 8656
$output=preg_replace('/174/', '8656', $output, 1);
// The remaining "174" (Handles)
$output=preg_replace('/174/', '8678', $output);
echo $output;
echo "\n";

More information is needed to answer your question properly, but if you are looking to match the first occurrence, the preg_match() PHP function stops after the first match, so you could do it like so:
preg_match('/(item="Finish")*(8678)/', $yourString, $matches);
var_dump($matches[0]);
Depending on what you are doing, this may not solve it, but most of the PCRE (regular expression) functions in PHP take similar parameters and behave in a similar way.
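For the replace itself, one option is preg_replace's fourth argument, which caps the number of replacements. A minimal sketch on a shortened version of the source string; the replacement value 1234 is a placeholder I made up.

```php
<?php
// Shortened stand-in for the shortcode string in the question.
$input = '[vc_tta_section title="Finish"][vc_media_grid item="8678"][/vc_tta_section]'
       . '[vc_tta_section title="Handles"][vc_media_grid item="8678"][/vc_tta_section]';

// Match lazily from title="Finish" up to the next item=" and then 8678.
// \K drops everything matched so far, so only the digits get replaced.
// The final argument (1) limits preg_replace to a single replacement.
$output = preg_replace('/title="Finish".*?item="\K8678/s', '1234', $input, 1);

echo $output, "\n";
```

Only the item inside the "Finish" section changes; the identical 8678 under "Handles" is left alone.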

Find the first string in a string then get everything between quotes where it matches [closed]

Hope the title made sense, I tried.
What I am trying to do is find the first occurrence of a particular string in a string then when I find that match get everything between the two double quotes where that match was made.
For instance:
Let's say I am trying to find the first occurrence of ".mp3" in the string below.
FYI, my string is actually HTML from $string = file_get_contents('http://www.example.com/something').
My main string looks like this:
$string = 'something: "http://www.example.com/someaudio.mp3?variable=1863872368293283289&and=someotherstuff" that: "http://www.example.com/someaudio.mp3?variable=jf89f8f897f987f&and=someotherstuff" this: "http://www.example.com/someaudio.mp3?variable=123&and=someotherstuff" beer: "http://www.example.com/someaudio.mp3?variable=876sf&and=someotherstuff"';
At this point, I would like to find the first .mp3, then I need the entire url located within the double quotes where the match is made
Output should be
http://www.example.com/someaudio.mp3?variable=1863872368293283289&and=someotherstuff
I already know how to use strpos to find a match in PHP; the problem is how to get from there to the entire URL between the quotes. Is this even possible?

You're going to use preg_match with the optional $matches argument.
The regex in question will be something like
$r = '/"([^"]*\.mp3[^"]*)"/';
The pattern needs its own delimiters (the slashes here); [^"]* keeps the match inside a single pair of quotes, and the parentheses capture the URL without the quotes. You'll note that I've still glossed over most of the subtleties of what might be meant by "a url located within double quotes".
The use of the $matches argument may feel a little weird: the function fills it in by reference. That used to be a normal way for functions to work, and still is in languages like C++.
$m = [];
if(preg_match($r, $subject_string, $m)){
$the_thing_you_want = $m[1];
}
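To see why the quotes need care, here is a small comparison on a shortened, made-up subject string. A greedy .* runs across quote boundaries, while [^"]* cannot.

```php
<?php
// Shortened stand-in for the question's string (URLs are placeholders).
$subject = 'a: "http://www.example.com/one.mp3?x=1" b: "http://www.example.com/two.mp3?x=2"';

// Greedy: .* is free to cross quotes, so the match spans from the first
// quote in the string to the last one.
preg_match('/".*\.mp3.*"/', $subject, $greedy);

// Bounded: [^"]* cannot cross a quote, so the match stays inside one pair,
// and group 1 holds the URL without the surrounding quotes.
preg_match('/"([^"]*\.mp3[^"]*)"/', $subject, $bounded);

echo $greedy[0], "\n";
echo $bounded[1], "\n"; // prints http://www.example.com/one.mp3?x=1
```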

There are a few ways of doing this. Using strpos (and a couple of other string manipulation functions) is one. As you mention, strpos alone only gets you to your first ".mp3", so you need to combine it with something else. Let's have a play:
$str = <<<EOF
something: "http://www.example.com/someaudio.mp3?variable=1863872368293283289&and=someotherstuff"
that: "http://www.example.com/someaudio.mp3?variable=jf89f8f897f987f&and=someotherstuff"
this: "http://www.example.com/someaudio.mp3?variable=123&and=someotherstuff"
beer: "http://www.example.com/someaudio.mp3?variable=876sf&and=someotherstuff"
EOF;
$first_mp3_location = strpos($str, ".mp3");
// Location of the start of the first ".mp3" string
$first_quote_location = $first_mp3_location - strpos(strrev(substr($str, 0, $first_mp3_location)), '"');
/*
 * Reverse everything before ".mp3" and find the first '"' in it.
 * That offset equals the number of characters between the opening quote and ".mp3",
 * so subtracting it from the ".mp3" location gives the index of the
 * first character after the opening quote.
 */
$first_quote_after_mp3_location = strpos($str, '"', $first_mp3_location);
// Location of the first '"' after the ".mp3" string
var_dump(substr($str, $first_quote_location, $first_quote_after_mp3_location - $first_quote_location));
// Finally, substr extracts the URL between the two quotes.
This is a pretty long-winded way of getting to what you need, and you're probably better off using regex, but there is a way of doing it with just strpos and its buddies strrev and substr.
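A somewhat shorter variant of the same idea, as a sketch: strrpos finds the last quote before ".mp3" directly, so no string reversal is needed. The sample string is a shortened stand-in for the one in the question.

```php
<?php
$str = 'something: "http://www.example.com/someaudio.mp3?variable=1&and=x" '
     . 'that: "http://www.example.com/other.mp3?variable=2"';

$url = null;
$mp3 = strpos($str, '.mp3');                      // start of the first ".mp3"
if ($mp3 !== false) {
    $open  = strrpos(substr($str, 0, $mp3), '"'); // last quote before ".mp3"
    $close = strpos($str, '"', $mp3);             // first quote after ".mp3"
    if ($open !== false && $close !== false) {
        // Everything strictly between the two quotes.
        $url = substr($str, $open + 1, $close - $open - 1);
    }
}
echo $url, "\n"; // prints http://www.example.com/someaudio.mp3?variable=1&and=x
```

The !== false checks matter because strpos and strrpos return false when nothing is found, and a valid position can be 0.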

How do you replace AND update in PHP using preg_replace (or similar)? [duplicate]

I want to loop through an array converting specific key/value pairs that contain markup to HTML.
So an example value for $comment['comment_text'] would be:
This has *bolded* text
And should become:
This has <strong>bolded</strong> text
Here's what I've tried:
$pattern = "/\*\b.*?\b\*/i";
$newComment = preg_replace($pattern, "<strong>$&</strong>",
$comment['comment_text']);
And what I get:
This has $& text
I realize I'm mashing up JavaScript with PHP, but reading about back references in PHP hasn't made things any clearer.
My strings may have multiple bolded (in markup) instances...
Any help appreciated.
UPDATE:
Apologies - I didn't realize that Stack Overflow was converting asterisks to italics. I converted the example to code.
Also, my confusion came down to the use of $0 vs. $1, which I still don't fully understand. I thought the numbers referred to the matches in the string... so if you had 5 instances you could refer to them by $0 through $4.
If you use $0 you get:
This has <strong>*bolded*</strong> text
But if you use $1 you get the desired result.

Do this.
$pattern = "/\*\b(.*?)\b\*/";
$newComment = preg_replace($pattern, "<strong>$1</strong>", $comment['comment_text']);
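Here $1 refers to the group 1 match, i.e. whatever the first pair of parentheses captured, while $0 is the whole match including the asterisks. Here I'm supposing that you want to make the text between a pair of asterisks bold.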
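A quick sketch of the difference, using a made-up comment with two marked-up words. The numbers refer to capture groups within each match, not to successive matches in the string; preg_replace applies the same replacement to every match.

```php
<?php
$text = 'This has *bolded* text and *more* of it';
$pattern = '/\*\b(.*?)\b\*/';

// $0 is the entire match, asterisks included.
$whole = preg_replace($pattern, '<strong>$0</strong>', $text);

// $1 is only what the first (...) group captured.
$group = preg_replace($pattern, '<strong>$1</strong>', $text);

echo $whole, "\n"; // This has <strong>*bolded*</strong> text and <strong>*more*</strong> of it
echo $group, "\n"; // This has <strong>bolded</strong> text and <strong>more</strong> of it
```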

I need to extract number from given strings using php

If my number is 432987, the method below can be used:
$string = '<table><tr><td>432987</td></tr></table>';
preg_match_all("(\\d{6})", $string, $match);
var_dump($match[0]);
So the code above can be used to get a number of some specific length. If I don't know the length of the number, what could the solution be?
Examples of strings from which the number needs to be extracted/matched are below:
Snippet 1:
<table><tr><td>432987</td></tr></table>
Snippet 2:
<div>164PE
09983 PO#432987</div>
Snippet 3:
Order 432987IRC
Snippet 4:
432987
Let me know if more clarification is required.
The above is the edited part of the original question.

I originally wasn't going to answer this, but reading Tom Lord's link to the mystical Regex parsing of XML made me reconsider.
Regex CAN be used to parse all the examples shown, because the XHTML is "fluff" and is entirely unimportant for finding the number(s). Yes, some instances of XHTML will potentially contain 6 numeric characters in a row, but that's unlikely at best, and for the perceived scale of this application (i.e. not complex or massive, judging by the snippets given), it's doubtful that will be an issue.
The resultant output is not at all [X]HTML dependent in any form.
Quote:
Snippet 1:
<table><tr><td>432987</td></tr></table>
Snippet 2:
<div>164PE 09983
PO#432987</div>
Snippet 3:
Order 432987IRC
Snippet 4:
432987
To solve all of these and to return your missing number, 432987, you can simply do this:
$string = '<table><tr><td>432987</td></tr></table>'; // or any snippet from above
preg_match_all("/[0-9]{6}/", $string, $match);
This will match any string of 6 digits without break.
Full Proof:
$string1 = "<table><tr><td>432987</td></tr></table>";
$string2 = "<div>164PE
09983 PO#432987</div>";
$string3 = "Order 432987IRC";
$string4 = "432987";
$string5 = "<html><head><title>Some numbers</title></head>
<body><h2>Oh my word, this is HTML being attacked by Regex!!!</h2>
<p>This must be Doooom! 123456</p>
</body>
</html>";
preg_match_all("/[0-9]{6}/", $string5, $match);
print_r($match);
Alternatively, you can use the regex digit class \d:
preg_match_all("/\d{6}/", $string5, $match);
This does exactly the same thing.
I have made an assumption you want a 6 digit number, but I suspect if you know what the number is and that the number will be static then it's easier to use PHP string find and replace functions such as str_replace, etc.
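If the length really is unknown, one possible approach (a sketch, not a complete answer) is to match every run of digits with \d+ and then decide which run is the one you want. The snippets are the four from the question; how to pick the right run among several is left open, because the question does not say.

```php
<?php
$snippets = [
    '<table><tr><td>432987</td></tr></table>',
    "<div>164PE\n09983 PO#432987</div>",
    'Order 432987IRC',
    '432987',
];

$found = [];
foreach ($snippets as $snippet) {
    // \d+ matches any unbroken run of one or more digits.
    preg_match_all('/\d+/', $snippet, $match);
    $found[] = $match[0];
}
print_r($found);
```

Snippet 2 yields three candidates (164, 09983 and 432987), so some extra rule, such as a known prefix like "PO#", would be needed to choose between them.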

$string = '<table><tr><td>432987</td></tr></table>';
$table = new SimpleXMLElement( $string );
echo $table->tr->td; //432987
You can't reliably parse XML with regex; using SimpleXMLElement will solve your problem in this case. Note that it needs well-formed XML, so it only applies to input like Snippet 1.

Manipulate HTML paragraphs in php [duplicate]

Here is another question for you. I have a small problem in PHP, and I thought that before coming up with some extraordinary solution by myself, there may be an easier and faster way to solve it.
Assuming I have a string which contains HTML paragraph tags like:
$string="<p>Hello this is nick</p>
<p>i need some help over here</p>
<p></p><p>Does anyone know a solution</p>";
And an array of strings which contains some "clue" words:
$array=array("Hello","nick", "help", "anyone", "solution");
I now would like to do the following:
Output the $string in a browser but the "clue" words should have a special format e.g. being bold or highlighted.
What makes this a bit difficult for me is that I want to keep the paragraphs as they are. In other words, I want the final output to look exactly like the original (including new lines/new paragraphs) but with some words in bold.
I thought I could use strip_tags to remove <p> and </p> tags and then split the returned string by spaces. So as to get an array of words. Then I would output each word individually by checking if that word is contained in the $array. If yes, then it would be outputted with a bold style.
In this way I clearly lose the notion of new paragraphs and all the paragraphs will be merged in a single one.
Is there an easy way to fix that ? For example a way to have the knowledge that e.g. word "Hello" starts in a new paragraph? Or is there something else I can do?

Just replace the words with formatted versions of themselves. The regex below maintains the case and replaces full words only (so that for example in the word "snicker" the word "nick" inside it isn't replaced).
preg_replace( '/\b('.implode( '|', $array ).')\b/i', '<em>$1</em>', $string );
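A runnable sketch of that approach on the question's data. The preg_quote step is my addition: it keeps a clue word containing regex metacharacters from breaking the pattern. I also used <strong> rather than <em>, since the question asks for bold.

```php
<?php
$string = "<p>Hello this is nick</p>\n<p>i need some help over here</p>\n<p></p><p>Does anyone know a solution</p>";
$array = array("Hello", "nick", "help", "anyone", "solution");

// Escape each word so characters like . or / are matched literally.
$words = array_map(function ($w) { return preg_quote($w, '/'); }, $array);

// \b on both sides restricts matches to whole words; /i ignores case,
// and $1 re-inserts the word exactly as it was written in the text.
$pattern = '/\b(' . implode('|', $words) . ')\b/i';
$highlighted = preg_replace($pattern, '<strong>$1</strong>', $string);

echo $highlighted, "\n";
```

The paragraph tags are never touched, so the layout survives. One caveat: a clue word that also appears as a tag or attribute name would be wrapped too, so this assumes the clue words are ordinary text.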

Why not just replace your clue words directly ?
$string = str_ireplace(array('hello', 'nick'), array('<strong>hello</strong>', '<strong>nick</strong>'), $string);
(of course the second array passed to the function would be generated beforehand)
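Generating that second array could look something like the sketch below. The clue list is shortened for illustration.

```php
<?php
$string = '<p>Hello this is nick</p>';
$clues = array('hello', 'nick');

// Build the replacement for each clue word by wrapping it in <strong> tags.
$replacements = array_map(function ($w) {
    return '<strong>' . $w . '</strong>';
}, $clues);

$result = str_ireplace($clues, $replacements, $string);
echo $result, "\n"; // prints <p><strong>hello</strong> this is <strong>nick</strong></p>
```

Two things to be aware of: the replacement text comes from the clue list, so "Hello" in the source becomes "hello", and str_ireplace matches substrings, so "nick" inside "snicker" would be wrapped as well.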

Use str_replace and replace the words with versions of themselves wrapped in bold tags.
