How can I print an array inside an echo string - PHP

I extract some HTML information from a website using file_get_contents and I want to print it with echo.
How can I do that?
$page = file_get_contents('*******');
preg_match("/<span class=\"a-text-strike\".*span>/", $page, $precio_antes);
preg_match("/<span id=\"priceblock_ourprice\".*span>/", $page, $precio_ahora);
preg_match("/<td class=\"a-span12 a-color-price a-size-base\".*td>/", $page, $precio_descuento);
echo "Antes: ".$precio_antes. "Ahora: " .$precio_ahora." (-" .$precio_descuento. "%)";

You should change the regular expressions a bit because you will not always get the expected results:
Make the .* lazy, otherwise it will skip some closing </span> tags and go right to the last one: .*?;
Make the . also match newlines, otherwise matches will fail to find anything if the opening and closing tags are not on the same line: use the s pattern modifier;
As tags can be nested, make sure to test on the ending tag, including the slash: /span> instead of just span>;
The result in the third argument of preg_match is an array, so you need to take the element of your interest: [0]
Adapted Code:
preg_match("/<span\s+class=\"a-text-strike\".*?\/span>/si", $page, $precio_antes);
preg_match("/<span\s+id=\"priceblock_ourprice\".*?\/span>/si", $page, $precio_ahora);
preg_match("/<td\s+class=\"a-span12\s+a-color-price\s+a-size-base\".*?\/td>/si", $page, $precio_descuento);
echo "Antes: {$precio_antes[0]} Ahora: {$precio_ahora[0]} (-{$precio_descuento[0]}%)";
Parse the DOM
Using regular expressions for parsing HTML is not advised in general. There will always be cases where it goes wrong because the provider changed the order of attributes, or replaced the double quotes by single quotes, ... etc. You really should have a look at DOMDocument to parse HTML.
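As a minimal sketch of the DOMDocument approach, assuming the page contains the same class and id values used in the question: the sample markup, the prices and the helper first_text() below are made up for illustration, and a real page would be loaded with file_get_contents as before.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$page = <<<'HTML'
<html><body>
<span class="a-text-strike">EUR 49,99</span>
<span id="priceblock_ourprice">EUR 39,99</span>
<table><tr><td class="a-span12 a-color-price a-size-base">20</td></tr></table>
</body></html>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // real-world HTML is rarely valid; keep warnings quiet
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical helper: trimmed text of the first node matching $query, or '' if none.
function first_text(DOMXPath $xpath, string $query): string
{
    $nodes = $xpath->query($query);
    return ($nodes !== false && $nodes->length > 0)
        ? trim($nodes->item(0)->textContent)
        : '';
}

// Match one class name among several, regardless of order or spacing.
$precio_antes     = first_text($xpath, '//span[contains(concat(" ", normalize-space(@class), " "), " a-text-strike ")]');
$precio_ahora     = first_text($xpath, '//span[@id="priceblock_ourprice"]');
$precio_descuento = first_text($xpath, '//td[contains(concat(" ", normalize-space(@class), " "), " a-color-price ")]');

echo "Antes: {$precio_antes} Ahora: {$precio_ahora} (-{$precio_descuento}%)";
```

Because the queries look at attribute values rather than raw text, they keep working if the attribute order or the quoting style changes.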

$precio_descuento is an array, so use
$precio_descuento[0] instead of $precio_descuento,
or implode('', $precio_descuento).
PS: the same applies to the other variables.

Related

how to match this text from between html tags using preg_match

I need to match
<TD WIDTH=30%><B>Joining Date</B></TD></TR>STRINGTOBEMATCHED</TABLE>
with preg_match... I tried using preg_quote, but there's still something wrong with the string: preg_match thinks B is an operator of some kind.
I suggest you read this thread about HTML parsing. Nowadays there are loads of XML/HTML parsers you can use.
Since the HTML code is very badly written (attribute values have no quotes, text occurs inside table but outside tr), it's hard to parse the HTML code.
Still, to answer your question, one can use this code, since you need the string to be matched to be between the </tr> and </table> tag:
$var = "<TD WIDTH=30%><B>Joining Date</B></TD></TR>STRINGTOBEMATCHED</TABLE>";
$regex = "%</TR>(.*?)</TABLE>%i";
$matches = null;
preg_match($regex, $var, $matches);
$result = $matches[1];
However, I strongly recommend using one of the libraries mentioned in that thread.
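A likely cause of the "B is an operator" complaint, although the question does not show the failing code, is that / was used as the pattern delimiter: the / in </B> then ends the pattern and the following B is read as a modifier. preg_quote takes the delimiter as an optional second argument and escapes it as well. A small sketch under that assumption ($prefix is a name introduced here for illustration):

```php
<?php
$var = "<TD WIDTH=30%><B>Joining Date</B></TD></TR>STRINGTOBEMATCHED</TABLE>";

// Literal text that precedes the wanted string (hypothetical variable name).
$prefix = "<TD WIDTH=30%><B>Joining Date</B></TD></TR>";

// Passing '/' as the second argument makes preg_quote escape the delimiter too.
$regex = '/' . preg_quote($prefix, '/') . '(.*?)' . preg_quote('</TABLE>', '/') . '/i';

$matches = [];
preg_match($regex, $var, $matches);
$result = $matches[1] ?? null;   // null if nothing matched

echo $result;
```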

PHP regex to conditionally replace first occurrence of string

I need to do some cleanup on strings that look like this:
$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';
Notice the href tag doesn't have closing quotes - I'm using the DOMParser on a large table of these to extract the text, and it borks on this.
I would like to look at the string in $author_name;
IF the first > does NOT have a " before it, replace it with "> to close the tag correctly. If it is okay, just skip and do the next step. Be sure not to replace the second > at all.
Using php regex, I haven't been able to find a working solution - I could chop up the whole thing and check its parts, but that would be slow and I think there must be a regex that can do what I want.
TIA
What you can do is find the first closing bracket (>), with or without a double quote (") in front of it, and replace it with ">:
$author_name = preg_replace('/(.+?)"?>(.+?)/', '$1">$2', $author_name);
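A slightly stricter variant, offered as a sketch rather than a tested fix for the whole table: anchor the pattern at the start of the string and pass a limit of 1 as the fourth argument of preg_replace, so the second > can never be touched. The variable $fixed is introduced here for illustration.

```php
<?php
$author_name = '<a href="http://en.wikipedia.org/wiki/Robert_Jones_Burdette>Robert Jones Burdette </a>';

// ^([^>]*?) : everything before the first '>' (lazy, so a closing quote is left out)
// "?>       : an optional closing quote followed by that first '>'
// limit = 1 : make at most one replacement
$fixed = preg_replace('/^([^>]*?)"?>/', '$1">', $author_name, 1);

echo $fixed;
```

A string whose href is already closed correctly comes out unchanged, because the optional quote is consumed and then written back.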
http://www.barattalo.it/html-fixer/
Download that, then include it in your php.
The rest is quite easy:
$dirty_html = ".....bad html here......";
$a = new HtmlFixer();
$clean_html = $a->getFixedHtml($dirty_html);
It's common for people to want to use regular expressions, but you must remember that HTML is not regular.

PHP multiline preg_replace to extract portion of a HTML document

I am trying to parse a HTTP document to extract portions of the document, but am unable to get the desired results. Here is what I have got:
<?php
// a sample of HTTP document that I am trying to parse
$http_response = <<<'EOT'
<dl><dt>Server Version: Apache</dt>
<dt>Server Built: Apr 4 2010 17:19:54
</dt></dl><hr /><dl>
<dt>Current Time: Wednesday, 10-Oct-2012 06:14:05 MST</dt>
</dl>
I do not need anything below this, including this line itself
......
EOT;
echo $http_response;
echo '********************';
$count = -1;
$a = preg_replace("/(Server Version)([\s\S]*?)(MST)/", "$1$2$3", $http_response, -1, $count);
echo "<br> count: $count" . '<br>';
echo $a;
I still see the string "I do not need ..." in the output. I do not need that string. What am I doing wrong?
How do I easily remove all other HTML tags as well?
Thanks for your help.
-Amit
You are matching everything from Server Version until MST. And only the part that is matched will later be modified by preg_replace. Everything not covered by the regex remains untouched.
So to replace the string part before your first anchor, and the text following, you also must match them first.
$a = preg_replace("/^.*(Server Version)(.*?)(MST).*$/s", "$1$2$3", $http_response);
See the ^.* and .*$. Both will be matched, but aren't mentioned in the replacement pattern; so they get dropped.
Also of course, might be simpler to just use preg_match() in such cases ...
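A minimal sketch of the preg_match() route, using a shortened copy of the sample document from the question ($wanted is a name chosen here):

```php
<?php
$http_response = <<<'EOT'
<dl><dt>Server Version: Apache</dt>
<dt>Server Built: Apr 4 2010 17:19:54
</dt></dl><hr /><dl>
<dt>Current Time: Wednesday, 10-Oct-2012 06:14:05 MST</dt>
</dl>
I do not need anything below this, including this line itself
EOT;

// The s modifier lets . match newlines, so the match can span several lines.
if (preg_match('/Server Version.*?MST/s', $http_response, $m)) {
    $wanted = $m[0];   // only the matched part; nothing before or after it
} else {
    $wanted = '';
}

echo $wanted;
```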
You need to match the other characters before and after your regex as well, like:
/.*?(Server Version)([\s\S]*?)(MST).*/s
The s flag makes . match newline characters too, so the leading and trailing parts can span several lines; you'll need it.
To remove html tags, use strip_tags.
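For example, on a fragment of the sample document (the whitespace clean-up afterwards is an optional extra, not part of strip_tags):

```php
<?php
$fragment = "<dl><dt>Server Version: Apache</dt>\n<dt>Server Built: Apr 4 2010 17:19:54\n</dt></dl>";

// strip_tags removes the tags but keeps the text between them.
$text = strip_tags($fragment);

// Optional: collapse the leftover line breaks and runs of spaces.
$text = trim(preg_replace('/\s+/', ' ', $text));

echo $text;
```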

Regex for html attributes in php

I am trying to parse a string of HTML tag attributes in php. There can be 3 cases:
attribute="value" //inside the quotes there can be everything also other escaped quotes
attribute //without the value
attribute=value //without quotes so there are only alphanumeric characters
Can someone help me find a regex that returns the attribute name in the first group and the attribute value (if present) in the second?
Never ever use regular expressions for processing html, especially if you're writing a library and don't know what your input will look like. Take a look at simplexml, for example.
Give this a try and see if it is what you want to extract from the tags.
preg_match_all('/( \\w{1,}="\\w{1,}"| \\w{1,}=\\w{1,}| \\w{1,})/i',
$content,
$result,
PREG_PATTERN_ORDER);
$result = $result[0];
The regex pulls each attribute, excludes the tag name, and puts the results in an array so you will be able to loop over the first and second attributes.
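If the name and the value are wanted in separate groups, as the question asks, one possible sketch is the pattern below. It is an assumption about what the input looks like, not a general HTML parser: it expects backslash-escaped quotes inside quoted values and no spaces around the = sign. The sample string and the $parsed array are made up for illustration.

```php
<?php
// Hypothetical attribute string covering the three cases from the question.
$attrs = 'class="a b" disabled width=30 title="say \"hi\""';

// Group 1: attribute name.
// Group 2: value, either quoted (escaped quotes allowed) or a bare word.
// (?| ... ) is a branch reset, so both alternatives store the value in group 2.
$pattern = '/([\w-]+)(?:=(?|"((?:[^"\\\\]|\\\\.)*)"|(\w+)))?/';

preg_match_all($pattern, $attrs, $matches, PREG_SET_ORDER);

$parsed = [];
foreach ($matches as $m) {
    // Attributes without a value have no group 2.
    $parsed[$m[1]] = $m[2] ?? null;
}

print_r($parsed);
```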

php preg_match two examples

I need to preg_match for
src="http:// "
where the blank space following // is the rest of the URL, ending with the ". My adapted pattern doesn't seem to work:
preg_match('#src="(http://[^"]+)#', $data, $match);
And I am also struggling to get text that starts with > and ends with EITHER a full stop . or an exclamation mark ! or a question mark ? I have no idea how to do this one. An example of the text I want to preg_match for is:
blahblahblah>Hello world this is what I want.
I'm hoping a kind preg_match guru can tell me the answer and save me hours of headscratching.
Thanks for reading.
As for the URL:
preg_match('#src="(.*?)"#', $data, $match);
and for the second case, use />(.*?)(\.|!|\?)/
(.*?)" matches as few characters as possible (lazily), so it stops at the first closing double quote it sees.
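Both cases can be tried on small inputs. The sample strings below are made up, and the second pattern swaps the alternation (\.|!|\?) for the equivalent character class [.!?], moved inside the capture so the final punctuation mark is kept:

```php
<?php
// Hypothetical input for the src case.
$data = '<img src="http://example.com/pic.png" alt="">';
preg_match('#src="(.*?)"#', $data, $match);
$url = $match[1] ?? null;

// Sample text from the question.
$text = 'blahblahblah>Hello world this is what I want.';
preg_match('/>(.*?[.!?])/', $text, $m);
$sentence = $m[1] ?? null;

echo $url, "\n", $sentence, "\n";
```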
It seems that you want to parse a document or string that follows an HTML, DOM, XML or similar structure.
Use XPath to navigate to the tag and let it return the src attribute. This will save you much trouble, and you can forget about regular expressions.
Example: CLICK ME
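A minimal sketch of that idea with PHP's built-in DOMDocument and DOMXPath, assuming the input is an HTML fragment like the made-up one below:

```php
<?php
// Hypothetical input fragment.
$data = '<p><img src="http://example.com/a.png" alt="x">Some text</p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate imperfect markup
$doc->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect the value of every src attribute in the document.
$srcs = [];
foreach ($xpath->query('//@src') as $attr) {
    $srcs[] = $attr->nodeValue;
}

print_r($srcs);
```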
