I want to get the value inside a span id or div id. For example from
<span id="lastPrice">29.00</span>
i want to get the 29.00. please help.
If you want to do this through regexp that badly.
$string = "<span id=\"lastPrice\">29.00</span>";
preg_match("/\<span id\=\"lastPrice\"\>([\d.]+?)\<\/span\>/",$string,$match);
print "<pre>"; var_dump($match[1]); print "</pre>";
But there are far more better ways.
Take a look at XPath => PHP - Searching and filtering with XPath
[...] XPath is a darn sight easier than regular expressions for basic usage. That said, it might take a little while to get your head around all the possibilities it opens up to you!
Google will give u plenty of examples and tutorials.
Related
Simple HTML DOM is basically a php you add to your pages which lets you have simple web scraping. It's good for the most part but I can't figure out the manual as I'm not much of a coder. Are there any sites/guides out there that have any easier help for this? (the one at php.net is a bit too complicated for me at the moment) Is there a better place to ask this kind of question?
The site for it is at: http://simplehtmldom.sourceforge.net/manual.htm
I can scrape stuff that has specific classes like <tr class="group">, but not for stuff that's in between. For example.. This is what I currently use...
$url = 'http://www.test.com';
$html = file_get_html($url);
foreach($html->find('tr[class=group]') as $result)
{
$first = $result->find('td[class=category1]',0);
$second = $result->find('td[class=category2]',0);
echo $first.$second;
}
}
But here is the kind of code I'm trying to scrape.
<table>
<tr class="Group">
<td>
<dl class="Summary">
<dt>Heading 1</dt>
<dd>Cat</dd>
<dd>Bacon</dd>
<dt>Heading 2</dt>
<dd>Narwhal</dd>
<dd>Ice Soap</dd>
</dl>
</td>
</tr>
</table>
I'm trying to extract the content of each <dt> and put it to a variable. Then I'm trying to extract the content of each <dd> and put it to a variable, but nothing I tried works. Here's the best I could find, but it gives me back only the first heading repeatedly rather than going to the second.
foreach($html->find('tr[class=Summary]') as $result2)
{
echo $result2->find('dt',0)->innertext;
}
Thanks to anyone who can help. Sorry if this is not clear or that it's so long. Ideally I'd like to be able to understand these DOM commands more as I'd like to figure this out myself rather than someone here just do it (but I'd appreciate either).
TL;DR: I am trying to understand how to use the commands listed in the manual (url above). The 'manual' isn't easy enough. How do you go about learning this stuff?
I think $result2->find('dt',0) gives you back element 0, which is the first. If you omit that, you should be able to get an array (or nodelist) instead. Something like this:
foreach($html->find('tr[class=Summary]') as $result2)
{
foreach ($result2->find('dt') as $node)
{
echo $node->innertext;
}
}
You don't strictly need the outer for loop, since there's only 1 tr in your document. You could even leave it altogether to find each dt in the document, but for tools like this, I think it's a good thing to be both flexible and strict, so you are prepared for multiple rows, but don't accidentally parse dts from anywhere in the document.
Can someone explain to me how Hash tags get anchors after being posted. I want to work on a similar implementation in php. The logic behind that operation and links will be useful. thanks
$html = preg_replace("/#([a-z0-9_]+)/", '#${1}', htmlspecialchars($tweet_contents));
something like http://www.snipe.net/2009/09/php-twitter-clickable-links/
I was creating a Syntax Highlighter in PHP but I was failed! You see when I was creating script comments (//) Syntax Highlighting (gray) , I was facing some problems. So I just created a shortened version of my Syntax Highlighting Function to show you all my problem. See whenever a PHP variable ,i.e., $example, is inserted in between the comment it doesn't get grayed as it should be according to my Syntax Highlighter. You see I'm using preg_replace() to achieve this. But the regex of it which I'm using currently doesn't seem to be right. I tried out almost everything that I know about it, but it doesn't work. See the demo code below.
Problem Demo Code
<?php
$str = '
<?php
//This is a php comment $test and resulted bad!
$text_cool++;
?>
';
$result = str_replace(array('<','>','/'),array('[',']','%%'),$str);
$result = preg_replace("/%%%%(.*?)(?=(\n))/","<span style=\"color:gray;\">$0</span>",$result);
$result = preg_replace("/(?<!\"|'|%%%%\w\s\t)[\$](?!\()(.*?)(?=(\W))/","<span style=\"color:green;\">$0</span>",$result);
$result = str_replace(array('[',']','%%'),array('<','>','/'),$result);
$resultArray = explode("\n",$result);
foreach ($resultArray as $i) {
echo $i.'</br>';
}
?>
Problem Demo Screen
So you see the result I want is that $test in the comment string of the 'Demo Screen' above should also be colored as gray!(See below.)
Can anyone help me solve this problem?
I'm Aware of highlight_string() function!
THANKS IN ADVANCE!
Reinventing the wheel?
highlight_string()
Also, this is why they have parsers, and regex (despite popular demand) should not be used as a parser.
I agree, that you should use existing, parsers. Every ide has a php parser, and many people have written more of them.
That said, I do think it is worth the mental exercise. So, you can replace:
$result = preg_replace("/(?<!\"|')[\$](?!\()(.*?)(?=(\W))/","<span style=\"color:green;\">$0</span>",$result);
with
//regular expression.:
//#([^(%%%%|\"|')]*)([\$](?!\()(.*?)(?=(\W)))#
//replacement text:
//$1<span style=\"color:green;\">$2</span>
$result = preg_replace("#([^(%%%%|\"|')]*)([\$](?!\()(.*?)(?=(\W)))#","$1<span style=\"color:green;\">$2</span>",$result);
Personally, I think your best bet is to use CSS selectors. Replace style=\"color:gray;\" with class="comment-text" and style=\"color:green;\" with class="variable-text" and this CSS should work for you:
.variable-text {
color: #00E;
}
.comment-text .comment-text.variable-text {
color: #DDD;
}
Insert don't use regex to parse irregular languages here
anyway, it looks like you've run into a prime example of why regular expressions are not suited for this kind of problem. You'd be better off looking into PHP's highlight_string functionality
Well, you don't seem to care that php already has a function like this.
But because of the structure of php code one cannot simply use a regex for this or walk into mordor (the latter being the easier).
You have to use a parser or you will fly over the cuckoo's nest soon.
Lots of tutorials around the net but none of them can explain me this:
How do I select a single element (in a table, for example), having its absolute XPath?
Example:
I have this:
/html/body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span
What's that PHP function to get the text of that element?!
Really I could not find an answer. Found lots of guides and hints to get all the elements of the table, all the buttons of a form, etc, but not what I need.
Thank you.
$xml = simplexml_load_string($html_content_string);
$arr = $xml->xpath("//body/table/tbody/tr[2]/td[2]/table/tbody/tr/td/table[3]/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[4]/td[5]/span");
var_dump($arr);
Load you HTML document into a DOM object then make a DOMXPath object from it and let it evaluate your query string.
It's all described in detail here: http://php.net/manual/en/book.dom.php
I'm trying to fetch data from a div (based on his id), using PHP's PCRE. The goal is to fetch div's contents based on his id, and using recursivity / depth to get everything inside it. The main problem here is to get other divs inside the "main div", because regex would stop once it gets the next </div> it finds after the initial <div id="test">.
I've tryed so many different approaches to the subject, and none of it worked. The best solution, in my oppinion, is to use the R parameter (Recursion), but never got it to work properly.
Any Ideais?
Thanks in advance :D
You'd be much better off using some form of DOM parser - regex really isn't suited to this problem. If all you want is basic HTML dom parsing, something like simplehtmldom would be right up your alley. It's trivial to install (just include a single PHP file) and trivial to use (2-3 lines will do what you need).
include('simple-html-dom.php');
$dom = str_get_html($bunchofhtmlcode);
$testdiv = $dom->find('div#test',0); // 0 for the first occurrence
$testdiv_contents = $testdiv->innertext;