Strip attributes in HTML tag span - php

I'm fetching data through a cURL request, and while parsing the HTML I find that some spans with attributes are not parsed cleanly.
Snippet of the HTML code:
<div class="ftlt" style="width:250px;">
<div class="tdiv"><span class="prop_price_img"></span><span class="property_price">PROPERTY_PRICE</span></div>
<p class="adPrice">AREA</p>
<h4>
<p style="float:left;width:251px;font-family:Arial, Helvetica, sans-serif;font-size:13px;padding:2px 10px 10px 0px;">TITLE,
<span style="color:#666;"> CITY_NAME.</span>
<a title="title, Sale" style="color:#3266CC;font-size:12px;text-decoration:underline;">View on map</a></p>
</h4>
<p style="font-weight:bold;color:#666;">
Premium
</p>
<div class="clr"></div>
</div>
I need to extract the CITY_NAME text cleanly.
I have been able to fetch that node through HTML DOM like this:
$spans = $html->find('div.ftlt span');
$city_value = strip_tags($spans[2]);
The resulting $city_value comes out mangled.
I've tried the removeAttribute method, but maybe I'm not using it properly.
If a regex can be applied, how would I do that?

$spans = $html->find('div.ftlt span');
$city_value = $spans[2]->nodeValue;
Why don't you use nodeValue?
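As a rough sketch of the nodeValue idea, here is the same lookup done with PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than the parser used in the question. The markup is a simplified copy of the question's snippet, and the variable names are illustrative assumptions.

```php
<?php
// Simplified copy of the markup from the question (assumption: the
// <h4> wrapper and most inline styles are dropped for brevity).
$markup = <<<'HTML'
<div class="ftlt" style="width:250px;">
<div class="tdiv"><span class="prop_price_img"></span><span class="property_price">PROPERTY_PRICE</span></div>
<p class="adPrice">AREA</p>
<p style="font-size:13px;">TITLE,
<span style="color:#666;"> CITY_NAME.</span>
<a title="title, Sale" style="color:#3266CC;">View on map</a></p>
<div class="clr"></div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// All spans inside div.ftlt, in document order; the city is the third one.
$spans = $xpath->query('//div[@class="ftlt"]//span');

// nodeValue is the text content only, so attributes never show up in it.
$city_value = trim($spans->item(2)->nodeValue);

echo $city_value, "\n";
```

Because nodeValue returns only the text inside the element, the style attribute does not need to be removed first, and no regex is involved.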

Related

Remove span tag from element html dom parser

I have code like this, which fetches data from another website.
require('simple_html_dom.php');
$html = file_get_html("http://www.example.com");
$info['diesel'] = $html->find(".on .price",0)->innertext;
$info['pb95'] = $html->find(".pb .price",0)->innertext;
$info['lpg'] = $html->find(".lpg .price",0)->innertext;
The HTML from the other website looks like this:
<a href="#" class="station-detail-wrapper on text-center active">
<h3 class="fuel-header">ON</h3>
<div class="price">
5,97
<span>zł</span>
</div>
</a>
So if I use echo $info['diesel'] it shows 5,97 zł. I would like to remove the <span>zł</span> so that only the price is shown.
Maybe you can replace that span tag with an empty string:
echo $info['diesel']=str_replace("<span>zł</span>","",$info['diesel']);
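An alternative sketch, assuming PHP's built-in DOM extension instead of simple_html_dom: select only the text nodes that are direct children of the price div, so the nested span never enters the result. The variable names are illustrative, and the markup is copied from the question.

```php
<?php
$markup = <<<'HTML'
<a href="#" class="station-detail-wrapper on text-center active">
<h3 class="fuel-header">ON</h3>
<div class="price">
5,97
<span>zł</span>
</div>
</a>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// text() matches only the div's own text nodes, not text inside <span>.
$textNodes = $xpath->query('//div[@class="price"]/text()');

$price = '';
foreach ($textNodes as $textNode) {
    $price .= $textNode->nodeValue;
}
$price = trim($price);              // drop the surrounding newlines

echo $price, "\n";
```

Unlike the str_replace approach, this does not depend on the exact contents of the span, so it should keep working if the currency label changes.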

Extract links from specific table

I have an HTML page with many tables. I want to extract links only from a specific one, identified by the div above it.
Here's my sample code:
<div class="boxuniwersal_header">Table 1</div>
<img src="img/boxuniwersal_top.gif" width="210" height="18" alt="" style="margin-top: 5px" />
<div class="boxuniwersal_content">
<div class="boxuniwersal_subcontent">
<div class='menu_m1'><table cellpadding="3"><tr><td><img src="some.jpg" width="45" /></td><td>Some text</td></tr></table></div>
<br />
</div>
</div>
<!-- /box -->
<!-- box -->
<div class="boxuniwersal_header">Table 2</div>
<img src="img/boxuniwersal_top.gif" width="210" height="18" alt="" style="margin-top: 5px" />
<div class="boxuniwersal_content">
<div class="boxuniwersal_subcontent">
<div class='menu_m1'><table cellpadding="3"><tr><td><img src="some2.jpg" width="45" /></td><td>Some text2</td></tr></table></div>
<br />
</div>
</div>
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query("//div/div/table/tr/td/a|//table//tr/td//a"); //querying domdocument
foreach($results as $result)
{
$links[]=$result->getAttribute("href");
}
This code returns all links. I want to grab only the links from Table 1. Is that possible?
Your main problem is just tuning the XPath expression to select the right elements.
Change your XPath to:
//div[text()="Table 1"]/following-sibling::div[1]//table//a
This first finds the <div> element whose text is the one you're after.
The following-sibling::div[1] part then selects the first <div> element after it at the same level (this is the one that contains the <table>).
The last part just looks for all <a> elements within that <table>.
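A minimal sketch of that expression in use, assuming PHP's DOMDocument and DOMXPath. The sample markup in the question contains no links, so the two <a> elements and their href values (one.html, two.html) are hypothetical additions for illustration.

```php
<?php
// Trimmed-down version of the question's markup; the links are
// hypothetical and were added so there is something to select.
$markup = <<<'HTML'
<div>
<div class="boxuniwersal_header">Table 1</div>
<img src="img/boxuniwersal_top.gif" alt="" />
<div class="boxuniwersal_content">
<div class="menu_m1"><table><tr><td><a href="one.html">One</a></td></tr></table></div>
</div>
<div class="boxuniwersal_header">Table 2</div>
<img src="img/boxuniwersal_top.gif" alt="" />
<div class="boxuniwersal_content">
<div class="menu_m1"><table><tr><td><a href="two.html">Two</a></td></tr></table></div>
</div>
</div>
HTML;

$domDocument = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$domDocument->loadHTML($markup);
libxml_clear_errors();

$domXPath = new DOMXPath($domDocument);
$query = '//div[text()="Table 1"]/following-sibling::div[1]//table//a';

$links = [];
foreach ($domXPath->query($query) as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

print_r($links);
```

Only the link under the "Table 1" header is collected, because following-sibling::div[1] stops at the first content div after that header and skips the <img> in between.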

Simple html dom and php fetch hidden content

I am using the Simple HTML DOM parser to fetch some data. Everything works great, but I run into a problem when the read more plugin is enabled on my WordPress site.
The hidden content (the rest of the article) is inside this div.
A sample:
<div class="mycontent">
Here is some content
<div class="brm" style="display: none;">
Here is another content but it's not vissible because the style of this div is set to display:none
</div>
<p>read more..</p>
</div>
So far I am using:
$url = "http://www.myurl.com";
$html = new simple_html_dom();
$html->load_file($url);
$maindiv = $html->find('div.mycontent',0)->outertext;
This displays everything except the content inside <div class="brm" style="display: none;">.
Any ideas on how to get the hidden content?
It actually does get that div:
include 'simple_html_dom.php';
$str = <<<EOF
<div class="mycontent">
Here is some content
<div class="brm" style="display: none;">
Here is another content but it's not vissible because the style of this div is set to display:none
</div>
<p>read more..</p>
</div>
EOF;
$html = str_get_html($str);
echo $html->find('div.mycontent',0)->outertext;
// <div class="mycontent"> Here is some content <div class="brm" style="display: none;"> Here is another content but it's not vissible because the style of this div is set to display:none </div> <p>read more..</p> </div>
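The same point can be checked without the third-party library. The sketch below assumes PHP's built-in DOMDocument and a shortened version of the sample markup. An HTML parser does not evaluate CSS, so a div styled with display: none is still part of the parsed tree.

```php
<?php
$markup = <<<'HTML'
<div class="mycontent">
Here is some content
<div class="brm" style="display: none;">
Hidden content
</div>
<p>read more..</p>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$mainDiv = $xpath->query('//div[@class="mycontent"]')->item(0);

// Serialize the element itself, roughly what outertext gives in simple_html_dom.
$outer = $dom->saveHTML($mainDiv);

echo $outer, "\n";
```

If the hidden text is missing from a real fetch, a likely explanation is that it is not in the downloaded HTML at all, for example because a script inserts it in the browser.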

php DOM doesn't give all elements

I'm trying to use PHP DOM to grab the src of the images within a "thumbnailCarousel" div in an HTML page, but somehow the document loaded with loadHTML doesn't contain this element. Does it have something to do with the JavaScript function in the HTML? Any suggestions or workarounds are appreciated. Thanks!
sample html:
http://www.crocs.com.sg/crocs-womens-cap-top-flat/12300,en_SG,pd.html?cid=6Z1&cgid=women-footwear-mary-janes-and-flats&intid=home14_carousel_product
div:
<div class="thumbnailCarousel" data-jcarousel="true"><ul style="left: 0px; top: 0px;">
<ul style="left: 0px; top: 0px;">
<li>
<img src="http://images.crocs.com/is/image/Crocs/12300_6Z1_ALT100?&fmt=jpeg&qlt=85,1&op_sharpen=0&resMode=sharp2&op_usm=1,1,6,0&iccEmbed=0&printRes=72&wid=60&hei=72" alt="100">
</li>
<li>
<img src="http://images.crocs.com/is/image/Crocs/12300_6Z1_ALT100?&fmt=jpeg&qlt=85,1&op_sharpen=0&resMode=sharp2&op_usm=1,1,6,0&iccEmbed=0&printRes=72&wid=60&hei=72" alt="100">
</li>
</ul>
</div>
The first thing that stands out is that the markup posted in the question is invalid: it is missing a </ul>.

Searching an HTML document in PHP

I'm trying to use DOMDocument and XPath to search an HTML document using PHP. I want to search by a number such as '022222', and it should return the value of the corresponding h2 tag. Any thoughts on how this would be done?
The HTML document can be found at http://pastie.org/1211369
How about this?
$sxml = simplexml_load_string($data);
$find = "022222";
print_r($sxml->xpath("//li[.='".$find."']/../../../div[@class='content']/h2"));
It returns:
Array
(
[0] => SimpleXMLElement Object
(
[0] => Item 2
)
)
//li[.='xxx'] will locate the li you're searching for. Then we use ../ to step up three levels before descending into the content div, as specified by div[@class='content']. Finally we choose the h2 child.
Just FYI, here's how to do it using DOM:
$dom = new DOMDocument();
$dom->loadXML($data);
$find = "022222";
$xpath = new DOMXpath($dom);
$res = $xpath->evaluate("//li[.='".$find."']/../../../div[@class='content']/h2");
if ($res->length > 0) {
$node = $res->item(0);
echo $node->firstChild->wholeText."\n";
}
I want to search by a number such as '022222', and it should return the value of the corresponding h2 tag. Any thoughts on how this would be done?
The HTML document can be found at http://pastie.org/1211369
To start with, the text at the provided link is not a well-formed XML or XHTML document and cannot be directly parsed with XPath.
Therefore I have wrapped it in an <html> element.
On this XML document one of the XPath expressions that selects exactly the wanted text node is:
/*/div[div/ul/li = '022222']/div[@class='content']/h2/text()
Among other advantages, this XPath expression doesn't use any reverse axes and is thus more readable.
The complete XML document on which this XPath expression is evaluated is the following:
<html>
<div class="item">
<div class="content"><h2>Item 1</h2></div>
<div class="phone">
<ul class="phone-single">
<li>01234 567890</li>
</ul>
</div>
</div>
<div class="item">
<div class="content"><h2>Item 2</h2></div>
<div class="phone">
<ul class="phone-multiple">
<li>022222</li>
<li>033333</li>
</ul>
</div>
</div>
<div class="item">
<div class="content"><h2>Item 3</h2></div>
<div class="phone">
<ul class="phone-single">
<li>02345 678901</li>
</ul>
</div>
</div>
<div class="item">
<div class="content"><h2>Item 4</h2></div>
<div class="phone">
<ul class="phone-multiple">
<li>099999999</li>
<li>088888888</li>
</ul>
</div>
</div>
</html>
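A short sketch of evaluating that forward-only expression from PHP, assuming DOMDocument and DOMXPath and using only the first two items of the document above to keep it brief. The variable names are illustrative.

```php
<?php
// First two items of the wrapped document, shortened for brevity.
$data = <<<'XML'
<html>
<div class="item">
<div class="content"><h2>Item 1</h2></div>
<div class="phone"><ul class="phone-single"><li>01234 567890</li></ul></div>
</div>
<div class="item">
<div class="content"><h2>Item 2</h2></div>
<div class="phone"><ul class="phone-multiple"><li>022222</li><li>033333</li></ul></div>
</div>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($data);               // well-formed thanks to the <html> wrapper

$find = '022222';
$xpath = new DOMXPath($dom);
// Pick the item div that has a matching <li>, then read its <h2> text node.
$res = $xpath->query("/*/div[div/ul/li = '" . $find . "']/div[@class='content']/h2/text()");

$title = $res->length > 0 ? $res->item(0)->nodeValue : null;

echo $title, "\n";
```

Here $find is concatenated directly into the expression, as in the answers above. That is fine for a fixed number, but a value containing a single quote would break the query, so untrusted input would need checking first.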
