How to get the element in arrays - php

I'm working on a PHP script that searches for elements. There is an element called <p id='links'>, and I use simple_html_dom to parse the output of my script get-listing.php.
Here is the example output from get-listing.php:
<p id='channels'>101 ABC FAMILY</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=ABC FAMILY&id=101'>http://www.mysite.com/get-listing.php?channels=ABC FAMILY&id=101</a>
</p>
<a id="aTest" href="">Stream 1</a>
<p id='channels'>102 CBS</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=CBS&id=102'>http://www.mysite.com/get-listing.php?channels=CBS&id=102</a>
</p>
<a id="aTest" href="">Stream 1</a>
<p id='channels'>103 CNN USA</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=CNN USA&id=103'>http://www.mysite.com/get-listing.php?channels=CNN USA&id=103</a>
</p>
<a id="aTest" href="">Stream 1</a>
<p id='channels'>105 ESPN USA</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=ESPN USA&id=105'>http://www.mysite.com/get-listing.php?channels=ESPN USA&id=105</a>
</p>
<a id="aTest" href="rtmp://$OPT:rtmp-raw=rtmp://ny.iguide.to/edge playpath=49f5xnbs2wra0ut swfUrl=http://player.ilive.to/player_ilive_2.swf pageUrl=http://www.ilive.to token=UYDk93k#09sdafjJDHJKAD873">Stream 1</a>
<p id='channels'>106 FOX News</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=FOX News&id=106'>http://www.mysite.com/get-listing.php?channels=FOX News&id=106</a>
</p>
<a id="aTest" href="">Stream 1</a>
<p id='channels'>107 Animal Planet</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=Animal Planet&id=107'>http://www.mysite.com/get-listing.php?channels=Animal Planet&id=107</a>
</p>
<a id="aTest" href="">Stream 1</a>
<p id='channels'>108 USA Network</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=USA Network&id=108'>http://www.mysite.com/get-listing.php?channels=USA Network&id=108</a>
</p>
<a id="aTest" href="">Stream 1</a>
Here is my PHP script:
<?php
ini_set('max_execution_time', 300);
$errmsg_arr = array();
$errflag = false;
$link;
include ('simple_html_dom.php');
$base1 = "http://www.mysite.com/get-listing.php";
$html = file_get_html($base1);
$countp = $html->find('p');
header("Content-type: text/xml");
$xml = "<?xml version='1.0' encoding='UTF-8' ?>";
//echo $xml;
$xml .= '<tv generator-info-name="www.testbox.elementfx.com/xmltv">';
?>
I want to loop over the output of get-listing.php and get the URL from each element with id=links.
Can you please tell me how I can do that?

Assuming simple_html_dom.php gets your data as described at http://simplehtmldom.sourceforge.net/, you should be able to use foreach to go through the results:
$links = $html->find('p[id=links] a');
$urls = array();
foreach ($links as $link) {
// Collect the raw URLs here
$urls[] = $link->href;
}
EDIT
If you want to sort the hrefs into groups, you could do a simple test inside the loop:
foreach ($links as $link) {
// Separate the listing URLs from everything else
if (strpos($link->href, 'get-listing') !== false) {
$listings[] = $link->href;
} else {
$general[] = $link->href;
}
}
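A minimal, self-contained sketch of the same loop, in case it helps. It swaps in PHP's built-in DOMDocument and DOMXPath (not simple_html_dom) so that it runs without the library, and the inline $markup is a shortened, assumed copy of the get-listing.php output:

```php
<?php
// Assumed sample: a shortened copy of the get-listing.php output.
$markup = <<<'HTML'
<p id='channels'>101 ABC FAMILY</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=ABC FAMILY&amp;id=101'>ABC FAMILY</a>
</p>
<p id='channels'>102 CBS</p>
<p id='links'>
<a href='http://www.mysite.com/get-listing.php?channels=CBS&amp;id=102'>CBS</a>
</p>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // the repeated ids make libxml emit warnings
$dom->loadHTML($markup);
libxml_clear_errors();

// Select every <a> that is a direct child of a <p id="links">.
$xpath = new DOMXPath($dom);
$urls = array();
foreach ($xpath->query('//p[@id="links"]/a') as $a) {
    $urls[] = $a->getAttribute('href');
}

print_r($urls);
```

Because the page repeats id='links' on every block, the query matches on the attribute value rather than relying on an id lookup, which would only return one element.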

Related

Simple HTML DOM find tags and fetch data from page link

Hi, I'm using Simple HTML DOM. Basically, I need to get the h2 title and the content from
the links (page/id/1). The point where I'm getting stuck is getting the data from the linked page.
The format should be the same, that is:
Title
content from link1,
content from link5
title 2
content from link1,
content from link2
<section class="level">
<h2> title </h2>
<a class="links" href="page/id/1">link1 </a>
<a class="links" href="page/id/2">link2 </a>
<a class="links" href="page/id/3">link3 </a>
<a class="links" href="page/id/4">link4 </a>
<a class="links" href="page/id/5">link5 </a>
</section>
<section class="level">
<h2> title 2 </h2>
<a class="links" href="page/id/7">link1 </a>
<a class="links" href="page/id/8">link2 </a>
</section>
<section class="level">
<h2> title 3 </h2>
<a class="links" href="page/id/9">link2 </a>
<a class="links" href="page/id/10">link3 </a>
</section>
I know it should be along these lines. Any help, guys?
foreach ($html->find('h2') as $key => $value) {
echo $html->find('h2',0)->plaintext;
// this is where I'm stuck: getting the data from the link
foreach ( ) {
echo data from the link example.com/page.php/id/1
echo data from the link example.com/page.php/id/2
}
}
You could find the <section> elements with the class name level using find('section[class=level]'). Then you could, for example, loop over the child nodes and check the nodeName.
To get only the anchors, you could use find('section[class=level] a').
For example:
$html = new simple_html_dom();
$html->load($data);
$result = $html->find('section[class=level]');
foreach ($result as $item) {
foreach($item->childNodes() as $childNode) {
if ($childNode->nodeName() === "h2") {
echo $childNode->innertext . "<br>";
}
if ($childNode->nodeName() === "a") {
echo $childNode->getAttribute("href") . "<br>";
}
}
}
Result
title
page/id/1
page/id/2
page/id/3
page/id/4
page/id/5
title 2
page/id/7
page/id/8
title 3
page/id/9
page/id/10
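If the links need to stay grouped under their title, one option is to build an array keyed by the h2 text. This is only a sketch: it uses PHP's built-in DOMDocument and DOMXPath in place of simple_html_dom so that it runs on its own, and $markup is a shortened, assumed copy of the HTML from the question:

```php
<?php
// Assumed sample: a shortened copy of the markup in the question.
$markup = <<<'HTML'
<section class="level">
<h2> title </h2>
<a class="links" href="page/id/1">link1 </a>
<a class="links" href="page/id/2">link2 </a>
</section>
<section class="level">
<h2> title 2 </h2>
<a class="links" href="page/id/7">link1 </a>
</section>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // <section> can trigger libxml warnings
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$grouped = array();
foreach ($xpath->query('//section[@class="level"]') as $section) {
    // The h2 text becomes the key; its links become the values.
    $title = trim($xpath->query('./h2', $section)->item(0)->textContent);
    $grouped[$title] = array();
    foreach ($xpath->query('./a[@class="links"]', $section) as $a) {
        $grouped[$title][] = $a->getAttribute('href');
    }
}

print_r($grouped);
```

Each collected href could then be fetched in an inner loop to print the page content under its title. The hrefs here are relative, so they would need the site's base URL prepended first.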

innertext in simple_html_dom

Why does innertext not work?
Here is the HTML code:
<ul class="product">
<li class="product col-md-4 col-sm-4 col-xs-6 "><div class="product-header">
<a href="/so-mi-octopus-xanh-soc-trang-p5163098.html">
<img src="//cdn.nhanh.vn/cdn/store/17863/ps/20170925/0ctopus_thumb_450x600.jpg" class="attachment-shop_catalog size-shop_catalog wp-post-image">
</a><div class="buttons">
<a href="/so-mi-octopus-xanh-soc-trang-p5163098.html" rel="nofollow" class="button add_to_cart_button">
<i class="fa fa-shopping-bag" aria-hidden="true"></i>
<span class="screen-reader-text">Thêm vào giỏ</span></a>
<a data-product_id="5163098" class="button btnFav" rel="nofollow">
<i class="fa fa-heart-o" aria-hidden="true"></i>
<span class="screen-reader-text">Yêu thích</span>
</a></div></div><h3>Sơ mi Octopus xanh sọc trắng</h3><span class="price">
<span class="woocommerce-Price-amount amount">
400,000 ₫ </span>
</span></li>
</ul>
Here is my code:
<?php
require "simple_html_dom.php";
$html=file_get_html("http://zuhaus.vn/zu-design-pc150502.html?page=1");
$ds=$html->find("ul.products li");
foreach ($ds as $sp) {
# code...
$price=$sp->find("span.price span",0);
echo $price;
$name=$sp->find("h3 a",1)->innertext;
echo $name;
}
?>
I have tried a lot of test cases but it won't work :"<
Thank you
P.S. I used the simple_html_dom library
If you want to get the contents of the a tag inside the h3, then the problem is in this line:
$name = $sp->find("h3 a", 1)->innertext;
I suggest trying the following instead:
$name = $sp->find('h3', 1)->find('a', 1)->innertext;
The problem is with your selector. I changed the selector for the product name and it worked. You can also use cURL to speed up the crawling:
<?php
require "simple_html_dom.php";
$ch = curl_init("http://zuhaus.vn/zu-design-pc150502.html?page=1");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$curl_scraped_page = curl_exec($ch);
$html = new simple_html_dom();
$html->load($curl_scraped_page);
$ds=$html->find("ul.products li");
foreach ($ds as $sp) {
# code...
$price=$sp->find("span.price span",0);
echo $price;
$name=$sp->find("a",3)->innertext; // this is where the problem was in your code
echo $name;
echo "<br>";
}
?>
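For what it's worth, in the HTML posted in the question the product name sits directly inside the h3, with no a tag around it, so a selector like "h3 a" has nothing to match there. A rough sketch of reading the name and price from that structure follows. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom so that it runs on its own, and $markup is a simplified, ASCII-only stand-in for the real page (an assumption, not the actual site output):

```php
<?php
// Simplified, ASCII-only stand-in for the product markup in the question.
$markup = '<ul class="product"><li class="product">'
    . '<div class="product-header"><a href="/shirt.html">Add to cart</a></div>'
    . '<h3>Octopus shirt</h3>'
    . '<span class="price"><span class="amount"> 400,000 </span></span>'
    . '</li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$products = array();
foreach ($xpath->query('//ul[@class="product"]/li') as $li) {
    // The name is the text of the h3 itself; the price is the inner span.
    $nameNode = $xpath->query('./h3', $li)->item(0);
    $priceNode = $xpath->query('.//span[@class="price"]/span', $li)->item(0);
    $products[] = array(
        'name' => $nameNode ? trim($nameNode->textContent) : null,
        'price' => $priceNode ? trim($priceNode->textContent) : null,
    );
}

print_r($products);
```

Checking each node for null before reading it avoids a fatal error when a selector matches nothing, which is the usual symptom behind "innertext does not work".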

PHP String replacement inside tags

I am new to PHP. I'd like an output similar to this:
<a title="icon-logo" href="http://localhost/projects/"><icon
class="icon"> </icon>About US</a>
But I am getting the following output:
<a title="icon-logo" href="http://localhost/projects/">About
US<icon class="icon"> </icon > </a>
I need an icon tag before text.
My Code
<?php
$item_output = 'About US';
$iconVar = "<icon class ='icon'> </icon>";
$output = preg_replace('/(<a.*?>[^<]*?)</', '$1'.$iconVar . "<", $item_output);
echo $output;
?>
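One possible fix, as a sketch: the capture group in the pattern above includes the link text ([^<]*?), so the icon is appended after "About US". Capturing only the opening a tag puts the icon right after it instead. The $item_output value below is an assumption based on the output shown in the question, since the posted code only contains the text:

```php
<?php
// Assumed input: the full anchor markup, as in the output shown above.
$item_output = '<a title="icon-logo" href="http://localhost/projects/">About US</a>';
$iconVar = "<icon class='icon'> </icon>";

// Capture only the opening <a ...> tag and append the icon right after it,
// so the icon lands before the link text. The limit of 1 changes the first link only.
$output = preg_replace('/(<a\b[^>]*>)/', '$1' . $iconVar, $item_output, 1);

echo $output;
```

For anything more complex than a single known link, a DOM parser is usually a safer choice than a regular expression.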

Find and replace html(Zend_Dom_Query+createElement()): Call to a member function createElement() on a non-object

I need to find and replace some HTML elements inside an HTML document (I followed this answer: Getting an element from PHP DOM and changing its value). To do so, I retrieve the content with:
$transport = $observer->getTransport();
$html = $transport->getHtml();
$dom = new Zend_Dom_Query($html);
$document = $dom->getDocument();
and this is the result:
<div class="page-title category-title">
<h1>Title</h1>
</div>
<div class="category-products">
<div class="toolbar">
<div class="pager">
<p class="amount">Items 2 to 2 of 2 total</p>
<div class="limiter">
<label>Show</label>
<select onchange="setLocation(this.value)">
<option value="limit=1" selected="selected">1</option>
</select>per page</div>
<div class="pages"> <strong>Page:</strong>
<ol>
<li>
<a class="previous i-previous" href="p=1" title="Previous">
<img src="skin/frontend/default/default/images/pager_arrow_left.gif" alt="Previous" class="v-middle" />
</a>
</li>
<li>1
</li>
<li class="current">2</li>
</ol>
</div>
</div>
<div class="sorter">
<p class="view-mode">
<label>View as:</label> <strong title="Grid" class="grid">Grid</strong> List </p>
<div class="sort-by">
<label>Sort By</label>
<select onchange="setLocation(this.value)">
<option value="dir=asc&order=position" selected="selected">Position</option>
<option value="dir=asc&order=name">Name</option>
<option value="dir=asc&order=price">Price</option>
</select> <img src="skin/frontend/default/default/images/i_asc_arrow.gif" alt="Set Descending Direction" class="v-middle" />
</div>
</div>
</div>
<ul class="products-grid">
<li class="item first">
<a href="test/a-2.html" title="a" class="product-image">
<img src="media/catalog/product/cache/1/small_image/135x/9df78eab33525d08d6e5fb8d27136e95/images/catalog/product/placeholder/small_image.jpg" width="135" height="135" alt="a" />
</a>
<h2 class="product-name">a</h2>
<div class="price-box"> <span class="regular-price" id="product-price-2">
<span class="price">$1.00</span> </span>
</div>
<div class="actions">
<button type="button" title="Add to Cart" class="button btn-cart" onclick="setLocation('test/a-2.html')"><span><span>Add to Cart</span></span>
</button>
<ul class="add-to-links">
<li>
Add to Wishlist
</li>
<li>
<span class="separator">|</span> Add to Compare
</li>
</ul>
</div>
</li>
</ul>
<script type="text/javascript">
decorateGeneric($$('ul.products-grid'), ['odd', 'even', 'first', 'last'])
</script>
<div class="toolbar-bottom">
<div class="toolbar">
<div class="pager">
<p class="amount">Items 2 to 2 of 2 total</p>
<div class="limiter">
<label>Show</label>
<select onchange="setLocation(this.value)">
<option value="limit=1" selected="selected">1</option>
</select>per page</div>
<div class="pages"> <strong>Page:</strong>
<ol>
<li>
<a class="previous i-previous" href="p=1" title="Previous">
<img src="skin/frontend/default/default/images/pager_arrow_left.gif" alt="Previous" class="v-middle" />
</a>
</li>
<li>1
</li>
<li class="current">2</li>
</ol>
</div>
</div>
<div class="sorter">
<p class="view-mode">
<label>View as:</label> <strong title="Grid" class="grid">Grid</strong> List </p>
<div class="sort-by">
<label>Sort By</label>
<select onchange="setLocation(this.value)">
<option value="dir=asc&order=position" selected="selected">Position</option>
<option value="dir=asc&order=name">Name</option>
<option value="dir=asc&order=price">Price</option>
</select> <img src="skin/frontend/default/default/images/i_asc_arrow.gif" alt="Set Descending Direction" class="v-middle" />
</div>
</div>
</div>
</div>
</div>
To find the elements I use Zend_Dom_Query:
$transport = $observer->getTransport();
$html = $transport->getHtml();
$dom = new Zend_Dom_Query($html);
$document = $dom->getDocument();
if(!is_object($document)){
Mage::log(print_r($document, TRUE), null, 'mylogfile1.log');
$transport->setHtml($html);
exit();
}
$node = $document->createElement("p", "This product isn't available in your country.");
Unfortunately, it always exits at the object check; without the check, it returns this error:
Fatal error: Call to a member function createElement() on a non-object
EDIT
Full code, if anyone wants to see where I retrieve content (I have added some comments to be more clear):
//retrieve html from observer
$transport = $observer->getTransport();
$html = $transport->getHtml();
//Retrieve other info
$stored = json_decode(Mage::getStoreConfig('razorphyn/country/buttons'));
$theme=trim(Mage::getSingleton('core/design_package')->getTheme('frontend'));
$dom = new Zend_Dom_Query($html);
$document = $dom->getDocument();
//check if $document is an object
if(!is_object($document)){
Mage::log(print_r($document, TRUE), null, 'mylogfile1.log');
$transport->setHtml($html);
exit();
}
//Create the node that will replace the found ones
$node = $document->createElement("p", "This product isn't available in your country.");
$elArray=array();
$productsIds= array();
//Retrieve product ids if it is a button and store the query results
if($stored[$theme]['isOnClick']){
$queryDom='button'.$stored[$theme]['class'].'[onclick*="/checkout/cart/add/"]';
$results = $dom->query($queryDom);
foreach ($results as $result) {
preg_match("/checkout\/cart\/add.+\/([0-9]+)\//",$result->getAttribute('onclick'),$currentProdId);
$elArray[$currentProdId[0]]=$result;
$productsIds[]=$currentProdId[0];
}
}
//Retrieve product ids if it is a form, run another query to find the button and store the query results
else{
$queryDom='form'.$stored[$theme]['formId'].'[action*="/checkout/cart/add/"]';
$results = $dom->query($queryDom);
foreach ($results as $result) {
preg_match("/checkout\/cart\/add.+\/([0-9]+)\//",$result->getAttribute('action'),$currentProdId);
if($currentProdId[0] && is_numeric($currentProdId[0])){
$productsIds[]=$currentProdId[0];
$formDOM = new Zend_Dom_Query($result);
$formButton = $dom->query('button'.$stored[$theme]['class']);
foreach($formButton as $child){
$elArray[$currentProdId[0]]=$child;
}
}
}
}
//Retrieve info from table
$collection = Mage::getModel('razorphyn_country/product')->getCollection()
->addFieldToFilter('active', 1)
->addAttributeToFilter('productId', array('in' => $productsIds));
$res = $collection->getFirstItem();
$country = Mage::getSingleton('core/session')->getCustomerCountry();
//Replace items
if(isset($res->allowed)){
foreach($collection as $res){
if(isset($res->allowed) && (($res->allowed==0 && strpos($res->country, $country) !== false) || ($res->allowed==1 && strpos($res->country, $country) === false))){
$document = $document->replaceChild($node,$elArray[$res->productId]);
}
}
}
//Return edited html
$html = $document->saveHTML();
$transport->setHtml($html);
You realise you are not selecting an element but creating one with createElement()?
I'm not sure where you want to place the paragraph, but let's say in the div.category-products. So let's try something like this:
$transport = $observer->getTransport();
$html = $transport->getHtml();
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@class="category-products"]');
foreach($nodes as $node) {
$newNode = $dom->createElement("p", "This product isn't available in your country.");
// Insert the paragraph as the first child of the div
$node->insertBefore($newNode, $node->firstChild);
}
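As far as I know, Zend_Dom_Query::getDocument() in Zend Framework 1 returns the document string that was passed in rather than a DOMDocument, which would explain the "non-object" error. Below is a self-contained sketch of the DOMDocument approach that can be run outside Magento. The $html value is a cut-down, assumed stand-in for what $transport->getHtml() returns:

```php
<?php
// Assumed stand-in for the HTML returned by $transport->getHtml().
$html = '<div class="category-products">'
    . '<ul class="products-grid"><li>a</li></ul>'
    . '</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep markup warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@class="category-products"]') as $div) {
    $notice = $dom->createElement('p', 'This product is not available in your country.');
    // The reference node must be a child of $div, so firstChild is used here.
    $div->insertBefore($notice, $div->firstChild);
}

$result = $dom->saveHTML();
echo $result;
```

Note that loadHTML() wraps a fragment in html and body tags, so the saved output contains more than the original fragment.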

Get DIV Element contents thru DOMDocument PHP

I have to recover some news from a div of a site. The div is structured as follows:
The HTML Markup:
<ul id="news-accordion" class="rounded" style="padding: 2px;">
<li class="o">
<h3>
<span>TITLE ARTICLE</span>
<span>30/10/2014</span>
</h3>
<div style="display: none;">
<p>text of article</p>
</div>
</li>
<li class="e">
<h3>
<span>TITLE ARTICLE</span>
<span>28/10/2014</span>
</h3>
<div style="display: none;">
<p>text of article</p>
</div>
</li>
<li class="o">
<h3>
<span>TITLE ARTICLE</span>
<span>29/10/2014</span>
</h3>
<div style="display: none;">
<p>text of article</p>
</div>
</li>
</ul>
PHP
<?php
$doc = new DomDocument;
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('http://www.xxxxxxxxx/news.php'));
$news = $doc->getElementById('news-accordion');
$li = $news->getElementsByTagName('li');
foreach ($li as $row){
$title = $row->getElementsByTagName('h3');
echo $title->item(0)->nodeValue."<br><br>";
/*foreach ($title as $row2){
echo $row2->nodeValue."<br><br>";
//echo $row2->item(0)->nodeValue."<br><br>";
}*/
$text = $row->getElementsByTagName('p');
echo utf8_decode($text->item(0)->nodeValue)."<br><br><br>";
}
?>
The code works correctly, but when I print the contents of the span tags with echo $title->item(0)->nodeValue;, the text of the two spans is printed together.
How can I get the contents of the two spans separately? Thanks.
Yes, you can: just adjust the ->item() index. As you have already done for the other elements, point to the header element first, then explicitly point to its span children:
foreach ($li as $row){
$h3 = $row->getElementsByTagName('h3')->item(0);
$title = $h3->getElementsByTagName('span')->item(0); // first span
$date = $h3->getElementsByTagName('span')->item(1); // second span
echo $title->nodeValue . '<br/>';
echo $date->nodeValue . '<br/>';
$text = $row->getElementsByTagName('p');
echo utf8_decode($text->item(0)->nodeValue)."<br><br><br>";
}
$title = $row->getElementsByTagName('h3');
echo $title->item(0)->nodeValue."<br><br>";
Replace the two lines above with the ones below (use the span tag instead of the h3 tag):
$title = $row->getElementsByTagName('span');
echo $title->item(0)->nodeValue."<br><br>";
echo $title->item(1)->nodeValue."<br><br>";
It's working for me.
