When I run the first code it works well. The echo works.
<?php
$html = file_get_contents('https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3');
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo "<br>";
}
?>
But when I try the following code and run it with the URL passed as a parameter, nothing is returned:
index.php?url=https://feedback.aliexpress.com/display/productEvaluation.htm?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3
<?php
$html = file_get_contents($_GET["url"]);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo "<br>";
}
?>
Anyone got any idea?
Update:
Probably not the best or cleanest solution, but it works :)
<?php
$url = urldecode($_GET['url']);
$ownerMemberId = urldecode($_GET['ownerMemberId']);
$withPictures = urldecode($_GET['withPictures']);
$page = urldecode($_GET['Page']);
$newurl = $url . "&ownerMemberId=" . $ownerMemberId .
"&withPictures=true&i18n=true&Page=" . $page;
$html = file_get_contents($newurl);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo "<img src='";
echo $image->getAttribute('src');
echo "'>";
echo "<br>";
}
?>
Please decode the URL, since the parameter value is itself another URL.
$url = urldecode($_GET['url']);
$html = file_get_contents($url);
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
echo "<br>";
}
Hope that works for you.
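A note on the likely root cause, as a sketch rather than a definitive diagnosis: the nested URL in the question contains unencoded `&` characters, so PHP treats `ownerMemberId`, `withPictures` and the rest as separate parameters of index.php, and `$_GET['url']` only holds the part up to the first `&`. PHP has already URL-decoded the values in `$_GET`, so the usual fix is to percent-encode the nested URL when the link is built. The example below uses `parse_str()` to imitate how the query string is split; the variable names `$target`, `$broken` and `$fixed` are only illustrative.

```php
<?php
// The nested URL from the question.
$target = 'https://feedback.aliexpress.com/display/productEvaluation.htm'
        . '?productId=32795887882&ownerMemberId=230515078&withPictures=true&i18n=true&Page=3';

// Unencoded: everything after the first "&" becomes separate parameters of index.php.
parse_str('url=' . $target, $broken);

// Encoded: http_build_query() percent-encodes the value, so it survives as one parameter.
$query = http_build_query(['url' => $target]);
parse_str($query, $fixed);

echo 'index.php?' . $query, "\n";          // the link to request
echo $broken['url'], "\n";                 // truncated at the first "&"
echo $fixed['url'], "\n";                  // the complete nested URL
```

With the link built this way, the second script in the question can read `$_GET['url']` directly.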
Related
I have HTML code something like this:
<p><i>i_text</i>,p_text</p>
i_text,p_text
I want to change all the text node values in this DOM element and keep all the tags:
i_changed_text,p_changed_text
My attempts:
$html = '<p><i>i_text</i>,p_text</p>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $element) {
$element->nodeValue = str_replace('_','_changed_',$element->nodeValue);
}
echo($dom->saveHTML());
Output: i_changed_text,p_changed_text
This code returns the correct text but does not preserve the child nodes.
$html = '<p><i>i_text</i>,p_text</p>';
$dom = new DOMDocument();
$dom->loadXML($html);
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
$elements = $dom->getElementsByTagName('*');
$elem = $dom->createElement('dfn', 'tag');
$attr = $dom->createAttribute('text');
$attr->value = 'element';
$elem->appendChild($attr);
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $element) {
while ($element->hasChildnodes()) {
$element = $element->childNodes->item(0);
}
$changed_value = str_replace('_','_changed_',$element->nodeValue);
$element->nodeValue = str_replace("tag", $dom->saveXML($elem), $changed_value);
}
echo ($dom->saveXML());
Output:
i_changed_text,p_text
This code preserves the child nodes and changes their values, but it does not change the text in the parent node.
My solution:
i_text,p_text,a_text,another one_text
$html = '<p><i>i_text</i>,p_text<b>,a_text</b>,another one_text</p>';
$dom = new DOMDocument();
$dom->loadXML($html);
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;
$elements = $dom->getElementsByTagName('*');
foreach ($elements as $element) {
if($element->hasChildnodes()==true && $element->parentNode->nodeName == '#document'){
foreach($element->childNodes as $element_child){
$element_child->nodeValue = str_replace('_','_changed_', $element_child->nodeValue);
}
}
}
echo ($dom->saveXML());
output
i_changed_text,p_changed_text,a_changed_text,another one_changed_text
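An alternative that may be simpler, offered as a sketch and not as the only way to do it: select the text nodes themselves with XPath and rewrite only those. Element nodes are never assigned to, so the tags stay in place at any nesting depth. The `//text()` query and the `$result` variable are my additions, not part of the solution above.

```php
<?php
$html = '<p><i>i_text</i>,p_text<b>,a_text</b>,another one_text</p>';

$dom = new DOMDocument();
$dom->loadXML($html);

// Query every text node in the document; elements themselves are left untouched.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//text()') as $textNode) {
    $textNode->nodeValue = str_replace('_', '_changed_', $textNode->nodeValue);
}

// Serialize only the root element so no XML declaration is printed.
$result = $dom->saveXML($dom->documentElement);
echo $result;
```

Because only text nodes are modified, the `<i>` and `<b>` elements survive unchanged.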
I am trying to extract the news headlines and the link (href) of each headline using the code below, but the link extraction is not working. It only gets the headline. Please help me find out what's wrong with the code.
Page from which I want to get the headlines and links:
http://web.tmxmoney.com/news.php?qm_symbol=BCM
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new domDocument;
@$dom->loadHTML($data);
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');
foreach ($rows as $row) {
$cols = $row->getElementsByTagName('span');
$newstitle = $cols->item(0)->nodeValue;
$link = $cols->item(0)->nodeType === HTML_ELEMENT_NODE ? $cols->item(0)->getElementsByTagName('a')->item(0)->getAttribute('href') : '';
echo $newstitle . '<br>';
echo $link . '<br><br>';
}
?>
Thanks in advance for your help!
Try this:
<?php
$data= file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new DOMDocument();
@$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$hrefs= $xpath->query('/html/body//a');
for($i = 0; $i < $hrefs->length; $i++){
$href = $hrefs->item($i);
$url = $href->getAttribute('href');
$url = filter_var($url, FILTER_SANITIZE_URL);
if(!filter_var($url, FILTER_VALIDATE_URL) === false){
echo ''.$url.'<br />';
}
}
?>
I have found the solution. Here it goes:
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new domDocument;
@$dom->loadHTML($data);
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');
foreach ($rows as $row) {
$cols1 = $row->getElementsByTagName('a');
$link = $cols1->item(0)->nodeType === XML_ELEMENT_NODE ? $cols1->item(0)->getAttribute('href') : '';
$cols2 = $row->getElementsByTagName('span');
$title = $cols2->item(0)->nodeValue;
$source = $cols2->item(1)->nodeValue;
echo $title . '<br>';
echo $source . '<br>';
echo $link . '<br><br>';
}
?>
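One caveat with looping over every div: a div that has no `<a>` or `<span>` makes `item(0)` return null, and the method call on it fails. Here is a minimal sketch of the same idea with a null guard. The markup in `$data` is a hypothetical stand-in, since the real page structure is not shown here, and the `$items` array is my own addition.

```php
<?php
// Hypothetical markup standing in for the real page.
$data = '<html><body>'
      . '<div><span><a href="/story/1">First headline</a></span><span>Newswire</span></div>'
      . '<div><span>No link here</span></div>'
      . '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//div') as $row) {
    $anchor = $row->getElementsByTagName('a')->item(0);
    $span   = $row->getElementsByTagName('span')->item(0);
    if ($anchor === null || $span === null) {
        continue;                   // skip rows without a link or a headline
    }
    $items[] = [
        'title' => trim($span->nodeValue),
        'link'  => $anchor->getAttribute('href'),
    ];
}

foreach ($items as $item) {
    echo $item['title'], ' => ', $item['link'], "\n";
}
```

The second div is skipped instead of raising an error, which is the only behavioural difference from the loop above.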
I am trying to write a script which loads the URLs from sitemap.xml and puts them into an array. Then it should load all the pages, one by one, and print something after each one.
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
// parse the XML, insert the links into the array
foreach($DomNodeList as $url) {
$urls[] = $url->nodeValue;
}
foreach ($urls as $url) {
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
echo $url."<br />";
flush();
ob_flush();
}
?>
It still doesn't work. It loads for a very long time and does not print anything. I think that flush() is not working.
Does anybody see the problem?
Thank you very much
Filip
I would run something like this:
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
foreach($DomNodeList as $url) {
$urls[] = $url->nodeValue;
}
foreach ($urls as $url) {
$data = file_get_contents($url);
echo $url."<br />". $data;
}
?>
Or, even better, use one loop instead of two:
<?php
set_time_limit(6000);
$urls = array();
$DomDocument = new DOMDocument();
$DomDocument->preserveWhiteSpace = false;
$DomDocument->load('sitemap.xml');
$DomNodeList = $DomDocument->getElementsByTagName('loc');
foreach($DomNodeList as $url) {
$curURL = $url->nodeValue;
$urls[] = $curURL;
$data = file_get_contents($curURL);
echo $curURL."<br />". $data;
}
?>
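On the flushing part of the question, a hedged sketch: PHP's own output buffer has to be flushed with `ob_flush()` before `flush()` is called, and `ob_flush()` should only be called when a buffer is actually active. Whether the browser then shows the output progressively also depends on buffering and compression in the web server or any proxy, which this code cannot control. The inline `$sitemap` string and the example.com URLs are placeholders so that the sketch runs without a sitemap.xml file.

```php
<?php
// Hypothetical inline sitemap standing in for sitemap.xml.
$sitemap = '<?xml version="1.0" encoding="UTF-8"?>'
         . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
         . '<url><loc>http://example.com/</loc></url>'
         . '<url><loc>http://example.com/about</loc></url>'
         . '</urlset>';

$doc = new DOMDocument();
$doc->loadXML($sitemap);

// Collect the text of every <loc> element.
$urls = [];
foreach ($doc->getElementsByTagName('loc') as $loc) {
    $urls[] = trim($loc->nodeValue);
}

foreach ($urls as $url) {
    // The page fetch (curl or file_get_contents) would go here.
    echo $url, "<br />\n";
    if (ob_get_level() > 0) {
        ob_flush();   // first push PHP's own output buffer
    }
    flush();          // then ask the server API to send it on
}
```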
I have a string with some <img> tags in it.
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img)
{
if($img->getAttribute('src') == 'pic.jpg')
{
// I want to delete that picture from the string
$img->parentNode->removeChild($img);
}
else
$img->setAttribute('class', 'image normall');
}
$string = $doc->saveHTML();
echo $string;
At the end of the function, when I print $string, the target picture has been deleted, but no class has been added to the other pictures!
But if I remove the line $img->parentNode->removeChild($img); the class is added.
What am I doing wrong?
EDIT
Please check with this sample string:
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
You can delete nodes if you iterate backwards.
Simply change
// Forward iteration
foreach ($imgs as $img) {
to
// Reverse iteration
for($i = $imgs->length; --$i >= 0;) {
$img = $imgs->item($i);
Ref: http://php.net/manual/en/class.domnodelist.php#83390
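For completeness, here is the reverse loop as a full runnable sketch. The reason it helps is that the DOMNodeList returned by getElementsByTagName() is live: removing a node during a forward foreach shifts the remaining items, so the next one is skipped. I dropped the mb_convert_encoding() call only because the sample string is plain ASCII; `$result` is my own variable name.

```php
<?php
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';

$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_clear_errors();

$imgs = $doc->getElementsByTagName('img');

// Walk the live list from the end so removals never shift unvisited items.
for ($i = $imgs->length - 1; $i >= 0; $i--) {
    $img = $imgs->item($i);
    if ($img->getAttribute('src') === 'pic.jpg') {
        $img->parentNode->removeChild($img);
    } else {
        $img->setAttribute('class', 'image normall');
    }
}

$result = $doc->saveHTML();
echo $result;
```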
Finally I solved this with the following change. I hope it helps others.
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';
$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML(mb_convert_encoding($string, 'HTML-ENTITIES', 'UTF-8'));
libxml_clear_errors();
$imgs = $doc->getElementsByTagName('img');
$imgs1 = $imgs2 = array();
foreach($imgs as $img) {
if($img->getAttribute('src') == 'pic.jpg')
{
$imgs1[] = $img;
}
else
$imgs2[] = $img;
}
foreach($imgs1 as $img) {
$img->parentNode->removeChild($img);
}
foreach ($imgs2 as $img)
{
$img->setAttribute('class', 'image normall');
}
$string = $doc->saveHTML();
echo $string;
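The two helper arrays work because plain arrays are not live. A shorter variant of the same idea, as a sketch: copy the node list into one array with iterator_to_array() and then remove or modify in a single pass. The `$snapshot` and `$result` names are mine, and the encoding conversion is omitted because the sample string is ASCII.

```php
<?php
$string = ' <img src="pic.jpg"> and <img src="pic2.jpg">';

$doc = new DOMDocument('1.0', 'UTF-8');
libxml_use_internal_errors(true);
$doc->loadHTML($string);
libxml_clear_errors();

// Copy the live DOMNodeList into a plain array; the array does not shrink on removal.
$snapshot = iterator_to_array($doc->getElementsByTagName('img'));

foreach ($snapshot as $img) {
    if ($img->getAttribute('src') === 'pic.jpg') {
        $img->parentNode->removeChild($img);
    } else {
        $img->setAttribute('class', 'image normall');
    }
}

$result = $doc->saveHTML();
echo $result;
```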
I tried to concatenate the inner HTML of a div into a string variable:
games variable:
$games = '';
DOMinnerHTML function:
function DOMinnerHTML($element)
{
$innerHTML = "";
$children = $element->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
ExtractFromType function:
function ExtractFromType($type)
{
$html = file_get_contents('www.site.com/' .$type);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$divs = $dom->getElementsByTagName('div');
foreach ($divs as $div) {
if (strpos($div->getAttribute('style'),'MyString') !== false) {
//////
$games = $games.DOMinnerHTML($div);
//////
}
}
}
code:
ExtractFromType('MyType');
echo $games; // prints nothing
This code returns nothing.
$games is defined in the global scope, so it is not available inside ExtractFromType. Define it inside the function, then return the value:
function ExtractFromType($type) {
$html = file_get_contents('www.site.com/' .$type);
$dom = new domDocument;
@$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$divs = $dom->getElementsByTagName('div');
// Local accumulator; the global $games is not visible in here.
$games = '';
foreach ($divs as $div) {
if (strpos($div->getAttribute('style'),'MyString') !== false) {
$games = $games.DOMinnerHTML($div);
}
}
// Hand the collected HTML back to the caller.
return $games;
}
echo ExtractFromType('MyType');
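To try the scoping fix without a network request, here is a small self-contained sketch. The function name `extractMatchingDivs`, its parameters and the sample `$page` markup are all hypothetical; it also serializes each child with saveHTML($node) in place of the separate DOMinnerHTML helper, which is a substitution on my part.

```php
<?php
// Returns the inner HTML of every div whose style attribute contains $needle.
function extractMatchingDivs(string $html, string $needle): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $games = '';    // local accumulator, independent of any global $games
    foreach ($dom->getElementsByTagName('div') as $div) {
        if (strpos($div->getAttribute('style'), $needle) !== false) {
            foreach ($div->childNodes as $child) {
                $games .= $dom->saveHTML($child);   // serialize each child node
            }
        }
    }
    return $games;  // the caller receives the value explicitly
}

// Hypothetical page content.
$page  = '<div style="color:red;MyString"><b>Game A</b></div>'
       . '<div style="color:blue"><i>skip</i></div>';
$games = extractMatchingDivs($page, 'MyString');
echo $games;
```

Returning the string keeps the function free of global state, which is usually easier to reason about than declaring `global $games;` inside it.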