php dom change nodeValue in anchor


I am trying to change the nodeValue of each anchor and save the result to a variable (or print it):
$html = '<html><body>
<a href="a.html">some a</a>
<a href="b.html">some b</a>
</body></html>';
libxml_use_internal_errors(true); // ignore malformed HTML
$xml = new DOMDocument();
$xml->loadHTML($html);
foreach($xml->getElementsByTagName('a') as $link) {
$link->nodeValue = $link->nodeValue . ' --- ' . $link->getAttribute('href');
}
print_r($html);
should print
<html><body>
<a href="a.html">some a --- a.html</a>
<a href="b.html">some b --- b.html</a>
</body></html>
but it won't. What am I doing wrong?

You're not actually changing $html; you are changing your DOMDocument object, $xml. Instead of
print_r($html);
you need:
echo $xml->saveHTML();
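For reference, a complete version of the corrected script might look like the sketch below. The anchor markup in the sample input is an assumption, inferred from the expected output in the question.

```php
<?php
// Sample input; the anchors and href values are assumed from the question's expected output.
$html = '<html><body>
<a href="a.html">some a</a>
<a href="b.html">some b</a>
</body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$xml = new DOMDocument();
$xml->loadHTML($html);

foreach ($xml->getElementsByTagName('a') as $link) {
    // Append the href to the visible link text.
    $link->nodeValue = $link->nodeValue . ' --- ' . $link->getAttribute('href');
}

// Serialize the modified document; the original $html string is never touched.
$output = $xml->saveHTML();
echo $output;
```

Note that saveHTML() returns the whole document, so the output will also include a doctype line that was not in the input string.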

Related

How to access an HTML attribute and retrieve data from it in PHP?

I'm new to PHP and I would like to know how to retrieve data from an HTML element such as an src?
It's very easy to do that in jQuery:
$('img').attr('src');
But I have no idea how to do it in PHP (if it is possible).
Here's an example I'm working on:
I loaded $result into SimpleXMLElement and stored it into $xml:
$xml = simplexml_load_string($result) or die("Error: Cannot create object");
Then used foreach to loop over all elements:
foreach($xml->links->link as $link){
echo 'Image: ' . $link->{'link-code-html'}[0] . '<br>';
// returns something similar to: <a href='....'><img src='....'></a>
}
Inside the foreach I'm trying to access the src of each img.
Is there a way to get the src of the img nested inside the a? The markup is what I see when I output the element to the screen:
echo 'Image: ' . $link->{'link-code-html'}[0] . '<br>';

I would do this with the built-in DOMDocument and DOMXPath APIs, and then you can use the getAttribute method on any matching img node:
$doc = new DOMDocument();
// Load some example HTML. If you need to load from file, use ->loadHTMLFile
$doc->loadHTML("<a href='abc.com'><img src='ping1.png'></a>
<a href='def.com'><img src='ping2.png'></a>
<a href='ghi.com'>something else</a>");
$xpath = new DOMXPath($doc);
// Collect the images that are children of anchor elements
$imgs = $xpath->query("//a/img");
foreach($imgs as $img) {
echo "Image: " . $img->getAttribute("src") . "\n";
}
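Applied to the loop from the question, the same idea could look roughly like this. The XML in $result is a made-up stand-in for the real feed (only the links/link/link-code-html path comes from the question), and it assumes the HTML snippet is stored as escaped text inside the element.

```php
<?php
// Hypothetical feed shaped like the one in the question; the HTML is stored as escaped text.
$result = '<response><links><link>
  <link-code-html>&lt;a href="abc.com"&gt;&lt;img src="ping1.png"&gt;&lt;/a&gt;</link-code-html>
</link></links></response>';

$xml = simplexml_load_string($result);
libxml_use_internal_errors(true); // the snippet is a fragment, so silence parser warnings

$sources = [];
foreach ($xml->links->link as $link) {
    // Parse the HTML snippet held in this element as its own small document.
    $doc = new DOMDocument();
    $doc->loadHTML((string) $link->{'link-code-html'}[0]);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
}

print_r($sources);
```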

What is the XPATH query to extract contents of a class from a div on a webpage in php?

I have written the following code, but it just returns empty data:
$code="CS225";
$url="https://cs.illinois.edu/courses/profile/{$code}";
echo $url;
$html = file_get_contents($url);
$pokemon_doc = new DOMDocument();
libxml_use_internal_errors(TRUE); //disable libxml errors
if(!empty($html)){ //if any html is actually returned
$pokemon_doc->loadHTML($html);
libxml_clear_errors();
$pokemon_xpath = new DOMXPath($pokemon_doc);
$pokemon_row = $pokemon_xpath->query("//div[@id='extCoursesDescription']");
if($pokemon_row->length > 0){
foreach($pokemon_row as $row){
echo $row->nodeValue . "<br/>";
}
}
}
The website that I am trying to scrape is: https://cs.illinois.edu/courses/profile/CS225

The course content is not part of the page's own HTML; it seems to be loaded by a script when the page loads. If you go through the page source, you get to:
<script type='text/javascript' src='//ws.engr.illinois.edu/courses/item.asp?n=3&course=CS225'></script>
From this you can track through to the URL http://ws.engr.illinois.edu/courses/item.asp?n=3&course=CS225, which gives you the actual content you're after. So rather than the original URL, use this new one and you should be able to extract the information from there.
Note, though, that this content is all wrapped in document.write() calls.
Update:
To remove the document.write() bits, a simple way is to just process the content:
$html = file_get_contents($url);
$html = str_replace(["document.write('","');"], "", $html);
$html = str_replace('\"', '"', $html);
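Put together with the XPath query from the question, the clean-up could be sketched as follows. The $raw string is an invented sample of what the script output might look like, not the real response, and the exact quoting used by the live endpoint is an assumption.

```php
<?php
// Invented sample of the script output (nowdoc, so backslashes stay literal).
$raw = <<<'JS'
document.write('<div id=\"extCoursesDescription\">Data Structures</div>');
JS;

// Strip the document.write('...'); wrapper, then unescape the quotes.
$html = str_replace(["document.write('", "');"], "", $raw);
$html = str_replace('\"', '"', $html);

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$rows = $xpath->query("//div[@id='extCoursesDescription']");
$text = $rows->length > 0 ? $rows->item(0)->nodeValue : null;
echo $text, "\n";
```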

Retrieve data from html page using xpath and php

I know there are similar questions, but while trying to learn PHP I ran into this problem and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
?>
This prints just "hello!". I want to print the value extracted with the XPath query, but the last echo doesn't output anything.

You have some errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it's actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so load the iframe URL directly.
You use the function loadHTML(), which loads an HTML string. What you need is loadHTMLFile(), which takes the path or URL of an HTML document as a parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there isn't one. So remove that from your query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
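If you are unsure whether the parsed tree contains a tbody, one option is a query that does not depend on it. The sketch below runs against an inline sample table (a hypothetical stand-in, since the real page is not fetched here) and uses a descendant step so the row is found either way.

```php
<?php
// Hypothetical stand-in for the remote table; the real page is not fetched here.
$sample = '<table id="table33">
<tr><td>Date</td><td>Unit</td><td><b>Price</b></td></tr>
<tr><td>today</td><td>EUR</td><td><b>42</b></td></tr>
</table>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

// "//tr[2]" matches the second row of its parent, whether that parent is the table or a tbody.
$nodes = $xpath->query("//*[@id='table33']//tr[2]/td[3]/b");
$value = $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
echo $value, "\n";
```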

PHP DomDocument editing all links

I am using the following code to grab HTML from another page and place it into my PHP page:
$doc = new DomDocument;
// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('{URL IS HERE}'));
$content = $doc->getElementById('form2');
echo $doc->SaveHTML($content);
I want to change all instances of <a href="/somepath/file.htm"> so that I can prepend to it the actual domain instead. How can I do this?
So, it would need to change them to: <a href="http://mydomain.com/somepath/file.htm"> instead.

Try something like:
$xml = new DOMDocument();
$xml->loadHTMLFile($url);
foreach($xml->getElementsByTagName('a') as $link) {
$oldLink = $link->getAttribute("href");
// The old href already starts with "/", so the domain is added without a trailing slash
$link->setAttribute('href', "http://mydomain.com" . $oldLink);
}
echo $xml->saveHTML();
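If the page also contains links that are already absolute, you may want to rewrite only the root-relative ones. A minimal sketch, using an inline sample document (the markup and the example.org link are made up for illustration):

```php
<?php
// Made-up sample markup: one root-relative link and one absolute link.
$html = '<div id="form2"><a href="/somepath/file.htm">local</a> '
      . '<a href="http://example.org/x">external</a></div>';
$base = 'http://mydomain.com';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);

foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Root-relative means a single leading "/" (a leading "//" is protocol-relative).
    if (strncmp($href, '/', 1) === 0 && strncmp($href, '//', 2) !== 0) {
        $link->setAttribute('href', $base . $href);
    }
}

// Collect the resulting hrefs to show what changed.
$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $hrefs[] = $link->getAttribute('href');
}
print_r($hrefs);
```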

Get link from html by php

How do I get this link <li><a rel="prev" href="/1149/" accesskey="p">< Prev</a></li> from an html document using PHP? How do I get the link by the "rel"?
I'm trying to get /1149/

If I understand correctly, you want to take an HTML/XML input and grab the href value of a link with the attribute rel="prev". I'd suggest using DOMXPath, something like:
$html = '<li><a rel="prev" href="/1149/" accesskey="p">< Prev</a></li>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[@rel='prev']") as $node) {
if ($node->hasAttribute('href')) {
echo $node->getAttribute('href') . '<br>';
}
}
