PHP - Getting Content From External URL

I am trying to write a script that would get the contents between the div tags
<div class="bio">
<label>Bio:</label>
<div class="value">[This Is The Content I'm Trying To Get]</div>
</div>
This is the URL I'm trying to get the contents from:
https://live.xbox.com/en-US/Profile?gamertag=EMT%20PoRsChE
How would I be able to do this?

You will want to use DOMDocument and DOMXPath:
// If the line below does not work (e.g. allow_url_fopen is disabled), fetch the page with cURL or similar.
$theHtmlToParse = file_get_contents('http://url.to/page.html');
$doc = new DOMDocument();
// loadHTML() parses an HTML string; loadHTMLFile() expects a path or URL instead.
$doc->loadHTML($theHtmlToParse);
$xpath = new DOMXPath($doc);
$elements = $xpath->query("//div[@class='bio']/div[@class='value']");
// query() returns a DOMNodeList, or false if the expression is invalid
if ($elements !== false)
{
    foreach ($elements as $element)
    {
        echo "<br/>[". $element->nodeName. "]";
        $nodes = $element->childNodes;
        foreach ($nodes as $node)
        {
            echo $node->nodeValue. "\n";
        }
    }
}
This should give you enough to go on :)
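As a self-contained illustration of the approach above, here is a minimal sketch that runs the same XPath idea against the markup from the question, supplied as an inline string rather than fetched from the live URL. The function name extractBioValues and the placeholder bio text are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns the text of every div.value nested directly in div.bio.
function extractBioValues(string $html): array
{
    // Real-world pages are rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//div[@class='bio']/div[@class='value']");

    $values = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $values[] = trim($node->textContent);
        }
    }
    return $values;
}

// Sample markup copied from the question; the bio text is a placeholder.
$sample = '<div class="bio"><label>Bio:</label>'
        . '<div class="value">Example bio text</div></div>';

print_r(extractBioValues($sample));
```

Returning an array rather than echoing keeps the parsing separate from the output, which makes it easier to check what was actually matched.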

Yes, this is actually possible.
You might use something like visionmedia/php-selector to get the content of .value
and Guzzle or some curl to get the source before, if you haven't already.

Well, this can be done using the file_get_contents() function.
Simply pass the URL of the web page to this function. It returns the page as a string, which you can then load into an object (for example a DOMDocument).
Navigate through the object using -> as required.
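To make those steps concrete: file_get_contents() returns a plain string, so it has to be loaded into an object such as DOMDocument before it can be navigated with ->. The sketch below uses a temporary local file as a stand-in for the remote URL (an assumption, so that it runs offline), and the helper name firstTagText is hypothetical.

```php
<?php
// Hypothetical helper: fetch a document and return the text of the first <tag> element.
function firstTagText(string $source, string $tag): ?string
{
    // file_get_contents() accepts local paths and, if allow_url_fopen is on, URLs.
    $html = file_get_contents($source);
    if ($html === false || $html === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true); // collect warnings for sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Navigate the object with ->: a node list, then one node, then its text.
    $first = $doc->getElementsByTagName($tag)->item(0);
    return $first === null ? null : trim($first->textContent);
}

// Stand-in for a remote page: a temporary file with made-up content.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, '<html><body><h1>Profile</h1><p>Hello</p></body></html>');

echo firstTagText($path, 'h1'), PHP_EOL;
unlink($path);
```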

Related

php domdocument get node info

I am working with PHP and I am trying to get certain data from a web page.
Everything works until I get to this part:
<a class="cleanthis" href="https://www.web.com" id="1122" rel="#1122" style="display: inline-block;"><strong>the data i want</strong></a>
As you can see, I want the data in the strong tag, but I can't get it. I only get blank lines.
The code I use:
foreach($as as $a) {
    if ($a->getAttribute('class') === 'cleanthis') {
        $strong = $a->getElementsByTagName('strong');
        echo $strong->nodeValue;
    }
}
You should be seeing this error message:
Undefined property: DOMNodeList::$nodeValue
That is because $strong = $a->getElementsByTagName('strong'); will put a DOMNodeList in $strong. You either need to iterate the list or retrieve the actual node from it, e.g.
echo $strong->item(0)->nodeValue;
Or you can just use XPath:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//a[@class="cleanthis"]/strong/text()') as $element) {
    echo $element->nodeValue, PHP_EOL;
}
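The two non-XPath options mentioned above (taking one node with item() or iterating the list) can be sketched as follows. The markup is the anchor from the question, shortened, and the $found array is only there to make the result easy to inspect; both are assumptions of this example.

```php
<?php
// Sample markup from the question, shortened; a real page would be fetched first.
$html = '<a class="cleanthis" href="https://www.web.com" id="1122">'
      . '<strong>the data i want</strong></a>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$found = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    if ($a->getAttribute('class') !== 'cleanthis') {
        continue;
    }
    // getElementsByTagName() always returns a DOMNodeList, even for a single match.
    $strongList = $a->getElementsByTagName('strong');

    // Option 1: take the first node, guarding against an empty list.
    $first = $strongList->item(0);
    if ($first !== null) {
        $found[] = $first->nodeValue;
    }

    // Option 2: iterate over every node in the list.
    foreach ($strongList as $strong) {
        echo $strong->nodeValue, PHP_EOL;
    }
}
```

item(0) returns null when the list is empty, so the guard avoids a second error on links that contain no strong element.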

Retrieve data from html page using xpath and php

I know there are similar questions, but while studying PHP I ran into this error and I want to understand why it occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
?>
This prints just "hello!". I want to print the value extracted with the XPath, but the last echo doesn't do anything.
You have some errors in your code:
You try to get the table from the URL http://aice.anie.it/quotazione-lme-rame/, but it's actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so get the iframe URL directly.
You use the function loadHTML(), which loads an HTML string. What you need is the loadHTMLFile() function, which takes the path or URL of an HTML document as a parameter (see http://www.php.net/manual/fr/domdocument.loadhtmlfile.php).
You assume there is a tbody element on the page, but there is none. So remove that from your query.
Working code:
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
@$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[@id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
    echo $n->nodeValue . "\n";
}
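The tbody point can be checked offline. Browser developer tools typically show a tbody that the browser inserted itself, whereas the libxml2-based parser behind DOMDocument, as far as I know, leaves the tree as written. The sketch below uses a made-up table and a temporary file in place of the real iframe URL; all values in it are placeholders.

```php
<?php
// Made-up table standing in for the real page; note there is no <tbody> in the source.
$markup = '<table id="table33">'
        . '<tr><td>Date</td><td>Unit</td><td>Price</td></tr>'
        . '<tr><td>today</td><td>t</td><td><b>1234.5</b></td></tr>'
        . '</table>';

// loadHTMLFile() wants a path or URL, so write the sample to a temporary file.
$path = tempnam(sys_get_temp_dir(), 'tbl');
file_put_contents($path, $markup);

$previous = libxml_use_internal_errors(true); // an alternative to the @ operator
$html = new DOMDocument();
$html->loadHTMLFile($path);
libxml_clear_errors();
libxml_use_internal_errors($previous);
unlink($path);

$xpath = new DOMXPath($html);
$withTbody    = $xpath->query("//*[@id='table33']/tbody/tr[2]/td[3]/b");
$withoutTbody = $xpath->query("//*[@id='table33']/tr[2]/td[3]/b");

echo "with tbody: ", $withTbody->length, PHP_EOL;
echo "without tbody: ", $withoutTbody->length, PHP_EOL;
foreach ($withoutTbody as $n) {
    echo $n->nodeValue, PHP_EOL;
}
```

Printing the length of the node list is a quick way to tell "the query matched nothing" apart from "the matched node is empty".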

change variable with GET method

I have a page test.php in which I have a list of names:
name1: 992345
name2: 332345
name3: 558645
name4: 434544
On another page, test1.php?id=name2, the result should be:
332345
I've tried this PHP code:
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("/test.php");
$xpath = new DOMXpath($doc);
$elements = $xpath->query("//*@".$_GET["id"]."");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
I need to be able to change the name with the PHP GET method, as in test1.php?id=name4
The result should be different now:
434544
Is there another way, because mine won't work?
Here is another way to do it.
<?php
/* file() reads your text file into an array, one line per element (or returns false on failure). */
$doc = file("test.php");
$id = $_GET["id"];
/* Show your array. You can remove this part after you
 * are sure your text file is read correctly. */
echo "Seeking id: " . htmlspecialchars($id) . "<br>";
echo "Elements:<pre>";
print_r($doc);
echo "</pre>";
/* This part searches for the GET variable. */
if ($doc !== false) {
    foreach ($doc as $line) {
        /* Match "$id: " at the start of the line, so "name1" does not also match "name10". */
        $search = $id . ": ";
        if (strpos($line, $search) === 0) {
            echo str_replace($search, '', $line);
        }
    }
} else {
    echo "No elements.";
}
?>
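A variation on the same idea, sketched under the assumption that every line has the form name: value, is to build an associative array once and look the name up by key. The helper name parseNameList is hypothetical, and the id is hard-coded so the sketch runs on the command line; in test1.php it would come from $_GET["id"].

```php
<?php
// Hypothetical helper: turn "name: value" lines into ['name' => 'value'].
function parseNameList(array $lines): array
{
    $map = [];
    foreach ($lines as $line) {
        // Split on the first colon only; skip lines that have none.
        $parts = explode(':', $line, 2);
        if (count($parts) === 2) {
            $map[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $map;
}

// In the real script these lines would come from file("test.php").
$lines = ["name1: 992345\n", "name2: 332345\n", "name3: 558645\n", "name4: 434544\n"];
$map = parseNameList($lines);

// Stand-in for $_GET["id"].
$id = 'name2';

// Exact key lookup: "name1" can never match "name10" by accident.
echo $map[$id] ?? 'No such name.', PHP_EOL;
```

A key lookup also gives a natural place to handle an unknown id, which a substring search silently ignores.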
There is a completely different way to do this, using PHP combined with JavaScript (not sure if that's what you're after and if it can work with your app, but I'm going to write it). You can change your test.php to read the GET parameter (it can be POST as well, you'll see), and according to that, output only the desired value, probably from the associative array you have hard-coded in there. The JavaScript approach will be different and it would involve making a single AJAX call instead of DOM traversing using PHP.
So, in short: make an AJAX call to test.php, which then outputs the desired value based on the GET or POST parameter.
jQuery AJAX here; native JS tutorial here.
Just let me know if this won't work for your app, and I'll delete my answer.

How can I grab div content to display in another page?

I've searched around for solutions to this question, but each one I find and try doesn't work.
I'm trying to grab the content of a div from a forum topic.
I've tried using preg_match, and that only displayed "Array". Then I tried using this method:
$html = file_get_contents("http://www.lcs-server.co.uk/forum/index.php/topic,$id_topic");
$dom = new DOMDocument;
$dom->loadHTML($html);
$element = $dom->getElementById("msg_$id_msg");
var_dump($element);
This will show "object(DOMElement)#1 (0) { } "
The $id_topic and $id_msg are defined above this code, taken from the forum database. I did try taking the message from the forum database, but it displayed BB code tags, I'd like it to grab the post content, and display it in HTML, as it's displayed on the forum post itself.
This is the code I'm using now, and it gives me "Fatal error: Cannot redeclare DOMinnerHTML()"
$html = file_get_contents("http://www.lcs-server.co.uk/forum/index.php/topic,$id_topic");
$dom = new DOMDocument;
$dom->loadHTML($html);
$domelement = $dom->getElementById("msg_$id_msg");
foreach ($domelement as $element)
{
echo DOMinnerHTML($element);
}
function DOMinnerHTML($DOMelement)
{
$innerHTML = "";
$children = $DOMelement->childNodes;
foreach ($children as $child)
{
$tmp_dom = new DOMDocument();
$tmp_dom->appendChild($tmp_dom->importNode($child, true));
$innerHTML.=trim($tmp_dom->saveHTML());
}
return $innerHTML;
}
getElementById returns a DOM node object. It does not return the HTML of the node. For that, you have to get the node's "innerHTML". This property is not officially supported by PHP's DOM objects for some reason, but it can be faked using this answer: How to get innerHTML of DOMNode?
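One way to fake innerHTML, not necessarily the one in the linked answer, is to serialize each child node with saveHTML(). The sketch below assumes made-up forum markup and a hypothetical helper name, domInnerHtml. It also avoids two problems in the question's code: getElementById() returns a single element (or null), so there is nothing to loop over, and the helper is declared exactly once, since "Cannot redeclare" usually means the same function definition was executed twice (for example through a file that is included more than once).

```php
<?php
// Hypothetical helper: serialize the children of a node, i.e. a stand-in for innerHTML.
function domInnerHtml(DOMNode $node): string
{
    $innerHTML = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $innerHTML .= $node->ownerDocument->saveHTML($child);
    }
    return $innerHTML;
}

// Made-up forum markup; the id mimics the "msg_$id_msg" pattern from the question.
$html = '<div id="msg_42">Hello <b>world</b>!</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// getElementById() returns one DOMElement or null.
$element = $dom->getElementById('msg_42');
if ($element !== null) {
    echo domInnerHtml($element), PHP_EOL;
}
```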

How to get a div via PHP?

I get a page using file_get_contents from a remote server, but I want to filter that page and get a DIV from it that has class "text" using PHP. I started with DOMDocument but I'm lost now.
Any help?
$file = file_get_contents("xx");
$elements = new DOMDocument();
$elements->loadHTML($file);
foreach ($elements as $element) {
if( !is_null($element->attributes)) {
foreach ($element->attributes as $attrName => $attrNode) {
if( $attrName == "class" && $attrNode== "text") {
echo $element;
}
}
}
}
Once you have loaded the document to a DOMDocument instance, you can use XPath queries on it -- which might be easier than going yourself through the DOM.
For that, you can use the DOMXpath class.
For example, you should be able to do something like this :
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
    var_dump($tag->textContent);
}
(Not tested, so you might need to adapt the XPath query a bit...)
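One adaptation that is often needed: @class="text" compares the whole attribute value, so a div with class="text highlight" is not matched. A common XPath 1.0 idiom pads the class list with spaces and looks for the class as a whole word. The markup below is made up for illustration.

```php
<?php
// Made-up markup: one exact match, one multi-class match, one look-alike class.
$html = '<div class="text">A</div>'
      . '<div class="text highlight">B</div>'
      . '<div class="textual">C</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: the whole attribute must equal "text".
$exact = $xpath->query('//div[@class="text"]');

// Token match: pad with spaces so "text" matches as a whole word in the class list.
$token = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " text ")]'
);

$exactText = [];
foreach ($exact as $tag) {
    $exactText[] = $tag->textContent;
}
$tokenText = [];
foreach ($token as $tag) {
    $tokenText[] = $tag->textContent;
}

echo implode(',', $exactText), PHP_EOL;
echo implode(',', $tokenText), PHP_EOL;
```

A plain contains(@class, "text") would also match class="textual", which is why the padded form is usually preferred.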
Personally, I like Simple HTML Dom Parser.
include "lib.simple_html_dom.php";
$html = file_get_html('http://scrapeyoursite.com');
// find() with an index returns a single element; without one it returns an array.
echo $html->find('div.text', 0)->plaintext;
Pretty simple, huh? It accommodates selectors like jQuery :)
You can use simple_html_dom as described in the simple_html_dom documentation, or use my code:
include "simple_html_dom.php";
$html = new simple_html_dom();
$html->load_file('http://www.yoursite.com');
$con_div = $html->find('div', 0); // get the first div
// echo $con_div as plain text
echo $con_div->plaintext;
This finds the first div on the page (index 0 in find('div', 0)) and shows it as plain text.
I hope it helps you.
