I've been trying to learn how to use file_get_contents in PHP, and am attempting to use it to display weather on my page from http://www.rssweather.com/wx/us/in/knight/wx.php.
<?php
$ref = file_get_contents('http://www.rssweather.com/wx/us/in/knight/wx.php');
echo $ref;
?>
Obviously this shows the whole page I'm referencing on the screen, which is not what I want. I'm trying to show only the current weather, whether as simple text or in some other form. I've spent some time trying to figure out how to select only portions of a file once it has been fetched with file_get_contents, but I've had no luck. I've seen people manipulating what appear to be variables from the pages referenced, but I cannot figure out how to access those variables through my code. Would anyone have any suggestions on how best to approach this?
As a basic example of how you could use DOMDocument to capture the information you want, perhaps the following will give you a head start.
$url='http://www.rssweather.com/wx/us/in/knight/wx.php';
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTMLFile( $url );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$col=$xp->query('//div[@id="current"]/div');
if( !empty( $col ) ){
foreach( $col as $node )echo $node->nodeValue;
}
Output:
Thunder Storms Temperature: 68°F Humidity:84% Wind Speed:14 MPH Wind
Direction:NW (320°) Barometer: 29.94 in. Dewpoint:63°F Heat Index:68°F
Wind Chill:68°F Visibility: 10 mi Sunrise:5:54 AM CDT Sunset:7:39 PM
CDT Updated: 10:54 PM CDT SAT APR 29 2017
Updated the code to include libxml error handling and added additional flags for DOMDocument.
To preserve more of the original formatting, you can clone each matched node with cloneNode and output it as HTML:
$url='http://www.rssweather.com/wx/us/in/knight/wx.php';
libxml_use_internal_errors( true );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->strictErrorChecking=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTMLFile( $url );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$col=$xp->query( '//div[@id="current"]/div' );
if( !empty( $col ) ){
foreach( $col as $node ){
$html=new DOMDocument;
$clone = $node->cloneNode( true );
$html->appendChild( $html->importNode( $clone, true ) );
echo $html->saveHTML();
}
}
$dom = $xp = $col = $html = $clone = null;
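To try the XPath query without fetching the live page, the same approach can be run against an inline string. This is only a minimal sketch: the HTML fragment below is made up to resemble a `current` block and is not the real page source, and the `$html` and `$lines` names are chosen for illustration.

```php
<?php
// Hypothetical fragment standing in for the fetched page (not the real source).
$html = '<html><body><div id="header">Site header</div>'
      . '<div id="current"><div>Temperature: 68F</div><div>Humidity: 84%</div></div>'
      . '</body></html>';

libxml_use_internal_errors( true );   // collect parser warnings instead of printing them
$dom = new DOMDocument;
$dom->loadHTML( $html );
libxml_clear_errors();

$xp  = new DOMXPath( $dom );
// "@id" selects the id attribute; only div children of the "current" div match.
$col = $xp->query( '//div[@id="current"]/div' );

$lines = array();
foreach ( $col as $node ) {
    $lines[] = trim( $node->nodeValue );
}
echo implode( PHP_EOL, $lines ), PHP_EOL;
```

Collecting the values into an array first makes it easy to pick out a single field or join them with whatever separator the page needs.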
I'm trying to understand how to scrape decoded phone numbers from a yellow page website with PHP & Curl.
Here is an example URL:
https://www.gelbeseiten.de/test
Normally you can technically do it with something like this:
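$ch = curl_init('https://www.gelbeseiten.de/test');
// return the response body from curl_exec() instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$page = curl_exec($ch);
curl_close($ch);
if(preg_match('#example html code (.*) example html code#', $page, $match)) {
$result = $match[1];
echo $result;
}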
But on the page mentioned above you cannot directly find the phone number in the HTML code. There must be a way to get the phone number.
Can you please help me out?
Best regards,
Jennifer
Don't use regex to parse HTML; use an HTML parser like DOMDocument, e.g.:
$html = file_get_contents("https://www.gelbeseiten.de/test");
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//span[contains(@class,"nummer")]') as $item) {
print trim($item->textContent);
}
Output:
(0211) 4 08 05(0211) 4 08 05(0211) 4 08 05(0211) 4 08 05(0231) 9 79 76(0231)...
As suggested in a comment - using an XPath expression yields the phone numbers as desired.
$url='https://www.gelbeseiten.de/test';
$dom=new DOMDocument;
$dom->loadHTMLFile( $url );
$xp=new DOMXpath( $dom );
$query='//li[@class="phone"]';
$col=$xp->query($query);
if( $col ){
foreach( $col as $node )echo $node->nodeValue . "<br />";
}
$dom = $xp = $col = null;
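The two answers above match on the class attribute in different ways: `@class="phone"` requires the attribute to be exactly that string, while `contains(@class,"nummer")` matches any class attribute that includes the substring. A small sketch of the difference, assuming invented markup (the list below is hypothetical and not taken from the real site):

```php
<?php
// Hypothetical listing markup; the class names mirror the two queries above.
$html = '<ul>'
      . '<li class="phone"><span class="nummer">(0211) 4 08 05</span></li>'
      . '<li class="phone mobile"><span class="nummer extra">(0231) 9 79 76</span></li>'
      . '</ul>';

libxml_use_internal_errors( true );
$dom = new DOMDocument;
$dom->loadHTML( $html );
libxml_clear_errors();
$xp = new DOMXPath( $dom );

// Exact match: the class attribute must be exactly "phone".
$exact = $xp->query( '//li[@class="phone"]' );
// Substring match: any class attribute containing "nummer".
$loose = $xp->query( '//span[contains(@class,"nummer")]' );

$numbers = array();
foreach ( $loose as $item ) {
    $numbers[] = trim( $item->textContent );
}
echo $exact->length, ' exact, ', $loose->length, ' loose', PHP_EOL;
echo implode( PHP_EOL, $numbers ), PHP_EOL;
```

Joining the collected values with a separator also avoids the run-together output that printing each node directly produces.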
I want to get a specific js object from a different url using php.
Or
I want to get js script text from a different url using php.
I am using this approach.
$html = file_get_contents($url);
$ddoc = new DOMDocument();
libxml_use_internal_errors(TRUE);
if(!empty($html)){ //if any html is actually returned
$ddoc->loadHTML($html);
libxml_clear_errors(); //remove errors for yucky html
$xxpath = new DOMXPath($ddoc);
$rrrow = $xxpath->query("//script[contains(@src, 'pcode')]");
}
You neglect to state what, if anything, is happening with your code. I tried a virtually identical approach and it worked perfectly (see below), so without knowing the URL you are trying to target I would suggest adding a context to file_get_contents since, in many cases, a server can be configured to reject requests that carry no User-Agent string.
$url='http://beautifulbathrooms.tumblr.com/';
$query='//script[contains(@src,"jquery")]';
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;
$dom->substituteEntities=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( file_get_contents( $url ) );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$col=$xp->query( $query );
if( !empty( $col ) ){
foreach( $col as $script ) echo $script->getAttribute('src') . "<br />";
}
With a context argument to file_get_contents
$url='http://beautifulbathrooms.tumblr.com/';
$query='//script[contains(@src,"jquery")]';
$args=array(
'http'=>array(
'method' => 'GET',
'header' => implode( "\n", array(
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Host: beautifulbathrooms.tumblr.com'
)
)
)
);
/* create the context */
$context=stream_context_create( $args );
$dom=new DOMDocument;
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;
$dom->substituteEntities=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( file_get_contents( $url, false, $context ) );
libxml_clear_errors();
$xp=new DOMXPath( $dom );
$col=$xp->query( $query );
if( !empty( $col ) ){
foreach( $col as $script ) echo $script->getAttribute('src') . "<br />";
}
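The context setup can also be wrapped in a small helper so it is easy to reuse and to inspect. This is a sketch only: `buildHttpContext` is a hypothetical name, the `timeout` value is an arbitrary choice, and the User-Agent string is simply the one from the example above.

```php
<?php
/**
 * Build a stream context that sends a User-Agent header (hypothetical helper).
 */
function buildHttpContext( $userAgent, $timeout = 10 ) {
    return stream_context_create( array(
        'http' => array(
            'method'  => 'GET',
            'header'  => "User-Agent: {$userAgent}\r\n",
            'timeout' => $timeout,   // seconds before the read gives up
        ),
    ) );
}

$context = buildHttpContext( 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:44.0) Gecko/20100101 Firefox/44.0' );

// Inspect what will be sent, without making a request.
$options = stream_context_get_options( $context );
echo $options['http']['header'];

// Usage (needs network access; $url is whatever page you are targeting):
// $html = file_get_contents( $url, false, $context );
// if ( $html === false ) { /* request failed or was rejected */ }
```

Checking the return value for `false` before handing it to `loadHTML` helps distinguish a rejected request from a page that simply has no matching nodes.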
I have an XML file, and some of its nodes contain a CDATA block like this:
<item>
<content>OneWord</content>
</item>
<item>
<content><![CDATA[Some Text or Serialized arrays]]></content>
</item>
And I tried to get the outer node as below:
$file = 'file.xml';
$contents = file_get_contents( $file );
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML( $contents, LIBXML_NOCDATA );
$xpath = new DOMXPath( $dom );
// -- get outer
$item = $xpath->query( './item' )->item(1);
$str = $dom->saveXML($item);
var_dump($str);
It prints the item node without the CDATA block, but I want the node to keep its CDATA blocks.
Thanks
Is it not as simple as removing the LIBXML_NOCDATA option ("Merge CDATA as text nodes")?
For me,
$dom = new DOMDocument( '1.0', 'utf-8' );
$dom->loadXML( $contents );
$xpath = new DOMXPath( $dom );
// -- get outer
$item = $xpath->query( './item' )->item(1);
$str = $dom->saveXML($item);
var_dump($str);
outputs
string '<item>
<content><![CDATA[Some Text or Serialized arrays]]></content>
</item>' (length=78)
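The effect of the flag can be compared side by side on an inline document. A minimal sketch, assuming an `<items>` root element (the question only shows a fragment, so the wrapper is an assumption) and a hypothetical helper named `secondItemXml`:

```php
<?php
// Sample document; the <items> root is assumed, since the question shows only a fragment.
$contents = '<items><item><content>OneWord</content></item>'
          . '<item><content><![CDATA[Some Text or Serialized arrays]]></content></item></items>';

// Hypothetical helper: load with the given libxml options and serialize the second <item>.
function secondItemXml( $contents, $options = 0 ) {
    $dom = new DOMDocument( '1.0', 'utf-8' );
    $dom->loadXML( $contents, $options );
    $xpath = new DOMXPath( $dom );
    $item  = $xpath->query( '/items/item' )->item( 1 );
    return $dom->saveXML( $item );
}

$kept   = secondItemXml( $contents );                  // CDATA section survives
$merged = secondItemXml( $contents, LIBXML_NOCDATA );  // CDATA merged into a text node

echo $kept, PHP_EOL, $merged, PHP_EOL;
```

Only the second call loses the `<![CDATA[ ... ]]>` wrapper, which matches the behaviour described in the question.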
<?php
$doc = new DOMDocument;
$doc->load("www.xyz.com/ABC");
$xpath = new DOMXpath($doc);
$elements = $xpath->query("*/div[@id='myspanId']");
?>
I am trying to get the value of "myspanId" from webpage "www.xyz.com/ABC".
But it displays an error.
Also tried : $doc->loadHTML("www.xyz.com/ABC");
You probably don't need the curl but this is how I would approach it.
$curl=curl_init( 'http://www.example.com/ABC' );
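/* Set other curl options as needed; this one makes curl_exec() return the body */
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );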
$response=curl_exec( $curl );
curl_close( $curl );
libxml_use_internal_errors( true );
$dom = new DOMDocument('1.0','utf-8');
$dom->validateOnParse=false;
$dom->standalone=true;
$dom->preserveWhiteSpace=true;
$dom->strictErrorChecking=false;
$dom->substituteEntities=false;
$dom->recover=true;
$dom->formatOutput=false;
$dom->loadHTML( mb_convert_encoding( $response, 'utf-8' ) );
$parse_errs=serialize( libxml_get_last_error() );
libxml_clear_errors();
/* there was a typo here - should be DOMXPath! */
$xpath=new DOMXPath( $dom );
// use "//" so the div is matched at any depth, not only directly under the root element
$elements = $xpath->query("//div[@id='myspanId']");
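Two details are worth separating out here. `loadHTML()` expects a string of markup rather than a URL, so passing "www.xyz.com/ABC" to it parses that literal text. And the query needs `//` to reach a div nested inside the body. The sketch below uses an invented page body (`$response` is hypothetical and stands in for whatever the request returns) to show both points:

```php
<?php
// Hypothetical page body; in practice this would be the response fetched from the URL.
$response = '<html><body><div class="wrap"><div id="myspanId">Hello</div></div></body></html>';

libxml_use_internal_errors( true );
$dom = new DOMDocument;
$dom->loadHTML( $response );      // takes an HTML string, not a URL
libxml_clear_errors();

$xpath    = new DOMXPath( $dom );
$elements = $xpath->query( "//div[@id='myspanId']" );   // "//" searches at any depth

$value = $elements->length > 0 ? trim( $elements->item( 0 )->nodeValue ) : null;
var_dump( $value );

// The relative form only looks at div elements directly under <html>, so it finds nothing here.
$relative = $xpath->query( "*/div[@id='myspanId']" );
var_dump( $relative->length );
```

Checking `length` before calling `item( 0 )` avoids calling a method on null when the id is not present.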
I've never really used the DOM parser before and now I have a question.
How would I go about extracting the URL from this markup:
<files>
<file path="http://www.thesite.com/download/eysjkss.zip" title="File Name" />
</files>
Using SimpleXML:
$xml = new SimpleXMLElement($xmlstr);
echo $xml->file['path']."\n";
Output:
http://www.thesite.com/download/eysjkss.zip
To do it with DOM you do
$dom = new DOMDocument;
$dom->load( 'file.xml' );
foreach( $dom->getElementsByTagName( 'file' ) as $file ) {
echo $file->getAttribute( 'path' );
}
You can also do it with XPath:
$dom = new DOMDocument;
$dom->load( 'file.xml' );
$xPath = new DOMXPath( $dom );
foreach( $xPath->evaluate( '/files/file/@path' ) as $path ) {
echo $path->nodeValue;
}
Or as a string value directly:
$dom = new DOMDocument;
$dom->load( 'file.xml' );
$xPath = new DOMXPath( $dom );
echo $xPath->evaluate( 'string(/files/file/@path)' );
You can fetch individual nodes also by traversing the DOM manually
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->load( 'file.xml' );
echo $dom->documentElement->firstChild->getAttribute( 'path' );
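The `preserveWhiteSpace = FALSE` line matters for the manual traversal: with whitespace preserved, the first child of `<files>` is the text node holding the line break and indentation, and a text node has no `getAttribute` method. A minimal sketch comparing the two cases, loading the markup from a string rather than from file.xml (the `$xmlstr`, `$firstDefault`, and `$firstStripped` names are illustrative):

```php
<?php
// Same markup as in the question, loaded from a string instead of file.xml.
$xmlstr = "<files>\n"
        . "    <file path=\"http://www.thesite.com/download/eysjkss.zip\" title=\"File Name\" />\n"
        . "</files>";

$dom = new DOMDocument;
$dom->loadXML( $xmlstr );
// Whitespace preserved (the default): first child is the text node before <file>.
$firstDefault = $dom->documentElement->firstChild;

$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;   // must be set before loading
$dom->loadXML( $xmlstr );
// Blank text nodes dropped: first child is the <file> element itself.
$firstStripped = $dom->documentElement->firstChild;

echo get_class( $firstDefault ), PHP_EOL;
echo get_class( $firstStripped ), ' ', $firstStripped->getAttribute( 'path' ), PHP_EOL;
```

If the flag cannot be changed, `getElementsByTagName` or an XPath query sidesteps the problem because neither depends on child position.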
Marking this CW, because this has been answered before multiple times (just with different elements), including me, but I am too lazy to find the duplicate.
You can use PHP Simple HTML DOM Parser, which is a PHP library: http://simplehtmldom.sourceforge.net/