I'm trying to load an xml file from another website. I can do this using cURL using the following:
function getLatestPlayerXML($par1) {
$url = "http://somewebsite/page.php?par1=".$par1;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$xmlresponse = curl_exec($ch);
$xml = simplexml_load_string($xmlresponse);
$xml->asXML("./userxml/".$par1.".xml");
return $xml;
}
This works all well and good, however, the external website takes a long time to respond with the file, which is why I save the xml file to ./userxml/$par1.xml which also works. I load like this:
function getLocalPlayerXML($par1) {
$xml = simplexml_load_file("./userxml/".$par1.".xml");
if($xml != False) {
// How can I make it so that when called it only temporarily uses this file until the latest is available?
return $xml;
} else {
return $getLatestPlayerXML($par1);
}
}
The problem I am having is that I want it so when I call a single load function it first tries to load the xml from file and if it exists use that file until the latest file has been received at which point, update the page. If the file does not exist, simply wait until the latest file has been retrieved and then use that. Is even possible?
Related
I have a php script that loads this webpage to extract some data from it's tables.
The following methods failed to get it's table contents:
Using file_get_contents:
$document -> file_get_contents("http://www.webpage.com/");
print_r($document);
Using cURL:
$document = curl_init('http://www.webpage.com/');
curl_setopt($document, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($document);
print_r($html);
Using loadHTMLFile:
$document->loadHTMLFile('http://www.webpage.com/');
print_r($document);
I'm not an expert in php and except the first method, the other ones are copied from StackOverflow's answers.
What am I doing wrong?
and How they do block some contents from loading?
Not the answer you're likely to want to hear, but none of the methods you describe will evaluate JavaScript and other browser resources as a normal browser client would. Instead, each of those methods retrieves the contents of only the file you've specified. A quick glance at the site you're targeting clearly shows this table in question being populated as the result of an AJAX call, which none of the methods you've tried are able to evaluate.
You'll need to lean on a library or script that has the capability for this type of emulation; namely laravel/dusk, the PHP bindings for Selenium webdriver, or something similar.
This is what I did to scrape data from a webpage using php curl:
// Defining the basic cURL function
function curl($url) {
$ch = curl_init(); // Initialising cURL
curl_setopt($ch, CURLOPT_URL, $url); // Setting cURL's URL option with the $url variable passed into the function
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Setting cURL's option to return the webpage data
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
curl_close($ch); // Closing cURL
return $data; // Returning the data from the function
}
// Defining the basic scraping function
function scrape_between($data, $start, $end){
$data = stristr($data, $start); // Stripping all data from before $start
$data = substr($data, strlen($start)); // Stripping $start
$stop = stripos($data, $end); // Getting the position of the $end of the data to scrape
$data = substr($data, 0, $stop); // Stripping all data from after and including the $end of the data to scrape
return $data; // Returning the scraped data from the function
}
$target_url = "https://www.somesite.com";
$scraped_website = curl($target_url);
$data_set_1 = scrape_between($scraped_website, "%before%", "%after%");
$data_set_2 = scrape_between($scraped_website, "%before%", "%after%");
The %before% and %after% is data that always shows up on the webpage before and after the data you wish to grab. Could be div tags or some other html tags that are unique to the data you wish to grab.
So maybe look into using curl and and imitate the same ajax request that the site is using? When I searched for that, this is what I found:
Mimicking an ajax call with Curl PHP
Basically I want to have one centralized file (preferably .php or .txt).. In it I will define the version and online statuses of my 3 API's (login, register, and stats)
I will somehow link it to my system status page and call upon them in my html with like $version, $login, $register, or $stats and they will automatically display whatever is defined in the centralized file.
My stats page (https://epicmc.us/status.php).. I want to define it all from a seperate file and call upon it in the HTML.
I tried making an external file called check.php and put this in it:
<?php
$version = "1.0.0";
$login = 'online';
$register = 'online';
$stats = 'online';
echo json_encode(compact('version','login','register','stats'));
?>
and then in my stats page I called upon it with
<?php
$data= json_decode(file_get_contents('https://epicmc.us/api/bridge/check.php'),true);
echo $version;
echo $login;
echo $register;
echo $stats;
?>
The page is just blank though.
How would you go about implementing this into my stats page code?
http://pastebin.com/nREdfH1u
A good solution here would be to curl your file.
As you already return a JSON string containing your values, just curl your 'check.php' file and json_decode the response.
One of the advantages of this method is that you can access these informations from other domains.
You should be able to get all the values easily.
Example :
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'check.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // to return the response in a variable and not output it
// $result contains the output string
$result = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
$array_response = json_decode($result, true);
// echo $array_response['version']...
i just want to get the name of 'channel' tag i.e. CHANNEL...the script works fine when i use it to parse the rss from Google..............but when i use it for some other provider it gives an output '#text' instead of giving 'channel' which is the intended output.......the following is my script plz help me out.
$url = 'http://ibnlive.in.com/ibnrss/rss/sports/cricket.xml';
$get = perform_curl($url);
$xml = new DOMDocument();
$xml -> loadXML($get['remote_content']);
$fetch = $xml -> documentElement;
$gettitle = $fetch -> firstChild -> nodeName;
echo $gettitle;
function perform_curl($rss_feed_provider_url){
$url = $rss_feed_provider_url;
$curl_handle = curl_init();
// Do we have a cURL session?
if ($curl_handle) {
// Set the required CURL options that we need.
// Set the URL option.
curl_setopt($curl_handle, CURLOPT_URL, $url);
// Set the HEADER option. We don't want the HTTP headers in the output.
curl_setopt($curl_handle, CURLOPT_HEADER, false);
// Set the FOLLOWLOCATION option. We will follow if location header is present.
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
// Instead of using WRITEFUNCTION callbacks, we are going to receive the remote contents as a return value for the curl_exec function.
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);
// Try to fetch the remote URL contents.
// This function will block until the contents are received.
$remote_contents = curl_exec($curl_handle);
// Do the cleanup of CURL.
curl_close($curl_handle);
$remote_contents = utf8_encode($remote_contents);
$handle = #simplexml_load_string($remote_contents);
$return_result = array();
if(is_object($handle)){
$return_result['handle'] = true;
$return_result['remote_content'] = $remote_contents;
return $return_result;
}
else{
$return_result['handle'] = false;
$return_result['content_error'] = 'INVALID RSS SOURCE, PLEASE CHECK IF THE SOURCE IS A VALID XML DOCUMENT.';
return $return_result;
}
} // End of if ($curl_handle)
else{
$return_result['curl_error'] = 'CURL INITIALIZATION FAILED.';
return false;
}
}
php
it gives an output '#text' instead of giving 'channel' which is the intended output it happens because the $fetch -> firstChild -> nodeType is 3, which is a TEXT_NODE or just some text. You could select channel by
echo $fetch->getElementsByTagName('channel')->item(0)->nodeName;
and
$gettitle = $fetch -> firstChild -> nodeValue;
var_dump($gettitle);
gives you
string(5) "
"
or spaces and a new line symbol which happens to appear between the xml tags due to formatting.
ps: and RSS feed by your link fails validation at http://validator.w3.org/feed/
Take a look at the XML - it's been pretty printed with whitespace so it is being parsed correctly. The first child of the root node is a text node. I'd suggest using SimpleXML if you want an easier time of it, or use XPath queries on your DomDocument to obtain the tags of interest.
Here's how you'd use SimpleXML
$xml = new SimpleXMLElement($get['remote_content']);
print $xml->channel[0]->title;
First of all have a look at here,
www.zedge.net/txts/4519/
this page has so many text messages , I want my script to open each of the message and download it,
but i am having some problem,
This is my simple script to open the page,
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
$contents = curl_exec ($ch);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_close ($ch);
?>
The page download fine but how would i open every text message page inside this page one by one and save its content in a text file,
I know how to save the content of a webpage in a text file using curl but in this case there are so many different pages inside the page i've downloaded how to open them one by one seperately ?
I've this idea but don't know if it will work,
Downlaod this page,
www.zedge.net/txts/4519
look for the all the links of text messages page inside the page and save each link into one text file (one in each line), then run another curl session , open the text file read each link one by one , open it copy the content from the particular DIV and then save it in a new file.
The algorithm is pretty straight forward:
download www.zedge.net/txts/4519 with curl
parse it with DOM (or alternative) for links
either store them all into text file/database or process them on the fly with "subrequest"
// Load main page
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
$contents = curl_exec ($ch);
$dom = new DOMDocument();
$dom->loadHTML( $contents);
// Filter all the links
$xPath = new DOMXPath( $dom);
$items = $xPath->query( '//a[class=myLink]');
foreach( $items as $link){
$url = $link->getAttribute('href');
if( strncmp( $url, 'http', 4) != 0){
// Prepend http:// or something
}
// Open sub request
curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
$subContent = curl_exec( $ch);
}
See documentation and examples for xPath::query, note that DOMNodeList implements Traversable and therefor you can use foreach.
Tips:
Use curl opt COOKIE_JAR_FILE
Use sleep(...) not to flood server
Set php time and memory limit
I used DOM for my code part. I called my desire page and filtered data using getElementsByTagName('td')
Here i want the status of my relays from the device page. every time i want updated status of relays. for that i used below code.
$keywords = array();
$domain = array('http://USERNAME:PASSWORD#URL/index.htm');
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
foreach ($domain as $key => $value) {
#$doc->loadHTMLFile($value);
//$anchor_tags = $doc->getElementsByTagName('table');
//$anchor_tags = $doc->getElementsByTagName('tr');
$anchor_tags = $doc->getElementsByTagName('td');
foreach ($anchor_tags as $tag) {
$keywords[] = strtolower($tag->nodeValue);
//echo $keywords[0];
}
}
Then i get my desired relay name and status in $keywords[] array.
Here i am sharing of Output.
If you want to read all messages in the main page. then first you have to collect all link for separate messages. Then you can use it for further same process.
A remote site is supplying a data structure in a js file.
I can include this file in my page to access the data and display it in my page.
<head>
<script type="text/javascript" src="http://www.example.co.uk/includes/js/data.js"></script>
</head>
Does anyone know how I use PHP to take this data and store in it a database?
You should GET that file directly, via, for example, CURL. Then parse it, if it comes in JSON, you can use json-decode.
Simple example (slightly modified version of code found here):
<?php
$url = "http://www.example.co.uk/includes/js/data.js";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
...
$output = curl_exec($ch);
$info = curl_getinfo($ch);
if ($output === false || $info['http_code'] != 200) {
$error = "No cURL data returned for $url [". $info['http_code']. "]";
if (curl_error($ch))
$error .= "\n". curl_error($ch);
}
else {
$js_data = json_decode($output);
// 'OK' status; save $class members in the database, or the $output directly,
// depending on what you want to actually do.
...
}
//Display $error or do something about it
?>
You can grab the file via CURL or some other HTTP downloading library/function. Then, parse the data. If you're lucky, the data is in a JSON format and you can use a PHP function to convert it into a PHP array. Then, iterate through the items in the array, inserting each into your database.