I have an app called GrabUrTime, a timetable-viewing utility that gets its timetables from another site, my university's webspace. Every day at 2am I run a script that scrapes all the timetables with the parser and dumps them into my database.
But today the uni's server isn't running well and my script keeps getting error 500 from it, which stops the script from continuing. It's intermittent, not constant; I tried a few times and it occurs randomly, with no pattern at all.
Hence I want my script to handle the error and loop until it gets the data.
// Requires the Simple HTML DOM parser (file_get_html(), find() and innertext come from it).
function grabtable($intakecode, $week) {
    $html = file_get_html("http://webspace.apiit.edu.my/schedule/intakeview_intake.jsp?Intake1=" . $intakecode . "&Week=" . $week);
    $dumb = $html->find('table[border=1] tr');
    $thatarray = array();
    for ($i = 1; $i < sizeof($dumb); ++$i) {
        $arow = $html->find('table[border=1] tr', $i);
        $date = $arow->find('td font', 0)->innertext;
        $time = $arow->find('td font', 1)->innertext;
        $room = $arow->find('td font', 2)->innertext;
        $loca = $arow->find('td font', 3)->innertext;
        $modu = $arow->find('td font', 4)->innertext;
        $lect = $arow->find('td font', 5)->innertext;
        $anarray = array($date, $time, $room, $loca, $modu, $lect);
        $thatarray[$i] = $anarray;
        //echo "arraylol";
    }
    //echo serialize($tablearray)."<br/>";
    $html->clear();
    return $thatarray;
}
Try something like this:
function getHttpCode($url)
{
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300)
    {
        // YOUR CODE
    }
    else
    {
        // What you want to do should it fail;
        // perhaps this will serve you better as a retry loop, e.g.
        // while ($httpcode < 200 || $httpcode >= 300) { ...re-check the URL... }
    }
}
Usage:
getHttpCode($url);
It might not plug neatly into your code as it is, but I'm sure it can help with a little refactoring to suit your existing code structure.
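To tie it back to the question, here is a rough sketch of my own (not from the original answer) of a retry wrapper: it checks the HTTP code with cURL first and only calls grabtable() once the server answers with a 2xx. The function name grabtable_with_retry, the attempt count and the pause length are all values I made up.
// A sketch only: keep trying until the uni's server answers with a 2xx code,
// then hand the work over to grabtable() from the question above.
function grabtable_with_retry($intakecode, $week, $maxAttempts = 5, $pauseSeconds = 10) {
    $url = "http://webspace.apiit.edu.my/schedule/intakeview_intake.jsp?Intake1="
         . $intakecode . "&Week=" . $week;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_NOBODY, true);        // a HEAD-style probe is enough for the check
        curl_setopt($ch, CURLOPT_TIMEOUT, 5);
        curl_exec($ch);
        $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        if ($httpcode >= 200 && $httpcode < 300) {
            return grabtable($intakecode, $week);      // server looks healthy, scrape now
        }
        sleep($pauseSeconds);                          // back off before the next attempt
    }
    return false;                                      // give up after $maxAttempts tries
}
Usage would then be something like $table = grabtable_with_retry($intakecode, $week);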
Related
I want to make a script that checks whether all my websites are up or not.
The code works fine for one site, but when I try to check e.g. 10 sites at once, it stops working.
<?php
function checkStatus($url) {
    $agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_8; pt-pt) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_exec($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300)
        return true;
    else
        return false;
}

$do = array();
$n = 0;
$myfile = fopen("domens.txt", "r") or die("Unable to open file!");
while (!feof($myfile)) {
    $do = fgets($myfile);
    $n = $n + 1;
}
fclose($myfile);

echo '<br><br>';
$trimmed = file('domens.txt', FILE_SKIP_EMPTY_LINES);
for ($x = 0; $x < $n; $x++) {
    if (checkStatus($trimmed[$x]))
        echo " <br>Website is up " . $trimmed[$x];
    else
        echo " <br> Website is down " . $trimmed[$x];
}
?>
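A likely cause (my guess, not stated in the thread): file() keeps the trailing newline on each line unless FILE_IGNORE_NEW_LINES is passed, so every URL except possibly the last one reaches cURL with stray whitespace attached. A minimal sketch of the loop with that fixed, reusing the checkStatus() function above:
// Sketch: read the domain list once, dropping newlines and empty lines.
$trimmed = file('domens.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($trimmed as $url) {
    $url = trim($url);                       // belt and braces: strip any leftover whitespace
    if (checkStatus($url)) {
        echo "<br>Website is up " . $url;
    } else {
        echo "<br>Website is down " . $url;
    }
}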
To check if your webserver is still working correctly, try a monitoring server like Nagios or Icinga.
function checkOnline($domain) {
    $curlInit = curl_init($domain);
    curl_setopt($curlInit, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($curlInit, CURLOPT_HEADER, true);
    curl_setopt($curlInit, CURLOPT_NOBODY, true);
    curl_setopt($curlInit, CURLOPT_RETURNTRANSFER, true);
    //get answer
    $response = curl_exec($curlInit);
    curl_close($curlInit);
    if ($response) return true;
    return false;
}

if (checkOnline('http://google.com')) { echo "google online\n"; }
if (checkOnline('http://facebook.com')) { echo "facebook online\n"; }
if (checkOnline('http://stackoverflow.com')) { echo "stackoverflow online\n"; }
Code from Get the site status - up or down
I am currently attempting to configure a cURL & PHP function found online that, when called, checks whether the HTTP response code is in the 200-300 range to determine if the web page is up. This works when run against an individual website with the code below (not the function itself, but the if statements etc.). The function returns true or false depending on the HTTP response code:
$page = "www.google.com";
$page = gzdecode($page);
if (Visit($page))
{
echo $page;
echo " Is OK <br>";
}
else
{
echo $page;
echo " Is DOWN <br>";
}
However, when running it against an array of URLs stored within the script via a foreach loop, it reports every webpage in the list as down, even though the code is the same apart from the added loop.
Does anyone know what the issue may be?
Edit - adding the Visit function
My bad, sorry, I wasn't thinking.
The Visit function is the following:
function Visit($url) {
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSLVERSION, 3);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, FALSE);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 310) return true;
    else return false;
}
The foreach loop as mentioned looks like this:
foreach ($Urls as $URL)
{
    $page = $URL;
    $page = gzdecode($page);
    if (Visit($page))
The if block for the Visit part is the same as before.
$page = $URL;
$page = gzdecode($page);
Why are you trying to uncompress a non-compressed URL string? Assuming you really meant to uncompress the content returned from the URL, why would the remote server compress it when you've told it that the client does not support compression? And why fetch the entire page just to see the headers?
The code you've shown us here has never worked.
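For what it's worth, a sketch of the loop with the stray gzdecode() calls removed (assuming $Urls is simply an array of URL strings), since Visit() already fetches the page itself:
foreach ($Urls as $url) {
    if (Visit($url)) {
        echo $url . " Is OK <br>";
    } else {
        echo $url . " Is DOWN <br>";
    }
}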
I'd like to use PHP to look at a text file on my local machine. On line 1 of the file, a query string is generated automatically every few minutes:
Example: ?artist=myartist&title=mytitle&songtype=S&duration=240000
I'd like to check the file every 5-10 seconds, then take the query string and append it to
http://localhost:9595
Final HTTP request should look like:
http://localhost:9595?artist=myartist&title=mytitle&songtype=S&duration=240000
I'm NOT a code writer but have taken suggestions from others and gotten close (I think).
Code below.
<?php
/**
 * This program will check a file every 5 seconds to see if it has changed...
 * if it has, the new metadata will be sent to the shoutcast server(s).
 */

// The path to the file where your song information is placed. It is assumed that everything
// is on one line and is in the format you wish to send to the server.
// (Single quotes so the backslashes in the Windows path are not treated as escape sequences.)
define('songfile', 'c:\a\nowplaying.txt');

// Simply copy and paste this for each server you need to add.
$serv["host"][] = "127.0.0.1";
$serv["port"][] = "9595";

$lastmtime = 0;

while (1)
{
    $t = time();
    clearstatcache();
    $mt = @filemtime(songfile);
    if ($mt === FALSE || $mt < 1)
    {
        echo "file not found, will retry in 5 seconds";
        sleep(5);
        continue;
    }
    if ($mt == $lastmtime)
    {
        // file unchanged, will retry in 5 seconds
        sleep(5);
        continue;
    }
    $da = "";
    $f = @fopen(songfile, "r");
    if ($f !== false)
    {
        $da = @fread($f, 4096);
        fclose($f);
        @unlink(songfile);
    }
    else
    {
        echo "error opening songfile, will retry in 5";
        sleep(5);
        continue;
    }
    $lastmtime = $mt;
    for ($count = 0; $count < count($serv["host"]); $count++)
    {
        $mysession = curl_init();
        curl_setopt($mysession, CURLOPT_URL, "http://" . $serv["host"][$count] . ":" . $serv["port"][$count] . "/?mode=updinfo&song=" . urlencode(trim($da)));
        curl_setopt($mysession, CURLOPT_HEADER, false);
        curl_setopt($mysession, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($mysession, CURLOPT_POST, true);
        curl_setopt($mysession, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
        curl_setopt($mysession, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($mysession, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
        curl_setopt($mysession, CURLOPT_CONNECTTIMEOUT, 2);
        curl_exec($mysession);
        curl_close($mysession);
    }
    echo "song updated";
    sleep(5);
}
?>
One solution is as follows: make an AJAX call to a PHP script.
Here is the every5sec.php file:
<?php
$file = fopen('c:\a\nowplaying.txt', 'r');   // single quotes so \n in the path is not an escape
$line = fgets($file);
fclose($file);
$url = "http://localhost:9595" . $line;
echo $url;
exit;
?>
JavaScript file here:
<script type="text/javascript">
function refresh() {
    $.get('every5sec.php', function (data) {
        $.get(data);              // request the URL that the PHP script returned
    });
}
setInterval(refresh, 5000);       // repeat every 5 seconds (setTimeout would only fire once)
</script>
I think this will work.
Thank you for all your help.
After reading (and reading a lot), and with some code help from you fine folks, I was able to get this working. I ended up using the code below and it's running great, with one exception: I get a file lock every once in a while, more specifically an error message (Warning: fopen(nowplaying.txt): failed to open stream: Permission denied in C:\b\nowplaying.php on line 22). It seems to happen only when the code tries to open the file while it's being updated.
Can I write an exception handler so that, if the above error appears, the script will try again in 2 seconds?
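One way to do that (a sketch of my own, not from the thread): suppress the warning with @ and retry in a short loop rather than using an exception at all. The attempt count is arbitrary.
// Sketch: try to open the file a few times, waiting 2 seconds between attempts,
// in case the writer still has it locked.
$file = false;
for ($attempt = 0; $attempt < 5 && $file === false; $attempt++) {
    $file = @fopen("nowplaying.txt", 'r');   // @ silences the "Permission denied" warning
    if ($file === false) {
        sleep(2);                            // writer may still hold the file; wait and retry
    }
}
if ($file !== false) {
    $line = fgets($file);
    fclose($file);
    $url = "http://127.0.0.1:9696" . trim($line);
}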
UPDATE
I changed this section:
$file = fopen("nowplaying.txt", 'r');
$line = fgets($file);
fclose($file);
$url = "http://127.0.0.1:9696" . $line;
To:
$filename = 'nowplaying.txt';
$file = fopen($filename, 'r')
or exit("unable to open file ($filename)");
$line = fgets($file);
fclose($file);
$url = "http://127.0.0.1:9696" . $line;
I then put a SLEEP command at the end of the batch file I'm using to launch the code that starts it over if it exits. So far so good. :)
My Original Code:
<?php
$lastmtime = 0;
while (1)
{
    $t = time();
    clearstatcache();
    $mt = @filemtime("nowplaying.txt");
    if ($mt === FALSE || $mt < 1)
    {
        echo "file not found, will retry in 5 seconds";
        sleep(5);
        continue;
    }
    if ($mt == $lastmtime)
    {
        sleep(5);
        continue;
    }
    $file = fopen("nowplaying.txt", 'r');
    $line = fgets($file);
    fclose($file);
    $url = "http://127.0.0.1:9696" . $line;
    $lastmtime = $mt;
    $options = array(
        'http' => array(
        ),
    );
    $context = stream_context_create($options);
    $result = file_get_contents($url, false, $context);
    var_dump($result);
    sleep(5);
}
?>
Something like this:
$file = fopen("nowplaying.txt", 'r');
$line = fgets($file);
fclose($file);
$url = "http://localhost:9595" . $line;
urlencode(trim($url));
$mysession = curl_init();
curl_setopt($mysession, CURLOPT_URL, $url);
curl_setopt($mysession, CURLOPT_HEADER, false);
curl_setopt($mysession, CURLOPT_RETURNTRANSFER, true);
curl_setopt($mysession, CURLOPT_POST, true);
curl_setopt($mysession, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($mysession, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($mysession, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
curl_setopt($mysession, CURLOPT_CONNECTTIMEOUT, 2);
curl_exec($mysession);
curl_close($mysession);
As for the 5-10 second loop, I'd stay away from doing that in PHP and instead try running the script as a scheduled task or something. Also, this code has no error checking so you may want to add some.
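As a sketch of what that error checking might look like (my own addition, not part of the original answer), the curl_exec()/curl_close() lines at the end of the snippet above could be expanded along these lines:
$result = curl_exec($mysession);
$httpcode = curl_getinfo($mysession, CURLINFO_HTTP_CODE);    // HTTP status of the update request
if ($result === false) {
    echo "cURL error: " . curl_error($mysession) . "\n";     // transport-level failure
} elseif ($httpcode < 200 || $httpcode >= 300) {
    echo "Server answered with HTTP " . $httpcode . "\n";    // server reached, but unhappy
}
curl_close($mysession);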
I have Server A and Server B, which exchange some data. On a user request, Server A pulls data from Server B using a simple file_get_contents call with some parameters, so Server B can do all the work (database etc.) and return results to A, which formats them and shows them to the user. Everything is in PHP.
Now I am interested in the fastest way to do this. I made some tests: the average transfer time for an average response from Server B is ~0.2 sec. Of that 0.2 sec, roughly 0.1 sec is Server B's processing time (pulling data, calling a few databases etc.), which means the average transfer time for about 50 kB is 0.1 sec. (The servers are NOT on the same network.)
Should I try:
cURL instead of file_get_contents?
Or try to do the whole thing with sockets? (I have never worked with sockets in PHP, but I suppose it can be done easily, and that way I could skip the web server.)
Or something else entirely?
I think some time can be 'found' by shortening connection establishment, since right now every request initiates a new connection (I mean on separate file_get_contents calls, or am I wrong?).
Please give me your advice on which directions to try, or if you have a better solution I am listening.
Curl:
function curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}
Sockets:
function sockets($host) {
    $fp = fsockopen("www." . $host, 80, $errno, $errstr, 30);
    $out  = "GET / HTTP/1.1\r\n";
    $out .= "Host: www." . $host . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    $f = '';
    while (!feof($fp)) {
        $f .= fgets($fp, 1024);
    }
    fclose($fp);   // close the socket once the response has been read
    return $f;
}
file_get_contents:
function fgc($url) {
    return file_get_contents($url);
}
Multicurl:
function multiRequest($data, $nobody = false, $options = array(), $oneoptions = array())
{
    $curls = array();
    $result = array();
    $mh = curl_multi_init();
    foreach ($data as $id => $d)
    {
        $curls[$id] = curl_init();
        $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
        curl_setopt($curls[$id], CURLOPT_URL, $url);
        curl_setopt($curls[$id], CURLOPT_HEADER, 0);
        curl_setopt($curls[$id], CURLOPT_RETURNTRANSFER, true);
        curl_setopt($curls[$id], CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($curls[$id], CURLOPT_USERAGENT, "Mozilla/5.0(Windows;U;WindowsNT5.1;ru;rv:1.9.0.4)Gecko/2008102920AdCentriaIM/1.7Firefox/3.0.4");
        //curl_setopt($curls[$id], CURLOPT_COOKIEJAR, 'cookies.txt');
        //curl_setopt($curls[$id], CURLOPT_COOKIEFILE, 'cookies.txt');
        //curl_setopt($curls[$id], CURLOPT_NOBODY, $nobody);
        if (!empty($options))
        {
            curl_setopt_array($curls[$id], $options);
        }
        if (!empty($oneoptions[$id]))
        {
            curl_setopt_array($curls[$id], $oneoptions[$id]);
        }
        if (is_array($d))
        {
            if (!empty($d['post']))
            {
                curl_setopt($curls[$id], CURLOPT_POST, 1);
                curl_setopt($curls[$id], CURLOPT_POSTFIELDS, $d['post']);
            }
        }
        curl_multi_add_handle($mh, $curls[$id]);
    }
    $running = null;
    do
    {
        curl_multi_exec($mh, $running);
    }
    while ($running > 0);
    foreach ($curls as $id => $content)
    {
        $result[$id] = curl_multi_getcontent($content);
        //echo curl_multi_getcontent($content);
        curl_multi_remove_handle($mh, $content);
    }
    curl_multi_close($mh);
    return $result;
}
Tests:
$url = 'example.com';

$start = microtime(1);
for ($i = 0; $i < 100; $i++)
    curl($url);
$end = microtime(1);
echo "Curl:" . ($end - $start) . "\n";

$start = microtime(1);
for ($i = 0; $i < 100; $i++)
    fgc("http://$url/");
$end = microtime(1);
echo "file_get_contents:" . ($end - $start) . "\n";

$start = microtime(1);
for ($i = 0; $i < 100; $i++)
    sockets($url);
$end = microtime(1);
echo "Sockets:" . ($end - $start) . "\n";

$start = microtime(1);
for ($i = 0; $i < 100; $i++)
    $arr[] = $url;
multiRequest($arr);
$end = microtime(1);
echo "MultiCurl:" . ($end - $start) . "\n";
?>
Results:
Curl: 5.39667105675
file_get_contents: 7.99799394608
Sockets: 2.99629592896
MultiCurl: 0.736907958984
What is the fastest way?
Put your data on a flash drive.
Now seriously.
Come on, it's the network that's slow. You cannot make it faster.
To make Server A respond faster, do NOT request data from Server B. That's the only way.
You can replicate your data or cache it, or just drop such a clumsy setup altogether.
But as long as you have to make a network lookup on each user's request, it WILL be slow, regardless of the method you are using. It is not the method, it is the medium. Isn't that obvious?
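To make the caching suggestion concrete, here is a rough sketch of my own (the 60-second lifetime, the cached_fetch name and the cache location are all arbitrary choices, not from the thread) of caching Server B's response on Server A's disk so repeated requests skip the network:
// Sketch: serve a cached copy if it is fresh enough, otherwise fetch and store it.
function cached_fetch($url, $ttlSeconds = 60) {
    $cacheFile = sys_get_temp_dir() . '/b_cache_' . md5($url);
    if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        return file_get_contents($cacheFile);    // fresh enough, no network call
    }
    $data = file_get_contents($url);             // go to Server B
    if ($data !== false) {
        file_put_contents($cacheFile, $data);    // remember it for the next request
    }
    return $data;
}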
You can try a different approach: mount the remote filesystem on the local machine. You can do that with sshfs, so you also get the additional security of an encrypted connection.
It may even be more efficient, since PHP will not have to deal with connection negotiation and establishment.
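On the connection-establishment point raised in the question, one more sketch of my own (not benchmarked here): reusing a single cURL handle across calls lets cURL keep the connection to Server B alive, which trims part of the per-request overhead. The endpoint URLs below are hypothetical.
// Sketch: one handle, many requests; cURL reuses the keep-alive connection where it can.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
function fetch_with_handle($ch, $url) {
    curl_setopt($ch, CURLOPT_URL, $url);   // only the URL changes between calls
    return curl_exec($ch);
}
$first  = fetch_with_handle($ch, 'http://serverB.example/data?x=1');   // hypothetical endpoints
$second = fetch_with_handle($ch, 'http://serverB.example/data?x=2');
curl_close($ch);                           // close once, at the end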
In PHP, how can I determine if any remote file (accessed via HTTP) exists?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10); // follow up to 10 redirections - avoids loops
$data = curl_exec($ch);
curl_close($ch);

if (!$data) {
    echo "Domain could not be found";
}
else {
    preg_match_all("/HTTP\/1\.[1|0]\s(\d{3})/", $data, $matches);
    $code = end($matches[1]);
    if ($code == 200) {
        echo "Page Found";
    }
    elseif ($code == 404) {
        echo "Page Not Found";
    }
}
Modified version of code from here.
I like curl or fsockopen to solve this problem. Either one can provide header data regarding the status of the file requested. Specifically, you would be looking for a 404 (File Not Found) response. Here is an example I've used with fsockopen:
http://www.php.net/manual/en/function.fsockopen.php#39948
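The linked manual note is not reproduced here, but the fsockopen idea is roughly this sketch (my own, assuming plain HTTP on port 80): send a HEAD request by hand and read the status line.
// Sketch: issue a HEAD request over a raw socket and return the status code, or false.
function head_status($host, $path = '/') {
    $fp = @fsockopen($host, 80, $errno, $errstr, 5);
    if (!$fp) {
        return false;                                   // DNS or connection failure
    }
    fwrite($fp, "HEAD $path HTTP/1.1\r\nHost: $host\r\nConnection: Close\r\n\r\n");
    $statusLine = fgets($fp, 128);                      // e.g. "HTTP/1.1 404 Not Found"
    fclose($fp);
    if (preg_match('#^HTTP/\d\.\d\s+(\d{3})#', $statusLine, $m)) {
        return (int)$m[1];
    }
    return false;
}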
This function will return the response code (the last one in case of redirection), or false in case of a DNS or other error. If one argument (the URL) is supplied, a HEAD request is made. If a second argument is given, a full request is made and the content, if any, of the response is stored by reference in the variable passed as the second argument.
function url_response_code($url, &$contents = null)
{
    $context = null;
    if (func_num_args() == 1) {
        $context = stream_context_create(array('http' => array('method' => 'HEAD')));
    }
    $contents = @file_get_contents($url, false, $context);
    $code = false;
    if (isset($http_response_header)) {
        foreach ($http_response_header as $header) {
            if (strpos($header, 'HTTP/') === 0) {
                list(, $code) = explode(' ', $header);
            }
        }
    }
    return $code;
}
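Usage might look like this (a small example of my own, not part of the original answer; the URLs are placeholders):
// HEAD-style check only:
if (url_response_code('http://www.example.com/somefile.pdf') == 200) {
    echo "File exists\n";
}
// Full GET, with the response body captured into $body by reference:
$code = url_response_code('http://www.example.com/page.html', $body);
echo "Got HTTP " . $code . "\n";   // $body now holds the page content, if any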
I recently was looking for the same info. Found some really nice code here: http://php.assistprogramming.com/check-website-status-using-php-and-curl-library.html
function Visit($url) {
    $agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_VERBOSE, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);
    $page = curl_exec($ch);
    //echo curl_error($ch);
    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($httpcode >= 200 && $httpcode < 300) {
        return true;
    }
    else {
        return false;
    }
}

if (Visit("http://www.site.com")) {
    echo "Website OK";
}
else {
    echo "Website DOWN";
}
Use Curl, and check if the request went through successfully.
http://w-shadow.com/blog/2007/08/02/how-to-check-if-page-exists-with-curl/
Just a note that these solutions will not work on a site that does not give an appropriate response for a page not found. For example, I just had a problem testing for a page on a site that simply loads its main page whenever it gets a request it cannot handle, so it nearly always gives a 200 response even for non-existent pages.
Some sites will serve a custom error on a standard page and still not send a 404 header.
There is not much you can do in these situations unless you know the expected content of the page and test that the expected content exists, or test for some expected error text within the page, and that all gets a bit messy...
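If you do go down that road, the check might look something like this sketch (the URL and marker string are obviously site-specific and made up here):
// Sketch: treat the page as "found" only if a known marker string appears in the body.
function page_really_exists($url, $expectedMarker) {
    $body = @file_get_contents($url);
    if ($body === false) {
        return false;                                    // the request itself failed
    }
    return strpos($body, $expectedMarker) !== false;     // soft-404 pages won't contain the marker
}
if (page_really_exists('http://www.site.com/some-page', '<title>Some Page</title>')) {
    echo "Page Found";
}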