<?php
if (isset($_POST["submit"])) {
    $adm = $_POST["admno"];
    $phn = $_POST["phn1"];
    include("model.php");
    $db = new database;
    $r = $db->register($adm);
    while ($row = mysql_fetch_array($r)) {
        if ($row["phn_no1"] == $phn || $row["phn_no2"] == $phn || $row["phn_no3"] == $phn) {
            $formatted = "" . substr($phn, 6, 10) . " ";
            $password = $formatted + $adm;
            echo $password;
            $db->setpassword($adm, $password);
            $pre = 'PREFIX';
            $suf = '%20ThankYou';
            $sms = $pre . $password . $suf;
            session_start();
            $ch = curl_init("http://www.perfectbulksms.in/Sendsmsapi.aspx?USERID=ID&PASSWORD=PASS&SENDERID=SID&TO=$phn&MESSAGE=$sms");
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_AUTOREFERER, true);
            $result = curl_exec($ch);
            curl_close($ch);
            header("Location:password.php?msg=new");
        } else {
            header("Location:register.php?msg=invalid");
        }
    }
}
?>
This code works perfectly on my localhost, but when I put it on the server it takes a long time and the cURL call does nothing; it only redirects to the next page. I checked that cURL is enabled. If I call the SMS API directly without cURL, it sends the SMS immediately, but I want to keep the header() redirect and also hide my SMS API details. Is there an alternative way to do this?
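One thing worth trying (a sketch only, not a drop-in fix: the endpoint and the ID/PASS/SID values are the placeholders from the code above): build the query string with http_build_query() so the message is properly URL-encoded, cap the request time so a slow gateway cannot hold up the redirect for long, and log curl_error() so a failure on the server becomes visible instead of being silently swallowed.
// Sketch: placeholder credentials, short timeouts, and error logging around the SMS call
$query = http_build_query(array(
    'USERID'   => 'ID',    // placeholder
    'PASSWORD' => 'PASS',  // placeholder
    'SENDERID' => 'SID',   // placeholder
    'TO'       => $phn,
    'MESSAGE'  => $pre . $password . ' ThankYou', // http_build_query() encodes the space
));
$ch = curl_init("http://www.perfectbulksms.in/Sendsmsapi.aspx?" . $query);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);   // give up quickly if the gateway is unreachable
curl_setopt($ch, CURLOPT_TIMEOUT, 10);         // hard cap on the whole request
$result = curl_exec($ch);
if ($result === false) {
    error_log("SMS gateway error: " . curl_error($ch)); // log instead of echo so header() still works
}
curl_close($ch);
header("Location: password.php?msg=new");
exit;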
I'm trying to make a broken-link checker with PHP.
I modified some PHP code I found online; I'm not a PHP programmer.
It lets a few broken links through, but that's OK.
However, I have a problem with presentations, zips and so on: basically, if the URL is a download, the algorithm thinks it's a dead link.
<?php
set_time_limit(0);
//ini_set('memory_limit','512M');
$servername = "localhost";
$username = "";
$password = "";
try {
    $conn = new PDO("mysql:host=$servername;dbname=test", $username, $password);
    // set the PDO error mode to exception
    $conn->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    echo "Connected successfully" . "<br />";
    echo "----------------------------------------------------<br />";
} catch (PDOException $e) {
    echo "Connection failed: " . $e->getMessage();
}
$sql = "SELECT object,value FROM metadata where xpath = 'lom/technical/location'";
$result = $conn->query($sql)->fetchAll(PDO::FETCH_ASSOC);
//print_r($result);
$array_length = sizeof($result); //26373
//$array_length = 26373;
$i = 0;
$myfile = fopen("Lom_Link_patikra1.csv", "w") or die("Unable to open file!");
$menu_juosta = "Objektas;Nuoroda;Klaidos kodas;\n"; // CSV header: Object;Link;Error code
//fwrite($myfile,$menu_juosta);
for ($i; $i < $array_length; $i++) {
    $new_id = $result[$i]["object"];
    $sql1 = "SELECT published from objects where id ='$new_id'";
    $result_published = $conn->query($sql1)->fetchAll(PDO::FETCH_ASSOC);
    //print_r ($result_published);
    if ($result_published[0]["published"] != 0) {
        $var1 = $result[$i]["value"];
        $var1 = str_replace('|experience|902', '', $var1);
        $var1 = str_replace('|packed_in|897', '', $var1);
        $var1 = str_replace('|packed_in|911', '', $var1);
        $var1 = str_replace('|packed_in|895', '', $var1);
        $request_response = check_url($var1); // HTTP status of the page
        if ($request_response != 200) {
            $my_object = $result[$i]["object"] . ";" . $var1 . ";" . $request_response . ";\n";
            fwrite($myfile, $my_object);
        }
    }
}
fclose($myfile);
$conn = null;

function check_url($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($ch);
    $headers = curl_getinfo($ch);
    curl_close($ch);
    return $headers['http_code'];
}
Link example : http://lom.emokykla.lt/MO/Matematika/pazintis_su_erdviniais%20_kunais_1.doc
Any solutions, advice?
Thank you all for the help. It works much faster now. It seems there was a problem with blank spaces, but that's intriguing in itself.
As it turns out, my real problem was in understanding how HTTP status codes work: what they return and why. The links I had marked as bad but which actually worked were 301 or 302 redirects.
https://en.wikipedia.org/wiki/List_of_HTTP_status_codes
Thank you all for help.
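Since the links marked as bad turned out to be 301/302 redirects, one option is to let cURL follow them and report the status of the final target. This is only a sketch based on the check_url() above; the str_replace() is a rough guard for the blank-space issue mentioned earlier, not a full URL encoder.
function check_url($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, str_replace(' ', '%20', $url)); // encode stray spaces in the path
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow 301/302 to the final target
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
    $data = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // status of the last response in the chain
    curl_close($ch);
    return $code;
}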
Using CURL for remote file
function checkRemoteFile($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
// don't download content
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
if(curl_exec($ch)!==FALSE)
{
return true;
}
else
{
return false;
}
}
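For reference, a boolean check like this would slot into the question's loop roughly as follows (a sketch that reuses the question's variable names; no status code is available in this variant, so the row is simply marked as failed):
if (!checkRemoteFile($var1)) {
    fwrite($myfile, $result[$i]["object"] . ";" . $var1 . ";failed;\n");
}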
EDIT: I may have misunderstood you, but if you just want to check whether the URL actually exists, then the code below will be all you need.
function url_exists($url) {
    if (@file_get_contents($url, 0, NULL, 0, 1)) {
        return 1;
    } else {
        return 0;
    }
}
Setting CURLOPT_NOBODY to TRUE makes an HTTP HEAD request instead of a GET request, so try using curl_setopt($ch, CURLOPT_NOBODY, true);
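In the question's check_url() that is a single extra option before curl_exec(); the status code is still read with curl_getinfo() afterwards (sketch):
curl_setopt($ch, CURLOPT_NOBODY, true); // send HEAD instead of GET: headers only, no file body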
Try using the file_exists function: http://php.net/manual/fr/function.file-exists.php
I'm having some problems looping this script through a large database of 1m+ items. The script returns the size of an image in bytes from its URL and inserts the result into the database.
I get the browser error ERR_EMPTY_RESPONSE on my test attempt, which doesn't bode well. Am I trying to loop through too many records with a while loop? Any suggestions for a fix?
<?php
error_reporting(E_ALL);
mysql_connect('xxxx', 'xxxx', 'xxxx') or die("Unable to connect to MySQL");
mysql_select_db('xxxx') or die("Could not select database");
$result = mysql_query("SELECT * FROM items");
if (mysql_num_rows($result)) {
    while ($row = mysql_fetch_array($result)) {
        $ch = curl_init($row['bigimg']);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_HEADER, TRUE);
        curl_setopt($ch, CURLOPT_NOBODY, TRUE);
        $data = curl_exec($ch);
        $info = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
        curl_close($ch);
        mysql_query("UPDATE items SET imgsize = '" . $info . "' WHERE id=" . $row['id'] . " LIMIT 1");
    }
}
?>
I think your issue might be related to the fact that you call curl_exec inside the database loop. You might want to split your code into two parts: first retrieve the data from the database, then make the cURL calls.
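A rough sketch of that split, reusing the question's table and column names (items, bigimg, imgsize, id) and keeping the deprecated mysql_* calls only to stay close to the original code:
// Phase 1: pull all rows into memory first, so the result set is not held open during the cURL calls
$rows = array();
$result = mysql_query("SELECT id, bigimg FROM items");
while ($row = mysql_fetch_assoc($result)) {
    $rows[] = $row;
}

// Phase 2: run the HEAD requests and updates afterwards
foreach ($rows as $row) {
    $ch = curl_init($row['bigimg']);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_exec($ch);
    $size = (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    curl_close($ch);
    mysql_query("UPDATE items SET imgsize = '" . $size . "' WHERE id = " . (int) $row['id'] . " LIMIT 1");
}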
I have around 600k image URLs in different tables and am downloading all the images with the code below, and it is working fine. (I know FTP is the best option, but somehow I can’t use it.)
$queryRes = mysql_query("SELECT url FROM tablName LIMIT 50000"); // everytime I am using LIMIT
while ($row = mysql_fetch_object($queryRes)) {
$info = pathinfo($row->url);
$fileName = $info['filename'];
$fileExtension = $info['extension'];
try {
copy("http:".$row->url, "img/$fileName"."_".$row->id.".".$fileExtension);
} catch(Exception $e) {
echo "<br/>\n unable to copy '$fileName'. Error:$e";
}
}
Problems are:
After some time, say 10 minutes, the script gives a 503 error but still continues downloading the images. Why? It should stop copying.
It also does not download all the images; every time there is a difference of 100 to 150 images. So how can I trace which images were not downloaded?
I hope I have explained it well.
First of all, copy will not throw any exception, so you are not doing any error handling; that's why your script continues to run.
Second, you should use file_get_contents or, even better, cURL.
For example, you could try this function (I know it opens and closes cURL every time; it's just an example I found here: https://stackoverflow.com/a/6307010/1164866):
function getimg($url) {
    $headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers[] = 'Connection: Keep-Alive';
    $headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
    $user_agent = 'php';
    $process = curl_init($url);
    curl_setopt($process, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($process, CURLOPT_HEADER, 0);
    curl_setopt($process, CURLOPT_USERAGENT, $user_agent); // was $useragent (undefined variable)
    curl_setopt($process, CURLOPT_TIMEOUT, 30);
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
    $return = curl_exec($process);
    curl_close($process);
    return $return;
}
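Used together with file_put_contents it would look roughly like this ($row, $fileName and $fileExtension being the variables from the question's loop):
// Example use of getimg(): fetch one image and write it to disk
$img = getimg("http:" . $row->url);
if ($img !== false) {
    file_put_contents("img/" . $fileName . "_" . $row->id . "." . $fileExtension, $img);
}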
Or even better, try to do it with curl_multi_exec and get your files downloaded in parallel, which will be a lot faster.
take a look here:
http://www.php.net/manual/en/function.curl-multi-exec.php
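A minimal sketch of the curl_multi pattern, assuming $batch is an array of id => url taken from the table; the id-based file name and the download_batch() name are made up for the example:
function download_batch(array $batch, $dir)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($batch as $id => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 30);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
    }

    // drive all transfers until none are still running
    do {
        $status = curl_multi_exec($mh, $running);
        if (curl_multi_select($mh) === -1) {
            usleep(100000); // avoid busy-waiting when select() is not usable
        }
    } while ($running > 0 && $status === CURLM_OK);

    // collect the results and clean up
    foreach ($handles as $id => $ch) {
        if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 200) {
            file_put_contents($dir . '/' . $id . '.img', curl_multi_getcontent($ch));
        }
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
}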
EDIT:
To track which files failed to download, you need to do something like this:
$queryRes = mysql_query("select url from tablName limit 50000"); //everytime i am using limit
while($row = mysql_fetch_object($queryRes)) {
$info = pathinfo($row->url);
$fileName = $info['filename'];
$fileExtension = $info['extension'];
if (!#copy("http:".$row->url, "img/$fileName"."_".$row->id.".".$fileExtension)) {
$errors= error_get_last();
echo "COPY ERROR: ".$errors['type'];
echo "<br />\n".$errors['message'];
//you can add what ever code you wnat here... out put to conselo, log in a file put an exit() to stop dowloading...
}
}
more info: http://www.php.net/manual/es/function.copy.php#83955
I haven't used copy myself; I'd use file_get_contents, as it works fine with remote servers.
EDIT:
It also returns false on failure, so...
if( false === file_get_contents(...) )
trigger_error(...);
I think 50000 is too large. Networking is time-consuming; downloading an image might take over 100 ms (depending on your network condition), so 50000 images, in the most stable case (without timeouts or other errors), might take 50000*100/1000/60 = 83 minutes. That's really a long time for a PHP script. If you run this script as CGI (not CLI), you normally only get 30 seconds by default (without set_time_limit). So I recommend making this script a cron job that runs every 10 seconds or so and fetches maybe 50 URLs each time.
To make the script fetch only a few images each time, you must remember which ones have already been processed (successfully). For example, you can add a flag column to the url table: by default the flag is 1; if the URL is processed successfully it becomes 2, or it becomes 3, which means something went wrong with the URL. Each time, the script selects only the rows where flag = 1 (3 might also be included, but sometimes the URL may be so wrong that retrying won't help).
The copy function is too simple; I recommend using cURL instead. It's more reliable, and you can get the exact network info for the download.
Here is the code:
//only fetch 50 urls each time
$queryRes = mysql_query ( "select id, url from tablName where flag=1 limit 50" );
//just prefer absolute path
$imgDirPath = dirname ( __FILE__ ) . '/';
while ( $row = mysql_fetch_object ( $queryRes ) )
{
$info = pathinfo ( $row->url );
$fileName = $info ['filename'];
$fileExtension = $info ['extension'];
//url in the table is protocol-relative, like //www.example.com/...
$result = fetchUrl ( "http:" . $row->url,
    $imgDirPath . "img/$fileName" . "_" . $row->id . "." . $fileExtension );
if ($result !== true)
{
echo "<br/>\n unable to copy '$fileName'. Error:$result";
//update flag to 3, finish this func yourself
set_row_flag ( 3, $row->id );
}
else
{
//update flag to 2
set_row_flag ( 2, $row->id );
}
}
function fetchUrl($url, $saveto)
{
$ch = curl_init ( $url );
curl_setopt ( $ch, CURLOPT_FOLLOWLOCATION, true );
curl_setopt ( $ch, CURLOPT_MAXREDIRS, 3 );
curl_setopt ( $ch, CURLOPT_HEADER, false );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 7 );
curl_setopt ( $ch, CURLOPT_TIMEOUT, 60 );
$raw = curl_exec ( $ch );
$error = false;
if (curl_errno ( $ch ))
{
$error = curl_error ( $ch );
}
else
{
$httpCode = curl_getinfo ( $ch, CURLINFO_HTTP_CODE );
if ($httpCode != 200)
{
$error = 'HTTP code not 200: ' . $httpCode;
}
}
curl_close ( $ch );
if ($error)
{
return $error;
}
file_put_contents ( $saveto, $raw );
return true;
}
Strict checking for mysql_fetch_object return value is IMO better as many similar functions may return non-boolean value evaluating to false when checking loosely (e.g. via !=).
You do not fetch the id column in your query, so your code should not work as you wrote it.
You define no order of rows in the result. It is almost always desirable to have an explicit order.
The LIMIT clause leads to processing only a limited number of rows. If I get it correctly, you want to process all the URLs.
You are using a deprecated API to access MySQL. You should consider using a more modern one. See the database FAQ at PHP.net. I did not fix this one.
As already said multiple times, copy does not throw, it returns success indicator.
Variable expansion was clumsy. This one is purely cosmetic change, though.
To be sure the generated output gets to the user ASAP, use flush. When using output buffering (ob_start etc.), it needs to be handled too.
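For the flushing point, a minimal pattern when output buffering may also be active (a sketch; the echo lines come from the fixed code shown below):
echo "success: $fn\n";
if (ob_get_level() > 0) {
    ob_flush(); // flush PHP's own output buffer first
}
flush();        // then ask the SAPI / web server to send it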
With fixes applied, the code now looks like this:
$queryRes = mysql_query("SELECT id, url FROM tablName ORDER BY id");
while (($row = mysql_fetch_object($queryRes)) !== false) {
$info = pathinfo($row->url);
$fn = $info['filename'];
if (copy(
'http:' . $row->url,
"img/{$fn}_{$row->id}.{$info['extension']}"
)) {
echo "success: $fn\n";
} else {
echo "fail: $fn\n";
}
flush();
}
The issue #2 is solved by this. You will see which files were and were not copied. If the process (and its output) stops too early, then you know the id of the last processed row and you can query your DB for the higher ones (not processed). Another approach is adding a boolean column copied to tblName and updating it immediately after successfully copying the file. Then you may want to change the query in the code above to not include rows with copied = 1 already set.
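A sketch of that copied-column variant; the column name and type are assumptions, and the deprecated mysql_* API is kept only to stay close to the code above.
// One-time schema change (run once):
//   ALTER TABLE tablName ADD copied TINYINT(1) NOT NULL DEFAULT 0;
$queryRes = mysql_query("SELECT id, url FROM tablName WHERE copied = 0 ORDER BY id");
while (($row = mysql_fetch_object($queryRes)) !== false) {
    $info = pathinfo($row->url);
    if (copy('http:' . $row->url, "img/{$info['filename']}_{$row->id}.{$info['extension']}")) {
        // mark the row immediately, so an interrupted run can resume where it stopped
        mysql_query("UPDATE tablName SET copied = 1 WHERE id = " . (int) $row->id);
    }
    flush();
}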
The issue #1 is addressed in Long computation in php results in 503 error here on SO and 503 service unavailable when debugging PHP script in Zend Studio on SU. I would recommend splitting the large batch into smaller ones, launched at a fixed interval. Cron seems to be the best option to me. Is there any need to launch this huge batch from the browser? It will run for a very long time.
It is better handled batch-by-batch.
The actual script
Table structure
CREATE TABLE IF NOT EXISTS `images` (
`id` int(60) NOT NULL AUTO_INCREMENT,
`link` varchar(1024) NOT NULL,
`status` enum('not fetched','fetched') NOT NULL DEFAULT 'not fetched',
`timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
);
The script
<?php
// how many images to download in one go?
$limit = 100;
/* if set to true, the scraper reloads itself. Good for running on localhost without cron job support. Just keep the browser open and the script runs by itself ( javascript is needed) */
$reload = false;
// to prevent php timeout
set_time_limit(0);
// db connection ( you need pdo enabled)
try {
$host = 'localhost';
$dbname= 'mydbname';
$user = 'root';
$pass = '';
$DBH = new PDO("mysql:host=$host;dbname=$dbname", $user, $pass);
}
catch(PDOException $e) {
echo $e->getMessage();
}
$DBH->setAttribute( PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION );
// get n number of images that are not fetched
$query = $DBH->prepare("SELECT * FROM images WHERE status = 'not fetched' LIMIT {$limit}");
$query->execute();
$files = $query->fetchAll();
// if no result, don't run
if(empty($files)){
echo 'All files have been fetched!!!';
die();
}
// where to save the images?
$savepath = dirname(__FILE__).'/scrapped/';
// fetch 'em!
foreach($files as $file){
// get_url_content uses curl. Function defined later-on
$content = get_url_content($file['link']);
// get the file name from the url. You can use random name too.
$url_parts_array = explode('/' , $file['link']);
/* assuming the image url as http://abc.com/images/myimage.png , if we explode the string by /, the last element of the exploded array would have the filename */
$filename = $url_parts_array[count($url_parts_array) - 1];
// save fetched image
file_put_contents($savepath.$filename , $content);
// did the image save?
if(file_exists($savepath.$filename)) // was $savepath.$file['link'], which never matches the saved path
{
// yes? Okay, let's save the status
$query = $DBH->prepare("update images set status = 'fetched' WHERE id = ".$file['id']);
// output the name of the file that just got downloaded
echo $file['link']; echo '<br/>';
$query->execute();
}
}
// function definition get_url_content()
function get_url_content($url){
// ummm let's make our bot look like human
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,$url);
return curl_exec($ch);
}
//reload enabled? Reload!
if($reload)
echo '<script>location.reload(true);</script>';
503 is a fairly generic error, which in this case probably means something timed out. This could be your web server, a proxy somewhere along the way, or even PHP.
You need to identify which component is timing out. If it's PHP, you can use set_time_limit.
Another option might be to break the work up so that you only process one file per request, then redirect back to the same script to continue processing the rest. You would have to somehow maintain a list of which files have been processed between calls. Or process in order of database id, and pass the last used id to the script when you redirect.
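A sketch of that last idea; the script name download.php and the table/column names are placeholders, not taken from the question.
// Process exactly one row per request, then redirect to continue with the next one
$lastId = isset($_GET['last_id']) ? (int) $_GET['last_id'] : 0;

$res = mysql_query("SELECT id, url FROM tablName WHERE id > $lastId ORDER BY id LIMIT 1");
$row = mysql_fetch_object($res);
if ($row === false) {
    die('All rows processed.');
}

$info = pathinfo($row->url);
@copy('http:' . $row->url, "img/{$info['filename']}_{$row->id}.{$info['extension']}");

// hand control back to the browser and pick up the next row on the following request
header('Location: download.php?last_id=' . $row->id);
exit;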
I'm trying to make a script which searches the list of URLs given in a form for email addresses. Could anyone advise me how to do it? Is there some alternative to cURL?
I tried to make it with file_get_contents, but the script analyzes only the last URL given in the form: when I enter, for example, two URLs into the form, the first print_r("show current_url:". $current_url); is empty, and for the second it shows the page (URL) content (without pictures).
I asked on different forums but received no answer. I will really appreciate your help.
Thank you
$urls = explode("\n", $_POST['urls']);
$db = new mysqli('localhost', 'root', 'root', 'urls');
if (mysqli_connect_errno()) {
    echo 'Error: ';
    exit;
}
for ($i = 0; $i < count($urls); $i++) {
    print_r("show link:" . $urls[$i] . "<br>");
    $current_url = file_get_contents($urls[$i]);
    print_r("show current_url:" . $current_url);
    preg_match("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $current_url, $email); // email
    print_r("show email:" . $email[0]);
    $query = "INSERT INTO urle set adres = '$email[0]' ";
    $result = $db->query($query);
}
if ($query) {
    echo $db->affected_rows . " rows added.";
} else {
    echo mysql_errno() . ":" . mysql_error() . " An error occurred while adding the URLs ";
}
$db->close();
?>
EDIT:
I have tried with cURL. var_dump($email); shows: array(0) { }
The script now displays all of the URLs given in the form in the browser, but preg_match doesn't work, so it doesn't extract the email addresses.
<?php
$urls = explode("\n", $_POST['urls']);
$db = new mysqli('localhost', 'root', 'root', 'linki');
if (mysqli_connect_errno()) {
    echo 'Error: ';
    exit;
}
for ($i = 0; $i < count($urls); $i++) {
    $url = $urls[$i];
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_URL, $url);
    $output = curl_exec($ch);
    preg_match("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $output, $email); // email
    var_dump($email);
    $query = "INSERT INTO urle set adres = '$email[0]' ";
    $result = $db->query($query);
    curl_close($ch);
}
if ($result) {
    echo $db->affected_rows . " rows added.";
} else {
    echo mysql_errno() . ":" . mysql_error() . " An error occurred while adding the URLs ";
}
$db->close();
?>
Is there some alternative to cURL?
file_get_contents, which doesn't give you any error messages (unless error_reporting is raised), and which is often blocked unless ini_set("user_agent", ...) was set.
Alternatively HttpRequest on newer PHP installations.
Still, curl is not difficult to use. The manual is full of examples.
the first "print_r("show current_url:". $current_url); is empty
Nobody can tell. It's your duty to debug that (especially since you haven't mentioned the affected url in your question). Use curl or httprequest.
OK, I've fixed it!!! :)
Here is the code:
for ($i=0; $i<count($linki); $i++){
$url = $linki[$i];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result =curl_exec($ch);
curl_close($ch);
preg_match("/[-a-z0-9\._]+#[-a-z0-9\._]+\.[a-z]{2,4}/", $result, $email);//email
print_r($email);
$zapytanie = "INSERT INTO urle set adres = '$email[0]' ";
$wynik = $db->query($zapytanie);
}