I need some help here because I am going mad! I have a PHP script which, as far as I can tell, was working when it was written, both in the browser and via cron/CLI. However, it has recently stopped working for some reason, and after investigation it is returning an "Undefined variable" error :(
So, I have viewedme.php, which calls checklogin.inc.php, which in turn calls functions.inc.php.
When run directly from the browser, the script executes from end to end and performs all the actions it was designed for. However, when I run it from cron using "/usr/bin/php /bla/cron_updateviewed.php", it returns this error message...
PHP Notice: Undefined variable: match in /bla/bla/functions.inc.php on line 79
What is even more annoying is that I have many other scripts, all run from cron, calling the same function in the same manner without error.
This is my viewedme.php...
<?php
include_once('checklogin.inc.php');
include_once('mysqli.inc.php');
$ch = curl_init();
// more curl settings and the curl_exec() call that fills $html here, removed as not important
$html = striplinks($html);
?>
checklogin.inc.php calls functions.inc.php, which includes this function...
// Function to strip down to links
function striplinks($document) {
    preg_match_all("'<\s*a\s.*?href\s*=\s*    # find <a href=
        ([\"\'])?                             # find single or double quote
        (?(1) (.*?)\\1 | ([^\s\>]+))          # if quote found, match up to next matching
                                              # quote, otherwise match up to next space
        'isx", $document, $links);
    while (list($key, $val) = each($links[2])) {
        if (!empty($val))
            $match[] = $val;
    }
    while (list($key, $val) = each($links[3])) {
        if (!empty($val))
            $match[] = $val;
    }
    return $match;
}
Like I said, viewedme.php works fully when accessed from the browser, just not from cron. It is also important to add that nothing is being passed through the browser, like POST or GET parameters, that would stop this from working.
I have other scripts, such as checkinbox.php, which use the striplinks function from cron without any issues.
I am really at a loss as to wtf is going on :(
Updated
Solved "undefined variable" issue, but still returns nill
As suggested, the regex wasn't matching anything, so I added $match = array(); into the function before the while loops. This fixes the undefined variable error, but it still returns 0.
After setting up a few test scripts, I have found that when executing the code through the browser, the cookies in curl are used. However, when executing through cron/CLI they are not, therefore a different webpage is being returned.
Here is my curl code...
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/32.0.1700.107 Chrome/32.0.1700.107 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie');
curl_setopt($ch, CURLOPT_URL, 'http://www.bla.com/');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
Is anything wrong with it as to why cookies are not used when executing via cron/cli?
My bet is that your while loops (or the ifs they contain) never run, so $match stays undefined, causing the error on return.
You can fix this by initializing $match to an empty array like so:
function striplinks($document) {
$match = array(); // also possible for PHP5.4 and later: $match = [];
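For reference, here is a sketch of the whole function with that fix applied; I've also swapped the each() loops for foreach, since each() is deprecated as of PHP 7.2 and removed in PHP 8:

// Function to strip down to links, with $match initialised up front
function striplinks($document) {
    $match = array(); // an empty result now returns array() instead of raising a notice

    preg_match_all("'<\s*a\s.*?href\s*=\s*    # find <a href=
        ([\"\'])?                             # find single or double quote
        (?(1) (.*?)\\1 | ([^\s\>]+))          # if quote found, match up to next matching
                                              # quote, otherwise match up to next space
        'isx", $document, $links);

    // foreach replaces the deprecated each() loops; the logic is unchanged
    foreach ($links[2] as $val) {
        if (!empty($val))
            $match[] = $val;
    }
    foreach ($links[3] as $val) {
        if (!empty($val))
            $match[] = $val;
    }
    return $match;
}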
So after over 4 hours of trial, error, and anger I finally solved the problem. It appears cURL requires the full path to the cookie files if you are going to execute from CLI or cron; relative paths get resolved against cron's working directory, not the script's. Strange but there you go :P
curl_setopt($ch, CURLOPT_COOKIEJAR, '/home/full/path/cookie.txt');
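A more portable option is to build the path from the script's own location, so it works from both the browser and cron (a sketch, assuming the cookie file lives next to the script):

// __DIR__ is always the directory of the current script, regardless of
// cron's working directory, so relative-path surprises go away
$cookie = __DIR__ . '/cookie.txt';
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);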
Thanks to @Siguza for the array suggestion too.
Related
I had a simple parser for an external site, required to confirm that a link the user submitted leads to an account this user owns (by parsing a link to their profile from the linked page). And it worked for a good long while with just this WordPress function:
function fetch_body_url($fetch_link){
    $response = wp_remote_get($fetch_link, array('timeout' => 120));
    return wp_remote_retrieve_body($response);
}
But then the website changed something in their Cloudflare protection, and now this results in Cloudflare's "Please wait..." page with no option to get past it.
Thing is, I don't even need it done automatically - if there was a captcha, the user could've completed it. But it won't show anything other than the endlessly spinning "checking your browser".
Googled a bunch of curl examples, and best I could get so far is this:
<?php
$url='https://ficbook.net/authors/1000'; //random profile from requrested website
$agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_REFERER, 'https://facebook.com/');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
$response = curl_exec($ch);
curl_close($ch);
echo '<textarea>'.$response.'</textarea>';
?>
Yet it still returns the browser check screen. Adding a random free proxy to it doesn't seem to work either, or maybe I wasn't lucky enough to find a working one (or couldn't figure out how to insert it correctly in this case). Is there any way around it? Or perhaps there is some other way to check whether a specific keyword/link is on the page?
Ok, I've spent most of the day on this problem, and it seems like I got it more or less sorted. Not exactly the way I expected, but hey, it works... sort of.
Instead of solving this on the server side, I ended up looking for a solution to parse it on my own PC (it has better uptime than my hosting's server anyway). Turns out, there are plenty of ready-to-use open-source scrapers, including ones that know how to bypass Cloudflare being extra defensive for no good reason.
Solution for python dummies like myself:
Install Anaconda if you don't have python installed yet.
In cmd, type pip install cloudscraper
Open Spyder (it comes along with Anaconda) and paste this:
import cloudscraper
scraper = cloudscraper.create_scraper()
print(scraper.get("https://your-parse-target/").text)
Save it anywhere and poke the run button to test. If it works, you'll have your data in the console window of the same app.
Replace print with whatever you're gonna do with that data.
For my specific case it also required installing mysql-connector-python and enabling remote access for the MySQL database (and my hosting had it available for free all this time, huh?). So instead of directly verifying that the user is the owner of the profile they input, there's now a queue - which isn't perfect, but oh well, they'll have to wait.
First, the user request is saved to MySQL. My local Python script checks that table every now and then to see if anything is in line to be verified. It gets the page's content and saves it back to MySQL. Then the old PHP parser does its job like before, but from a MySQL fetch instead of the actual website.
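The PHP side of that hand-off might look something like this (a sketch only; the table and column names page_cache, url, and body are made up for illustration, and an existing $mysqli connection is assumed):

// Fetch the page body that the local Python scraper stored earlier.
// get_result() requires the mysqlnd driver, which most builds include.
$stmt = $mysqli->prepare('SELECT body FROM page_cache WHERE url = ? LIMIT 1');
$stmt->bind_param('s', $profile_url);
$stmt->execute();
$result = $stmt->get_result();
if ($row = $result->fetch_assoc()) {
    $html = $row['body']; // hand this to the existing parser instead of fetching live
}
$stmt->close();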
Perhaps there are better solutions that don't require resorting to measures like a separate local parser, but maybe this will help someone running into a similar issue.
So I have obviously googled the error, but PHP (PHP 7.4.4 (cli)) cURL gives me the error
Curl error: operation aborted by callback
with the following code:
private function curl_post($url, $post, $file = null, $file_type = 'audio/wav'){
    $ch = curl_init($url);
    if (!empty($file)){
        $post['SoundFile'] = new CURLFile(UPLOAD_PATH.$file, $file_type, $file);
    }
    // Assign POST data
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    if(curl_errno($ch)) echo 'Curl error: '.curl_error($ch);
    curl_close($ch);
    print'<pre>Curl (rec): '."\n"; print_r($result); print'</pre>';
}
I control both (Ubuntu) servers and have rebooted them both. I am posting a fairly large amount of data, but from my Google searching that didn't seem to be what triggers the curl error. Does anyone know what is causing it? It was working perfectly fine and then it stopped.
Additionally, putting file_put_contents(time().'.txt','log'); as a breakpoint in my receiving server does log the response. So it's clearly landing in the right area.
I will also say that the two servers talk to each other a number of times through cURL (so one curls to the other, then back). Furthermore, error 42 is the cURL response code, but https://curl.haxx.se/libcurl/c/libcurl-errors.html doesn't seem to provide much help. I've tried tracing the various calls to each other and can't see why it is breaking; it errors/breaks before the post/calls even occur.
So I found the answer, and I hope this helps anyone else in this situation. The reason was that the file for the CURLFile was missing on the server (having previously been there). My code now reads:
if (!empty($file) && is_file(UPLOAD_PATH.$file)){
    $post['SoundFile'] = new CURLFile(UPLOAD_PATH.$file, $file_type, $file);
}
And this no longer generates the error. The key was that it errored even before submitting the POST, but the error itself wasn't that helpful until I broke the code down in a separate test script and added the file element back in.
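For anyone debugging a similarly terse message, curl_strerror() (PHP 5.5+) turns the numeric code into libcurl's own description, which can be a useful breadcrumb:

$result = curl_exec($ch);
$errno = curl_errno($ch);
if ($errno !== 0) {
    // For errno 42 this prints libcurl's description, along the lines of
    // "Operation was aborted by an application callback"
    echo 'Curl error ' . $errno . ': ' . curl_strerror($errno);
}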
When I call http_get it never returns; my web page just stops outputting at that point. The destination URL never receives the request.
<?php //simplest test of http_get I could make
print "http://kayaker.net/php/image.php?id=ORCS084144<br>";
http_get ("http://kayaker.net/php/image.php?id=ORCS084144");
print "<br>back from http_get<br>";
?>
The original script was calling http_get in a loop to send data to several other processes on another server.
The loop stops on the first call to http_get. I tried calling flush(); after every line printed - no joy. I tried setting longer timeouts in the $options parameter to http_get; that didn't help. I tried calling http_request with HTTP_METH_GET in the first argument - same problem.
This kayaker URL is not the original, just a shorter example that still fails. I took one of the original URLs and pasted it into my browser address line, it worked fine. I pasted some of the original URLs into another scripting language (The llHTTPRequest function in LSL on Open Simulator) and they work fine from there.
I stored the program above at a location where you can run it from your browser and see it fail.
I pasted the URL of the program above into another scripting language and that at least returned an error status (500) and a message "Internal Server Error", which probably just means the test program didn't terminate properly.
I must be doing something very simple, stupid, and basically wrong.
But what is it?
Problem
You do not seem to have the right package installed (PECL pecl_http >= 0.1.0).
Fatal error: Call to undefined function http_get() in [snip] on line 8
Solution
You can either:
install pecl_http as described in the documentation, or
use a different function as mentioned in the comments (file_get_contents, curl); a quick example follows.
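For the quick drop-in route, a file_get_contents version can be as short as this (note it needs allow_url_fopen enabled in php.ini):

// Simplest stand-in for the missing http_get(); no PECL extension needed
$body = file_get_contents('http://kayaker.net/php/image.php?id=ORCS084144');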
Thanks to the comments above and the surprisingly helpful people at my web hosting company, I was able to write the following function:
function http_get($url)
{
    $ch = curl_init();                           // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url);         // set url to fetch
    curl_setopt($ch, CURLOPT_FAILONERROR, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);        // times out after 3s
    $result = curl_exec($ch);                    // run the whole process
    curl_close($ch);
    return($result);
} //http_get
This works for many different URLs, but does fail on some servers, I hope by playing with the options I can get it working there.
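One likely culprit for those failures is CURLOPT_FAILONERROR, which makes curl_exec() return false for any HTTP status of 400 or above. A variant that drops it and reports the status instead (a sketch, not a guaranteed fix):

function http_get($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($ch, CURLOPT_TIMEOUT, 3);        // times out after 3s
    $result = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE); // 0 if the request never completed
    curl_close($ch);
    return ($status >= 400) ? false : $result; // body on success, false on HTTP errors
}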
When I run cURL on a particular URL, the site stops responding and doesn't generate an error, despite my having set error reporting on. I've tried setting the cURL timeouts to low values, and it generates an error then, so I know it's not timing out.
The main thing I want to know is, how could that even happen, and how can I figure out why?
The URL I'm trying to access is a call to the Factual API, and the URL I'm using here
(http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={"category":"Automotive","$loc":{"$within":{"$center":[[41,-74],80467.2]}}})
works when you put it in a browser. The PHP script works as intended if you change the latitude and longitude to essentially any other values.
error_reporting(E_ALL);
ini_set('display_errors', '2');
$url = "http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={\"category\":\"Automotive\",\"\$loc\":{\"\$within\":{\"\$center\":[[41,-74],80467.2]}},\"website\":{\"\$blank\":false}}";
echo "\n\n1";
$ch = curl_init($url);
echo 2;
curl_setopt($ch, CURLOPT_HEADER, 0);
echo 3;
curl_setopt($ch, CURLOPT_POST, 1);
echo 4;
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
echo 5;
$output = curl_exec($ch) or die("hhtrjrstjsrjt".curl_error($ch));
echo 6;
curl_close($ch);
echo "out: ".$output;
It looks like you have some mistakes in your PHP configuration file.
To fix your errors you must edit your php.ini file.
For displaying errors in development mode, change the error_reporting value to E_ALL.
error_reporting=E_ALL
Then you have to enable the cURL extension.
To enable it in your php.ini, you have to uncomment the following line:
extension=php_curl.dll
Once you have edited these values, don't forget to restart your webserver (Apache or Nginx).
Also, I agree with my colleagues: you should urlencode your JSON string.
From my point of view the code should be:
<?php
ini_set('display_errors', '1');
error_reporting(E_ALL);
$apiKey = '*apikey*';
$filters = '{"category":"Automotive","$loc":{"$within":{"$center":[[41,-74],80467.2]}},"website":{"$blank":false}}';
$params = '?api_key=' . $apiKey . '&filters=' . urlencode($filters);
$url = 'http://api.factual.com/v2/tables/bi0eJZ/read';
$url .= $params;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$output = curl_exec($ch) or die("cURL Error" . curl_error($ch));
curl_close($ch);
echo "out: " . $output;
EDIT:
Another approach could be to use the Official PHP driver for the Factual API:
Official PHP driver for the Factual API
It provides a Debug Mode with a cURL Debug Output and an Exception Debug Output.
Your URL is not URL-encoded. cURL is an external application, so escaping is necessary: your browser will auto-encode params in the URL, but the raw URL could be breaking cURL on the server, and it is halting.
Try changing this:
$url="http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={\"category\":\"Automotive\",\"\$loc\":{\"\$within\":{\"\$center\":[[41,-74],80467.2]}},\"website\":{\"\$blank\":false}}";
to:
$url_filters = '{"category":"Automotive","$loc":{"$within":{"$center":[[41,-74],80467.2]}},"website":{"$blank":false}}';
$url="http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters=".urlencode($url_filters);
However, I do have one question: is your call correct? Is the literal key "$loc" right?
Updated to remove the need to backslash everything: single quotes don't support variable interpolation and allow double quotes without escaping them.
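As a side note, http_build_query() handles the encoding for you and keeps the code readable; a sketch (assuming $apikey holds the real key):

// http_build_query() urlencodes each value, including the JSON in 'filters'
$params = http_build_query(array(
    'api_key' => $apikey,
    'filters' => $url_filters,
));
$url = "http://api.factual.com/v2/tables/bi0eJZ/read?" . $params;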
For future benefit:
Use urlencode for query parameters, as your filters parameter contains many characters that are not safe/valid for URLs.
Use curl_getinfo() to see information such as http_code and other useful details (see the sketch below).
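A quick illustration of the curl_getinfo() suggestion:

$output = curl_exec($ch);
$info = curl_getinfo($ch); // returns an array of transfer details
echo $info['http_code'];   // HTTP status, e.g. 200 or 400
echo $info['total_time'];  // total transfer time in seconds
echo $info['url'];         // final URL after any redirects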
In my case the cURL failure was due to the fact that the hosting provider had replaced Apache with LiteSpeed, and LiteSpeed was terminating the process during the cURL request.
The php.ini settings didn't fix this, as lsphp was getting terminated at 30 seconds every time.
We had a long chat, and I convinced them their server was actually broken (eventually).
To make this clearer for other people who might have a similar or related problem:
My PHP script was running, and no error was being logged in PHP because the entire PHP process, including the error logger, was being terminated by the web server.
This code is just for testing; modify your code according to mine.
<?php
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10'; // notice this
$url="http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={\"category\":\"Automotive\",\"\$loc\":{\"\$within\":{\"\$center\":[[41,-74],80467.2]}},\"website\":{\"\$blank\":false}}";
$ch = curl_init(); // notice this
curl_setopt($ch, CURLOPT_URL, $url); // notice this
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
$contents = curl_exec($ch);
echo $contents;
?>
I have worked with cURL calls from vanilla PHP before; the curl_* functions are part of PHP's curl extension. As other users have suggested, the urlencode() function prepares a string for use in a URL. If you do not run it, you can pass in URLs which break the handlers (e.g. by having invalid characters or URL syntax).
On top of this, it seems that you are trying to pass a JSON object directly in the URL of a page. I do not believe it is best practice to work with JSON payloads in a cURL call like this. If this is what you are looking to do, see this page: How do I POST JSON data with cURL?
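For completeness, the usual pattern for sending JSON in a POST body looks like this (a generic sketch of the technique in that link, not specific to the Factual API, which takes its filters as a GET parameter):

$payload = json_encode(array('category' => 'Automotive')); // example payload only
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($payload),
));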
The reason the script stops doing anything at all can be found here:
http://php.net/manual/en/function.set-time-limit.php
excerpt:
The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running. This is not true on Windows where the measured time is real.
So basically, cURL blocks the whole PHP script; the only thing actually running is cURL, so if it blocks forever, your site will not respond. That's why you need to use timeouts...
As for how to avoid it: just use timeouts...
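In practice that means setting both timeouts and checking for the timeout error afterwards, along these lines:

curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15); // give up connecting after 15 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // give up on the whole request after 30 seconds
$output = curl_exec($ch);
if (curl_errno($ch) === CURLE_OPERATION_TIMEOUTED) { // curl error 28
    // handle the timeout instead of letting the page hang
}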
I have a simple PHP function on a friend's server which I've checked has PHP cURL enabled.
The function is:
function sw_fetch_code($apikey='', $email=''){
    $url = "http://www.domain.com/xxx/api.php?getcode=1&apikey=".$apikey."&email=".$email."";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
    $obj = json_decode($result);
    if(!empty($obj)){
        if($obj->status == 200){
            return $obj->code;
        }else{
            return $obj->status;
        }
    }
}
As you can see, this is very simple, and I've tested it: it works on localhost and internally on my own server. The URL returns as expected. However, it just doesn't give any response when this function is called on my friend's server.
Any ideas what could cause this?
First:
Check from the "friend" server whether the URL works. As you do not have POST params, you can test with the exact query and see the expected results. See if you can get the results in a browser on the friend's server. If you don't have a GUI, try wget on the command line. See if you get results. If you do, go to the next step; if you don't, cURL isn't the problem: the "friend" server isn't able to see your domain. Could be a network issue, hosts file, etc. (more on that if that's the case).
Second:
If you see results in step 1, try this and see if you get anything:
$handle = fopen($url, "rb");
$contents = '';
while (!feof($handle)) {
    $contents .= fread($handle, 1024);
}
fclose($handle); // close the stream when done
If you get a response to this, then there is something wrong with cURL specifically (note that the fopen approach requires allow_url_fopen to be enabled).
Does the curl_exec() call fail immediately, or does it hang for 30 seconds or so until it times out? If the latter, you may want to check for a firewall issue.
What does curl_getinfo($ch) tell you?
I think you should begin with the standard checks:
Check whether PHP is compiled with the php_curl extension (or whether the extension is available as a shared object). You can check by running:
<?php
if (!extension_loaded('curl'))
{
    if (!dl('curl.so')) {
        die('Cannot load php_curl extension');
    }
}
?>
If the extension is loaded, there might be a problem with DNS or a firewall on the friend's server. There also might be a requirement to use a proxy server.
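If a proxy does turn out to be required, the relevant cURL options look like this (host, port, and credentials are placeholders):

// Route the request through an HTTP proxy; replace the values with real ones
curl_setopt($ch, CURLOPT_PROXY, 'proxy.example.com');
curl_setopt($ch, CURLOPT_PROXYPORT, 8080);
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password'); // if the proxy needs auth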