curl_exec causes PHP script to stop doing anything

When I run cURL on a particular URL, the site stops responding and doesn't generate an error, despite my having set error reporting to on. I've tried setting the cURL timeouts to low values, and it generates an error then, so I know it's not timing out.
The main thing I want to know is, how could that even happen, and how can I figure out why?
The URL I'm trying to access is a call to the Factual API. The URL I'm using,
(http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={"category":"Automotive","$loc":{"$within":{"$center":[[41,-74],80467.2]}})
works when you put it in a browser. The PHP script works as intended if you change the latitude and longitude to essentially any other values.
<?php
error_reporting(E_ALL);
ini_set('display_errors', '2');
$url = "http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={\"category\":\"Automotive\",\"\$loc\":{\"\$within\":{\"\$center\":[[41,-74],80467.2]}},\"website\":{\"\$blank\":false}}";
echo "\n\n1";
$ch = curl_init($url);
echo 2;
curl_setopt($ch, CURLOPT_HEADER, 0);
echo 3;
curl_setopt($ch, CURLOPT_POST, 1);
echo 4;
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
echo 5;
$output = curl_exec($ch) or die("hhtrjrstjsrjt" . curl_error($ch));
echo 6;
curl_close($ch);
echo "out: " . $output;

It looks like you have some mistakes in your PHP configuration file.
To fix your errors you must edit your php.ini file.
For displaying errors in development mode, change the error_reporting value to E_ALL.
error_reporting=E_ALL
Then you have to enable the cURL extension.
To enable it in your php.ini, you have to uncomment the following line:
extension=php_curl.dll
Once you have edited these values, don't forget to restart your web server (Apache or Nginx).
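If you want to verify that the extension is actually loaded, a minimal sanity check (a sketch you can drop on the server) is:
<?php
// quick sanity check: is the cURL extension available?
if (!extension_loaded('curl')) {
    die('The cURL extension is not loaded; check php.ini');
}
$v = curl_version();
echo 'cURL version: ' . $v['version'];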
Also, I agree with my colleagues: you should urlencode your JSON string.
From my point of view the code should be:
<?php
ini_set('display_errors', '1');
error_reporting(E_ALL);
$apiKey = '*apikey*';
$filters = '{"category":"Automotive","$loc":{"$within":{"$center":[[41,-74],80467.2]}},"website":{"$blank":false}}';
$params = '?api_key=' . $apiKey . '&filters=' . urlencode($filters);
$url = 'http://api.factual.com/v2/tables/bi0eJZ/read';
$url .= $params;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 15);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 30);
$output = curl_exec($ch) or die("cURL Error: " . curl_error($ch));
curl_close($ch);
echo "out: " . $output;
EDIT:
Another approach could be to use the Official PHP driver for the Factual API:
Official PHP driver for the Factual API
It provides a Debug Mode with a cURL Debug Output and an Exception Debug Output.

Your URL is not url-encoded. Escaping is necessary because cURL passes the URL on as-is: your browser will automatically url-encode parameters in the URL, but on the server you could be breaking cURL, causing it to halt.
Try changing this:
$url="http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={\"category\":\"Automotive\",\"\$loc\":{\"\$within\":{\"\$center\":[[41,-74],80467.2]}},\"website\":{\"\$blank\":false}}";
to:
$url_filters = '{"category":"Automotive","$loc":{"$within":{"$center":[[41,-74],80467.2]}},"website":{"$blank":false}}';
$url="http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters=".urlencode($url_filters);
However, I do have a question: is your call correct? Is the literal key "$loc" right?
Updated to remove the need to backslash everything: single quotes don't support variable interpolation and allow double quotes without escaping them.

For future benefit:
Use urlencode for query parameters, as your filters parameter contains many characters that are not safe/valid for URLs.
Use curl_getinfo() to see information such as http_code and other useful details.
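A minimal sketch of the curl_getinfo() suggestion (assuming $ch is an initialized cURL handle):
$output = curl_exec($ch);
$info = curl_getinfo($ch); // transfer metadata: status code, timings, sizes, etc.
echo 'HTTP status: ' . $info['http_code'] . "\n";
echo 'Total time: ' . $info['total_time'] . " s\n";
if ($output === false) {
    echo 'cURL error: ' . curl_error($ch) . "\n";
}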

In my case the cURL failure was due to the fact that the hosting provider had replaced Apache with LiteSpeed, and LiteSpeed was terminating the process during the cURL request.
The php.ini settings didn't fix this, as lsphp was being terminated at 30 seconds every time.
We had a long chat, and I convinced them their server was actually broken (eventually).
To make this clearer for other people who might have a similar or related problem:
My PHP script was running, and no error was being logged in PHP because the entire PHP process, including the error logger, was being terminated by the web server.

This code is just for testing; modify your code according to mine.
<?php
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10'; // notice this
$url="http://api.factual.com/v2/tables/bi0eJZ/read?api_key=*apikey*&filters={\"category\":\"Automotive\",\"\$loc\":{\"\$within\":{\"\$center\":[[41,-74],80467.2]}},\"website\":{\"\$blank\":false}}";
$ch = curl_init(); // notice this
curl_setopt($ch, CURLOPT_URL, $url); // notice this
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
$contents = curl_exec($ch);
echo $contents;
?>

I have worked with cURL calls from vanilla PHP before. The cURL call is part of the curl library. As other users have suggested, the urlencode() function would prepare the string for use here. If you do not run this, you could pass in URLs that break the handlers (e.g. by having invalid characters or URL syntax).
On top of this, it seems that you are trying to pass a JSON object directly into the URL of a page. I do not believe it is best practice to work with JSON payloads in a cURL call like this. If this is what you are looking to do, see this page: How do I POST JSON data with cURL?
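For reference, a minimal sketch of POSTing JSON with PHP cURL (the endpoint URL and payload here are placeholders, not the Factual API):
$payload = json_encode(array('category' => 'Automotive'));
$ch = curl_init('https://example.com/endpoint'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $payload);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($payload),
));
$response = curl_exec($ch);
curl_close($ch);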

The reason the script stops doing anything at all can be found here:
http://php.net/manual/en/function.set-time-limit.php
excerpt:
The set_time_limit() function and the configuration directive max_execution_time only affect the execution time of the script itself. Any time spent on activity that happens outside the execution of the script such as system calls using system(), stream operations, database queries, etc. is not included when determining the maximum time that the script has been running. This is not true on Windows where the measured time is real.
So basically cURL blocks the whole PHP script; while it runs, the only thing actually executing is cURL, so if it blocks forever, your site will not respond. That's why you need to use timeouts.
As for how to avoid it: just use timeouts.
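In practice that means capping both the connection phase and the total transfer, as in this minimal sketch (assuming $ch is your handle):
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // max seconds to establish the connection
curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // max seconds for the entire transfer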

php-curl encounters cloudflare "please wait" screen

I had a simple parser for an external site, required to confirm that a link the user submitted leads to an account this user owns (by parsing a link to their profile from the linked page). It worked for a good long while with just this WordPress function:
function fetch_body_url($fetch_link){
    $response = wp_remote_get($fetch_link, array('timeout' => 120));
    return wp_remote_retrieve_body($response);
}
But then the website changed something in their Cloudflare defense, and now this results in Cloudflare's "Please wait..." page with no option to pass it.
Thing is, I don't even need it done automatically: if there were a captcha, the user could have completed it. But it won't show anything other than the endlessly spinning "checking your browser".
I googled a bunch of cURL examples, and the best I could get so far is this:
<?php
$url='https://ficbook.net/authors/1000'; // random profile from the requested website
$agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookies.txt');
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_REFERER, 'https://facebook.com/');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
$response = curl_exec($ch);
curl_close($ch);
echo '<textarea>'.$response.'</textarea>';
?>
Yet it still returns the browser check screen. Adding a random free proxy to it doesn't seem to work either, or maybe I wasn't lucky enough to find a working one (or couldn't figure out how to insert it correctly in this case). Is there any way around it? Or perhaps there is some other way to see if there is a specific keyword/link on the page?
OK, I've spent most of the day on this problem, and it seems I got it more or less sorted. Not exactly the way I expected, but hey, it works... sort of.
Instead of solving this on the server side, I ended up looking for a solution that parses it on my own PC (it has better uptime than my hosting's server anyway). It turns out there are plenty of ready-to-use open-source scrapers, including ones that know how to bypass Cloudflare being extra defensive for no good reason.
Solution for Python dummies like myself:
Install Anaconda if you don't have Python installed yet.
In cmd, type pip install cloudscraper
Open Spyder (it comes with Anaconda) and paste this:
import cloudscraper
scraper = cloudscraper.create_scraper()
print(scraper.get("https://your-parse-target/").text)
Save it anywhere and poke the Run button to test. If it works, you've got your data in the console window of the same app.
Replace print with whatever you're going to do with that data.
My specific case also required installing mysql-connector-python and enabling remote access for the MySQL database (and my hosting had it available for free all this time, huh?). So instead of directly verifying that the user is the owner of the profile they input, there's now a queue, which isn't perfect, but oh well, they'll have to wait.
First, the user request is saved to MySQL. My local Python script checks that table every now and then to see if anything is in line to be verified. It gets the page's content and saves it back to MySQL. Then the old PHP parser does its job like before, but from a MySQL fetch instead of the actual website.
Perhaps there are better solutions that don't require resorting to measures like creating a separate local parser, but maybe this will help someone running into a similar issue.

PHP works in browser but not cron

I need help here because I am going mad! I have a PHP script which, as far as I can tell, was working when it was made, both in the browser and via cron/CLI. However, it has recently stopped working for some reason, and after investigation it is returning an "Undefined variable" error :(
So, I have viewedme.php, which calls checklogin.php, which in turn calls functions.inc.php.
When run directly from the browser, the script executes from end to end and performs all the actions it was designed for. However, when I run it from cron using "/usr/bin/php /bla/cron_updateviewed.php", it returns this error message...
PHP Notice: Undefined variable: match in /bla/bla/functions.inc.php on line 79
What is even more annoying is that I have many other scripts, all run from cron, calling the same function in the same manner, without error.
This is my viewedme.php...
<?php
include_once('checklogin.inc.php');
include_once('mysqli.inc.php');
$ch = curl_init();
// more curl settings here but removed as not important
$html = striplinks($html);
?>
checklogin.inc.php calls functions.inc.php, which includes this function...
// Function to strip a document down to its links
function striplinks($document) {
    preg_match_all("'<\s*a\s.*?href\s*=\s*  # find <a href=
        ([\"\'])?                           # find single or double quote
        (?(1) (.*?)\\1 | ([^\s\>]+))        # if quote found, match up to next matching
                                            # quote, otherwise match up to next space
        'isx", $document, $links);
    while (list($key, $val) = each($links[2])) {
        if (!empty($val))
            $match[] = $val;
    }
    while (list($key, $val) = each($links[3])) {
        if (!empty($val))
            $match[] = $val;
    }
    return $match;
}
Like I said, viewedme.php works when accessed from the browser, just not from cron. It is also important to add that nothing is being passed through the browser, like POST or GET values, which would stop this from working.
I have other scripts, such as checkinbox.php, which use the striplinks function from cron without any issues.
I am really at a loss as to wtf is going on :(
Updated
Solved the "undefined variable" issue, but it still returns nothing.
As suggested, the regex wasn't matching anything, so I added $match = array(); into the function before the while loops. This fixes the undefined variable error, but it still returns 0.
After setting up a few test scripts, I found that when executing the code through the browser, the cookies in cURL are used. However, when executing through cron/CLI they are not, and therefore a different webpage is being returned.
Here is my curl code...
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/32.0.1700.107 Chrome/32.0.1700.107 Safari/537.36');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie');
curl_setopt($ch, CURLOPT_URL, 'http://www.bla.com/');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
Is anything wrong with it as to why cookies are not used when executing via cron/cli?
My bet is that both your while loops, or the ifs they contain, never run, so $match stays undefined, causing the error on the return.
You can fix this by initializing $match to an empty array, like so:
function striplinks($document) {
    $match = array(); // also possible for PHP 5.4 and later: $match = [];
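As a side note, each() is deprecated since PHP 7.2 and removed in PHP 8, so a modernized sketch of the same idea (assuming the intent is simply to collect href values; the regex below is a simplified stand-in for the original) could look like:
function striplinks($document) {
    $match = array(); // initialize so we never return an undefined variable
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']?([^"\'\s>]+)/i', $document, $links);
    foreach ($links[1] as $val) {
        if ($val !== '') {
            $match[] = $val;
        }
    }
    return $match;
}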
So after over 4 hours of trial, error, and anger, I finally solved the problem. It appears cURL requires the full path to the cookie files if you are going to execute from the CLI or cron (cron runs the script from a different working directory, so relative paths resolve somewhere else). Strange but there you go :P
curl_setopt($ch, CURLOPT_COOKIEJAR, '/home/full/path/cookie.txt');
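A portable way to avoid hardcoding the path is to build it from __DIR__ (a sketch, assuming the cookie file should live next to the script):
curl_setopt($ch, CURLOPT_COOKIEJAR, __DIR__ . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, __DIR__ . '/cookie.txt');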
Thanks to @Siguza for the array suggestion too.

Getting HTML data from php page

I have a URL like this: https://facebook.com/5. I want to get the HTML of that page, just like view source.
I tried using file_get_contents, but that didn't return the correct content.
Am I missing something?
Is there any other function I can utilize?
If I can't get the HTML of that page, what special thing did the developer do while coding the site to prevent this?
Warning: this may be off topic.
But does this task have to be done using PHP?
Since this sounds like a web-scraping task, I think you would get more use out of CasperJS.
With it, you can target with precision what you want to retrieve from the Facebook page, rather than grabbing the whole content, which I assume (as of this writing) is generated by multiple requests and rendered through a virtual DOM.
Please note that I haven't tried retrieving content from Facebook, but I've done this with multiple services.
Good luck!
You may want to use curl instead: http://php.net/manual/en/curl.examples.php
Edit:
Here is an example of mine:
$url = 'https://facebook.com/5';
$ssl = true;
$ch = curl_init();
$timeout = 3;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, $ssl);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
curl_close($ch);
Note that, depending on the website's vhost configuration, a slash at the end of the URL can make a difference.
Edit: Sorry for the undefined variable... I copied it out of a helper method I used. Now it should be all right.
Yet another Edit:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
By adding this option you will follow the redirects that are apparently happening in your example. Since you said it was an example, I actually didn't run it before. Now I did, and it works.

Simple GET request with PHP cURL to send SMS text message

I'm creating a quick web app that needs to send a PHP-created message from within PHP code. cURL is apparently the tool for the job, but I'm having difficulty understanding it well enough to get it working.
The documentation for the API I'm dealing with is here. In particular, I want to use the simple GET-based SMS notification documented here. The latter resource states that the GET API is simply:
http://sms2.cdyne.com/sms.svc/SimpleSMSsend?PhoneNumber={PHONENUMBER}&Message={MESSAGE}&LicenseKey={LICENSEKEY}
And indeed, if I type the following URL into a browser, I get the expected results:
http://sms2.cdyne.com/sms.svc/SimpleSMSsend?PhoneNumber=15362364325&Message=mymessage&LicenseKey=2134234882347139482314987123487
I am now trying to create the same effect within PHP. Here is my attempt:
<html>
<body>
<?php
$num = '13634859126';
$message = 'some swanky test message';
$ch=curl_init();
curl_setopt($ch, CURLOPT_URL, "http://sms2.cdyne.com/sms.svc/SimpleSMSsend?PhoneNumber=".urlencode($num)."&Message=".urlencode($message)."&LicenseKey=2345987342583745349872");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
?>
</body>
</html>
My other PHP webpages work fine, so I know PHP and Apache are set up correctly. But when I point my browser at the above page, I get no message on my phone. Can anybody show me what I'm doing wrong?
Note: all numbers are faked... as you might have suspected.
Do you really need cURL? You can simply use PHP's file_get_contents($url), which will do a GET request and return the response.
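A minimal sketch of that approach (assuming allow_url_fopen is enabled in php.ini; the number, message, and key are the question's placeholders):
$num = '13634859126';
$message = 'some swanky test message';
$url = 'http://sms2.cdyne.com/sms.svc/SimpleSMSsend'
     . '?PhoneNumber=' . urlencode($num)
     . '&Message=' . urlencode($message)
     . '&LicenseKey=2345987342583745349872';
$response = file_get_contents($url);
echo $response;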
If there's no returned output, the cURL call probably failed.
Check the error code of the returned handle to determine the cause of the error:
$result=curl_exec($ch);
$curlerrno = curl_errno($ch);
curl_close($ch);
print $curlerrno;
The error code list: libcurl-errors
I advise using the cURL timeout settings too:
curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,5);
curl_setopt($ch,CURLOPT_TIMEOUT,5);
Assuming you are forming the URL correctly (and, as one comment says, check it manually in a browser), I am not sure where your data is going when it comes back, so try:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // tell cURL to return the output instead of sending it to the browser
$output = curl_exec($ch); // capture the response in a variable
print "<br />"; // output the variable
print $output;
print "<br />";
Other things to try are
curl_setopt($ch, CURLOPT_INTERFACE, "93.221.161.69"); // set the outgoing interface/IP the request is sent from
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"); // pretend you are IE/Mozilla in case the remote server expects it
curl_setopt($ch, CURLOPT_POST, 1); // setting as a post
Just replace PhoneNumber=$num with the urlencoded version:
curl_setopt($ch, CURLOPT_URL, "http://sms2.cdyne.com/sms.svc/SimpleSMSsend?PhoneNumber=".urlencode($num)."&Message=".urlencode($message)."&LicenseKey=2345987342583745349872");

Call to new HttpRequest fails

I am trying to get a PHP script working. The purpose of the script is to call out to a web service. I've reduced the script down to its simplest components and it is still failing. Here it is:
<?php
print "Hello";
$request = new HttpRequest('http://www.pivotaltracker.com/services/v3/source_commits', HttpRequest::METH_POST);
print "Done";
?>
The output is:
D:\svn\svndb\hooks>"c:\Program Files\PHP\php.exe" -f test.php
Hello
D:\svn\svndb\hooks>
As you can see, the script fails when trying to instantiate an instance of HttpRequest. However, no exception is thrown.
I am not a PHP programmer... I'm just trying to get this feature working. I suspect I have not loaded an extension library that I need... but I can't figure out which one that would be, if indeed that is the problem.
I am running on Windows 2003. I am running PHP 5.3.3.
I did run phpinfo() but am hesitant to post the results here since it is so large. Is there a section of the phpinfo() output that would be helpful to provide?
Put error_reporting(E_ALL); at the top and see what happens.
My bet is that the HttpRequest class doesn't exist. The HTTP extension is a PECL package that needs to be installed separately.
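A quick way to confirm that bet (a minimal sketch):
if (!class_exists('HttpRequest')) {
    die('The HttpRequest class is missing; the PECL http extension is not installed');
}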
Thank you everyone for your answers. They were all spot on. I thought I'd summarize what I did in the end in case it helps someone else.
The problem was indeed that I had not installed the http PECL extension. Unfortunately, I am on Windows, there was no distribution of this extension, and I didn't want to install the Microsoft tools on this box just to compile the source. So I went with the suggestion listed above and implemented it using cURL.
The script I was working on integrates svn with http://www.pivotaltracker.com using the excellent PHP script found at http://phpjack.com/content/pivotal-tracker-and-subversion. I modified that script as follows (in case someone else is in a similar spot):
$request = new HttpRequest('http://www.pivotaltracker.com/services/v3/source_commits', HttpRequest::METH_POST);
$headers = array(
    'X-TrackerToken' => $token,
    'Content-type' => 'application/xml'
);
$request->setHeaders($headers);
$request->setBody("<source_commit><message>$message</message><author>$author</author><commit_id>$rev</commit_id></source_commit>");
$request->send();
became
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/xml","X-TrackerToken: $token"));
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_POST, 1);
$result = curl_exec($ch);
curl_close($ch);
print $result;
Thanks again for all the excellent and timely advice.
Enable error reporting with error_reporting(E_ALL);
Enable display errors with ini_set('display_errors', 1);
Better still, change these settings in php.ini.
If it's still not working, look at the Apache logs (error.log).
You could use cURL for that simple purpose:
<?php
$url = "http://www.pivotaltracker.com/services/v3/source_commits";
$ch = curl_init();
// set the target url
curl_setopt($ch, CURLOPT_URL, $url);
// send the request as an HTTP POST
curl_setopt($ch, CURLOPT_POST, 1);
// the POST body parameters
curl_setopt($ch, CURLOPT_POSTFIELDS, "someParameter=someValue");
$result = curl_exec($ch);
curl_close($ch);
print $result;
?>
Or use fsockopen() to connect to the server and fwrite() to send a raw HTTP POST request.
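For completeness, a minimal fsockopen() sketch of that idea (host and path are taken from the question above; the body is the same placeholder parameter):
$body = 'someParameter=someValue';
$fp = fsockopen('www.pivotaltracker.com', 80, $errno, $errstr, 10);
if (!$fp) {
    die("Connection failed: $errstr ($errno)");
}
$request  = "POST /services/v3/source_commits HTTP/1.1\r\n";
$request .= "Host: www.pivotaltracker.com\r\n";
$request .= "Content-Type: application/x-www-form-urlencoded\r\n";
$request .= 'Content-Length: ' . strlen($body) . "\r\n";
$request .= "Connection: close\r\n\r\n";
$request .= $body;
fwrite($fp, $request);
while (!feof($fp)) {
    echo fgets($fp, 1024); // raw HTTP response, headers included
}
fclose($fp);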
