The constructor of the SOAP client for my WSDL web service is awfully slow: I get the response only after the fastcgi_read_timeout value in my Nginx config is reached, as if the remote server were not closing the connection. I also need to set that timeout to at least 15 seconds, otherwise I get no response at all.
I already read similar posts here on SO, especially this one:
PHP SoapClient constructor very slow and its linked threads, but I still cannot find the actual cause of the problem.
This is the part which takes 15+ seconds:
$client = new SoapClient("https://apps.correios.com.br/SigepMasterJPA/AtendeClienteService/AtendeCliente?wsdl");
It seems to be slow only when called from my PHP script, because the file opens instantly when accessed from any of the following:
wget from my server which is running the script
SoapUI or Postman (but I don't know whether they had cached it before)
opening the URL in a browser
Ports 80 and 443 in the firewall are open. Following the suggestion from another thread, I found two workarounds:
Loading the wsdl from a local file => fast
Enabling the wsdl cache and using the remote URL => fast
But still I'd like to know why it doesn't work with the original URL.
It seems as if the web service does not close the connection, or in other words, I get the response only after reaching the timeout set in my server config. I tried setting keepalive_timeout 15; in my Nginx config, but it does not work.
Is there any SOAP/PHP parameter which forces the server to close the connection?
I was able to reproduce the problem and found a solution (it works, though it may not be the best one) in the accepted answer of a question linked from the question you referenced:
PHP: SoapClient constructor is very slow (takes 3 minutes)
As per the answer, you can adjust the HTTP headers using the stream_context option.
$client = new SoapClient("https://apps.correios.com.br/SigepMasterJPA/AtendeClienteService/AtendeCliente?wsdl",array(
'stream_context'=>stream_context_create(
array('http'=>
array(
'protocol_version'=>'1.0',
'header' => 'Connection: Close'
)
)
)
));
More information on the stream_context option is documented at http://php.net/manual/en/soapclient.soapclient.php
I tested this using PHP 5.6.11-1ubuntu3.1 (cli)
Related
We have a PHP web app using Guzzle 5 to download WordPress RSS feeds.
It's working fine except for this feed https://www.socialquant.net/blog/feed/
The owner of this site does want us to pull the feed, and is not knowingly attempting to block access.
I can successfully download the file from my local machine and from the production web server (where we initially noticed the problem) using wget or curl with no special options.
This happened once before; that time we believed the issue was caused by mod_security on Apache, and it was solved by adding an arbitrary User-Agent header. But that time I was able to reproduce the issue consistently on the command line; this time it's only failing through Guzzle/PHP.
I've copied the response headers from a browser request to the problem feed, and another feed that is working. I crossed off those that were the same and was left with the below
Server:Apache/2.2.22
Vary:User-Agent
X-Powered-By:PHP/5.3.29
Content-Encoding:gzip
Server:Apache
Vary:Accept-Encoding
X-Powered-By:PHP/5.5.30
That's not offering much insight. The gzip content encoding jumps out; I'm trying to find another working feed that uses gzip to verify this, but it shouldn't matter, as Guzzle's default mode is to handle encoding automatically. And we're using the same settings to download images from CDNs that use gzip.
Does anyone have any ideas please? Thanks :)
EDIT
Using Guzzle 5.3.0
Code:
$client = new \GuzzleHttp\Client();
try {
    $res = $client->get($feed, [
        'headers' => ['User-Agent' => 'Mozilla/4.0']
    ]);
} catch (\Exception $e) {
}
I'm afraid I don't have a proper solution to your problem, but I have it working again.
tl;dr version
It's the User-Agent header; changing it to pretty much anything else works.
This wget call fails:
wget -d --header="User-Agent: Mozilla/4.0" https://www.socialquant.net/blog/feed/
but this works
wget -d --header="User-Agent: SomeRandomText" https://www.socialquant.net/blog/feed/
And with that, the PHP below now also works:
require 'vendor/autoload.php';

$client = new \GuzzleHttp\Client();
$feed = 'https://www.socialquant.net/blog/feed/';

try {
    $res = $client->get(
        $feed,
        [
            'headers' => [
                'User-Agent' => 'SomeRandomText',
            ]
        ]
    );
    echo $res->getBody();
} catch (\Exception $e) {
    echo 'Exception: ' . $e->getMessage();
}
My thoughts
I started with wget and curl as you pointed out, which work when no special headers or options are set. Opening it in my browser also worked. I also tried using Guzzle without the User-Agent set, and that also worked.
Once I set the User-Agent to Mozilla/4.0, or even Mozilla/5.0, it started failing with 406 Not Acceptable.
According to the HTTP Status Code definitions, a 406 means
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.
In theory, adding Accept and Accept-Encoding headers should resolve the issue, but it didn't. Not via Guzzle or wget.
I then found the Mozilla Developer Network definition which states:
This response is sent when the web server, after performing server-driven content negotiation, doesn't find any content following the criteria given by the user agent.
This kinda points at the User-Agent again. This led me to believe that you are indeed correct that mod_security is doing something odd. I am convinced that an update to mod_security or Apache on the client's servers added a rule to parse the Mozilla/* user agents in a specific way since sending the User-Agent: Mozilla/4.0 () also works.
That's why I'm saying I don't have a proper solution for you. Even though the client wants you to pull the feed, they (or their hosting provider) are still in control of the rules.
Note: I noticed my IP getting blacklisted after a number of failed 406 attempts, after which I had to wait an hour before I could access the site again. Most likely a mod_security rule. mod_security might even be picking up on the automated requests with your user agent and start blocking or rejecting them with the 406.
I don't have a solution for you either, as I'm also experiencing this same issue (except I get error 503 and it fails 60% of the time). Let me know if you have found a solution.
However, I would like to share with you what I have found through my recent research. I found that certain User-Agents work better than others for me. This makes me believe that it's not what Donovan states to be the case (at least for me).
When I set User-Agent to null, it works 100% of the time. However, I haven't made any large requests yet, as I'm afraid of getting IP banned, as I know I would with a large request.
When I do a var_dump of the request itself, I see a lot of arrays which include Guzzle markers. I'm thinking maybe Amazon's detection services can tell that I'm spoofing the headers? I don't know.
Hope you figured it out.
The URL in question: http://www.roblox.com/asset/?id=149996624
When accessed in a browser, it will correctly download a file (which is an XML document). I wanted to get the file in php, and simply display its contents on a page.
$contents = file_get_contents("http://www.roblox.com/asset/?id=149996624");
The above is what I've tried using (as far as I know, the page does not expect any headers). I get a 500 HTTP error. However, in Python, the following code works and I receive the file.
r = requests.get("http://www.roblox.com/asset/?id=147781188")
I'm confused as to what the distinction is between how these two requests are sent. I am almost 100% sure it is not a header problem. I've also tried the cURL library in PHP to no avail. Nothing I've tried in PHP seems to succeed with the URL (with any valid id parameter), but Python is able to bring success nonchalantly.
Any insight as to why this issue may be happening would be great.
EDIT : I have already tried copying Python's headers into my PHP request.
EDIT2 : It also appears that there are two requests happening upon navigating to the link.
Is this on a Linux/Mac host by chance? If so, you could use ngrep to see the differences in the requests themselves on the wire. Something like the following should work:
ngrep -t '^(GET) ' 'src host 127.0.0.1 and tcp and dst port 80'
EDIT - The problem is that your server is responding with a 302 and the PHP library is not following it automatically. Cheers!
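If the 302 is indeed the culprit, one way to deal with it is to tell cURL to follow redirects. A minimal sketch (the URL is from the question; the options are standard cURL usage, not necessarily what the original poster tried):
<?php
$ch = curl_init("http://www.roblox.com/asset/?id=149996624");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow the 302 Location header
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // guard against redirect loops
$contents = curl_exec($ch);
if ($contents === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);
echo $contents;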
I am having the following problem:
I am running a BIG memory process but have divided the memory load into smaller chunks, so there is no CPU time-out issue.
On the server I am creating .xml files of around 100 KB each, and around 100+ of them will be created.
Now the main problem is that the browser shows a response time-out, and IE (just above the status bar) shows a .php file download message.
During this, the backend (server-side) process is still running and continuously creating .xml files in incremental order, so there is no issue with that.
I have following php.ini configuration.
max_execution_time = 10000 ; Maximum execution time of each script, in seconds
max_input_time = 10000 ; Maximum amount of time each script may spend parsing request data
memory_limit = 2000M ; Maximum amount of memory a script may consume (128MB)
; Maximum allowed size for uploaded files.
upload_max_filesize = 2000M
I am viewing my site in IE, and I am using ZSCE with PHP 5.3.
Can anybody point me in the right direction on this issue?
Edit:
Uploading an image of the time-out and of the prompt asking to download the .php file.
Edit 2:
Let me briefly explain my execution flow:
I have one PHP file with objects of class hierarchies which starts executing Function1() from each class hierarchy.
I have a class file.
First, say Function1() is executed, which contains the logic for creating XML files in chunks.
Second, say Function2() is executed, which displays the output generated by Function1().
All of this is done in a class-hierarchy manner, so I can't terminate the execution of Function1() in between until it has finished, and only after that will Function2() be called.
Edit 3:
This is especially for @hakre.
You asked some cross-questions, and I agree with some points, but let me describe the issue in more detail.
First, I was loading 100+ MB of XML files at a time, which is why the memory on my local setup was exhausted, everything on the machine hung, and the CPU was using most of its resources.
I then divided these big XML files into smaller ones (meaning I now load a single XML file at a time and unload it after use). This saved me from the memory overload and CPU issues on the local setup.
Now my backend process runs with no CPU or memory issues, but the problem is the browser timeout. I even tried cURL, but with my current structure it does not seem to fit because of my class hierarchy: I have a set of classes in a hierarchy, and they all first execute their Process functions and then their Output functions. So until the Process functions have executed, the Output functions do not come into the picture, and that's why the browser shows a timeout.
I even followed the instructions suggested by @vortex and got a little success, but not what I am looking for. The reason I could not implement cURL is that my Process function creates the required XML files in one go, so it takes too much time before anything is output to the browser. Since the Process function takes that much time, no output can be sent to the client until it has completed.
cURL Output:
URL....: myurl
Code...: 200 (0 redirect(s) in 0 secs)
Content: text/html Size: -1 (Own: 433) Filetime: -1
Time...: 60.437 Start # 60.437 (DNS: 0 Connect: 0.016 Request: 0.016)
Speed..: Down: 7 (avg.) Up: 0 (avg.)
Curl...: v7.20.0
Contents of test.txt file
* About to connect() to mylocalhost port 80 (#0)
* Trying 127.0.0.1... * connected
* Connected to mylocalhost (127.0.0.1) port 80 (#0)
> GET myurl HTTP/1.1
Host: mylocalhost
Accept: */*
< HTTP/1.1 200 OK
< Date: Tue, 06 Aug 2013 10:01:36 GMT
< Server: Apache/2.2.21 (Win32) mod_ssl/2.2.21 OpenSSL/0.9.8o
< X-Powered-By: PHP/5.3.9-ZS5.6.0 ZendServer
< Set-Cookie: ZDEDebuggerPresent=php,phtml,php3; path=/
< Cache-Control: private
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Connection #0 to host mylocalhost left intact
* Closing connection #0
Disclaimer: The accepted answer for this question was chosen based on the first small success it gave. The solution from @hakre is also feasible when this type of problem occurs. Right now no answer has fully fixed my problem, only partially. Hakre's answer also has more detail for anyone looking for more information about this type of issue.
Assuming you made all the server-side modifications to dodge a server timeout [I saw pretty much everything explained above], in order to dodge the browser timeout it is crucial that you do something like this:
<?php
set_time_limit(0);
error_reporting(E_ALL);
ob_implicit_flush(TRUE);
ob_end_flush();
I can tell you from experience that Internet Explorer doesn't have any issues as long as you output some content to it every now and then. I run a 30 GB database update every day [that takes around 2-4 hours], and Opera seems to be the only browser that ignores the content output.
If you don't set "ob_implicit_flush" you need to do an "ob_flush()" after every piece of content.
References
ob_implicit_flush
ob_flush
If you don't use ob_implicit_flush at the top of your script as I wrote earlier, you need to do something like:
<?php
echo 'dummy text or execution stats';
ob_flush();
within your execution loop
1. I am running BIG memory process but have divided memory load into smaller chunks so no CPU time out issue.
Now that's a wild guess. How did you find out it was a CPU time-out issue in the first place? Did you even? If yes, what does your test give now? If not, how do you test now that this is not a time-out issue?
Although you state there won't be a certain issue, you don't prove it, and many questions are still open. That invites guessing, which is counter-productive for troubleshooting (which is what you are doing here).
What you write here just means that you wrote code to chunk memory; however, this is not a test for CPU time-out issues. One part is writing the code, the other part is testing it. Don't mix the two, and don't draw wild assumptions. Issues are established by the test, otherwise they didn't happen.
So much for your first point, just to show you that when troubleshooting, you should look for facts (monitor, test, profile, step-debug), not run on assumptions. This is crucial, otherwise you look in the wrong places and ask the wrong questions.
From how you describe the client (browser) behavior, this is not a time-out issue per se. The problem you've got is that the gap between the header response and the body response is taking too long for your browser's taste. One browser assumes a time-out (as such a boundary value has been triggered, which looks more correct to me), and the other browser assumes something is coming, so why not save it.
So you merely have a processing issue here. Please consult the manual of your internet browsers (HTTP clients) for which configuration values you can change to alter this behavior. E.g. monitor with a curl request on the command line how long the request actually takes, then configure your browser not to time out when connecting to that server within the amount of time you just measured. For example, if you're using Internet Explorer: http://www.ehow.com/how_6186601_change-internet-timeout-options.html or if you're using Mozilla Firefox: http://forums.mozillazine.org/viewtopic.php?f=7&t=102322&start=0
As you didn't show any server-side code, I assume you want to solve this problem with client settings. Curl will help you measure the number of seconds such a request takes. Use the -v (verbose) switch to obtain detailed information about the request.
In case you don't want to solve this on the client, curl will still help you measure important data and easily reproduce any underlying server-related timing issue. So you should go for curl on the command line in any case, especially as looking into the response headers might reveal what triggers the (again) esoteric Internet Explorer behavior. Again, the -v switch reveals your request and response headers.
If you like to automate such tests with a PHP script, it's also possible with the PHP Curl Extension. This has been outlined in:
Php - Debugging Curl
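If you do want to automate the measurement in PHP rather than on the command line, a rough sketch with the Curl extension could look like this (the URL is a placeholder for your own page, not from the question):
<?php
$ch = curl_init('http://example.com/your-slow-page'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
$response = curl_exec($ch);
$info = curl_getinfo($ch);                       // timing details for the request
curl_close($ch);
printf("Total: %.3fs (connect: %.3fs, first byte: %.3fs)\n",
    $info['total_time'], $info['connect_time'], $info['starttransfer_time']);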
The problem is with your web-server, not the browser.
If you're using Apache, you need to adjust your Timeout value at httpd.conf or virtual hosts config.
You have 3 pages
Process - Creates the XML files and then updates a database value saying that the process is done
A PHP page that returns {true} or {false} based on the status of the process-completion database value (a minimal sketch of this page follows below)
An AJAX front end, polling page 2 every few seconds to check whether the process is done or not
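A minimal sketch of page 2 (the status check) might look like the following; the DSN, table and column names are made up for illustration, not taken from the question:
<?php
// status.php - hypothetical polling endpoint queried by the AJAX front end
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'); // hypothetical credentials
$done = (bool) $pdo->query("SELECT is_done FROM process_status WHERE id = 1")->fetchColumn();
header('Content-Type: application/json');
echo json_encode(array('done' => $done));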
Long Polling
I have had this issue several times while reading a large CSV file and putting it into a database. I solved it by dividing the reading and inserting process into smaller parts: I created a new table to log how much data had been read and inserted, and the page then reloads itself and continues from that position. You can do the same by creating one XML file per attempt, then reloading the page and starting with the next one. This way the memory used by the browser is refreshed.
Hope it helps.
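A rough sketch of that resume-from-last-position idea (the file names, the progress file and process_chunk() are illustrative assumptions, not the poster's actual code):
<?php
$progressFile = 'progress.txt'; // hypothetical log of how far we got
$position = file_exists($progressFile) ? (int) file_get_contents($progressFile) : 0;
$chunks = glob('data/chunk_*.xml'); // hypothetical list of input chunks

if ($position < count($chunks)) {
    process_chunk($chunks[$position]); // stand-in for your own processing logic
    file_put_contents($progressFile, $position + 1);
    header('Refresh: 1'); // reload the page so the browser never waits longer than one chunk
    echo 'Processed chunk ' . ($position + 1) . ' of ' . count($chunks);
} else {
    echo 'All chunks processed.';
}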
Is it possible to send some output to the browser from the script while it's still processing, even whitespace? If so, do it; it should reset the timeout counter.
If it's not possible, you have to increase the timeout of IE in the registry:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings
You need ReceiveTimeout; if it's not there, create it as a DWORD and set the value in milliseconds.
What is a "CPU time out issue"?
The right way to solve the problem is to run the heavy stuff asynchronously, in a separate session group (not the webserver process tree).
Try to include set_time_limit(0); in your PHP script page.
The following links might help you.
http://php.net/manual/en/function.set-time-limit.php
http://php.net/manual/en/function.ignore-user-abort.php
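For example, a minimal sketch at the top of the long-running script:
<?php
set_time_limit(0);       // lift PHP's execution time limit for this script
ignore_user_abort(true); // keep running even if the browser gives up on the connection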
I hesitated to ask this question because it looks weird.
But anyway.
Just in case someone had encountered the same problem already...
Filesystem functions (fopen, file, file_get_contents) behave very strangely with the http:// wrapper:
It seemingly works: no errors are raised and fopen() returns a resource.
It returns no data for all certainly working URLs (e.g. http://google.com/).
file() returns an empty array, file_get_contents() returns an empty string, fread() returns false.
For all intentionally wrong URLs (e.g. http://goog973jd23le.com/) it behaves exactly the same, save for a little [supposedly domain lookup] timeout, after which I get no error (while I should!) but an empty string.
allow_url_fopen is turned on.
curl (both the command-line and PHP versions) works fine, all other utilities and applications work fine, and local files open fine.
This error seems inapplicable because in my case it doesn't work for every url or host.
php-fpm 5.2.11
Linux version 2.6.35.6-48.fc14.i686 (mockbuild#x86-18.phx2.fedoraproject.org)
I fixed this issue on my server (running PHP 5.3.3 on Fedora 14) by removing the --with-curlwrapper from the PHP configuration and rebuilding it.
Sounds like a bug. But just for posterity, here are a few things you might want to debug.
allow_url_fopen: already tested
PHP under Apache might behave differently than PHP-CLI, and would hint at chroot/selinux/fastcgi/etc. security restrictions
local firewall: unlikely since curl works
user-agent blocking: this is quite common actually, websites block crawlers and unknown clients
transparent proxy from your ISP, which either mangles or blocks (PHP user-agent or non-user-agent could be interpreted as malware)
PHP stream wrapper problems
Anyway, first let's prove that PHP's stream handlers are functional:
<?php
if (!file_get_contents("data:,ok")) {
die("Houston, we have a stream wrapper problem.");
}
Then try to see if PHP makes real HTTP requests at all. First open netcat on the console:
nc -l 8000
And debug with just:
<?php
print file_get_contents("http://localhost:8000/hello");
And from here you can try to communicate with PHP and see if anything returns when you vary the response. Enter an invalid response into netcat first. If no error is thrown, your PHP package is borked.
(You might also try communicating over a "tcp://.." handle then.)
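A quick sketch of that raw tcp:// check against the netcat listener from above (port 8000, bypassing the http wrapper entirely):
<?php
$fp = stream_socket_client("tcp://127.0.0.1:8000", $errno, $errstr, 5);
if (!$fp) {
    die("Connect failed: $errstr ($errno)");
}
fwrite($fp, "GET /hello HTTP/1.0\r\nHost: localhost\r\n\r\n"); // hand-written request
while (!feof($fp)) {
    echo fgets($fp); // prints whatever you type into netcat as the response
}
fclose($fp);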
Next up is experimenting with http stream wrapper parameters. Use http://example.com/ literally, which is known to work and never block user-agents.
$context = stream_context_create(array("http" => array(
    "method" => "GET",
    "header" => "Accept: xml/*, text/*, */*\r\n",
    "ignore_errors" => false,
    "timeout" => 50,
)));
print file_get_contents("http://www.example.com/", false, $context, 0, 1000);
I think ignore_errors is very relevant here. But check out http://www.php.net/manual/en/context.http.php and specifically try to set protocol_version to 1.1 (will get chunked and misinterpreted response, but at least we'll see if anything returns).
If even this remains unsuccessful, then try to hack the http wrapper.
<?php
ini_set("user_agent" , "Mozilla/3.0\r\nAccept: */*\r\nX-Padding: Foo");
This will not only set the User-Agent, but also inject extra headers. If there is a processing issue with constructing the request within the http stream wrapper, then this might eventually catch it.
Otherwise, try disabling any Zend extensions, Suhosin, PHP Xdebug, APC and other core modules. There could be interference. Failing that, this is potentially an issue specific to the Fedora package; try a new version and see if it persists on your system.
When you use the http stream wrapper PHP creates an array for you called $http_response_header after file_get_contents() (or any of the other f family of functions) is called. This contains useful info on the state of the response. Could you do a var_dump() of this array and see if it gives you any more info on the response?
It's a really weird error that you're getting. The only thing I can think of is that something else on the server is blocking the http requests from PHP, but then I can't see why cURL would still be ok...
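For example, a minimal sketch (any URL that currently comes back empty will do):
<?php
$body = file_get_contents('http://www.example.com/');
var_dump($http_response_header); // populated by the http wrapper after the call above
var_dump(strlen($body));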
Is http stream registered in your PHP installation? Look for "Registered PHP Streams" in your phpinfo() output. Mine says "https, ftps, compress.zlib, compress.bzip2, php, file, glob, data, http, ftp, phar, zip".
If there is no http, set allow_url_fopen to on in your php.ini.
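You can also check both from code instead of phpinfo(); a quick sketch:
<?php
var_dump(in_array('http', stream_get_wrappers())); // is the http wrapper registered?
var_dump(ini_get('allow_url_fopen'));              // "1" when URL fopen is allowed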
My problem was solved by dealing with SSL:
$arrContextOptions = array(
    "ssl" => array(
        "verify_peer" => false,
        "verify_peer_name" => false,
    ),
);

$context = stream_context_create($arrContextOptions);
$jsonContent = file_get_contents("https://www.yoursite.com", false, $context);
What does a test with fsockopen tell you?
Is the test isolated from other code?
I had the same issue in Windows after installing XAMPP 1.7.7. Eventually I managed to solve it by adding the following line to php.ini (while having allow_url_fopen = On):
extension=php_openssl.dll
Use http://pear.php.net/reference/PHP_Compat-latest/__filesource/fsource_PHP_Compat__PHP_Compat-1.6.0a2CompatFunctionfile_get_contents.php.html and rename it and test if the error occurs with this rewritten function.
Any idea why fopen would time out for a file if it is on my server and I know the URL is correct?
Update: sorry, I should have mentioned this is in PHP.
The code is:
fopen($url, 'r');
It works if I put in a relative path for the file, but not if $url is a URL on my server (though it works for google.com). Thanks for the help.
Alaitnik's answer was right. The problem only appears when I access my own server's files through the ethernet interface. How can I fix this? I need to be able to access the file through the ethernet interface because the URL loads dynamically (it's generated by a WordPress CMS, so the URL doesn't technically exist as a file on my server).
You can use
ini_set('default_socket_timeout', 2);
before calling fopen on the $url. This sets the default socket connection timeout, so the call gives up instead of hanging without a response.
stream_set_timeout() sets a timeout on a stream that has already been established via fopen or the socket-opening functions.
Try this; it may be helpful for you.
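A small sketch combining both suggestions (the URL is a placeholder):
<?php
ini_set('default_socket_timeout', 2);        // give up connecting after 2 seconds
$fp = fopen('http://example.com/feed', 'r'); // placeholder URL
if ($fp) {
    stream_set_timeout($fp, 2);              // also time out reads on the open stream
    $data = stream_get_contents($fp);
    fclose($fp);
}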
It appears that you're trying to download a file from your own server using the HTTP protocol from a program running on that same server?
If so, the timeout problem is likely to be web server or network configuration related. Timeouts normally only happen because either:
the server really is taking a long time to send back the answer, or
the TCP connection is being blocked
For example, it may be that your local firewall rules only permit access to www.example.com if those queries come from the ethernet interface, but a locally made connection would try to go via the loopback interface.
maybe your "allow_url_fopen" is set to "Off"
check your php.ini file or phpinfo()
If you are trying to get the HTML of a URL, I suggest using curl instead of fopen.
fopen is best used with local files, because it does not "know" how to deal with the idiosyncrasies of a network resource.
Check the comments on the documentation of fopen. There's a whole lot of gold in there.
Took me ages to solve this, but here I found it, thanks to Alnitak. Opening the file with localhost in the URL instead of the hostname was what did the trick for me.
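If the URL has to keep the real hostname for routing (as with a WordPress-generated feed), one hedged workaround is to request localhost but send the original Host header yourself; the hostname and path below are placeholders, not from the question:
<?php
$context = stream_context_create(array(
    'http' => array(
        'header' => "Host: www.example.com\r\n", // replace with your site's real hostname
    ),
));
$data = file_get_contents('http://localhost/blog/feed/', false, $context);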