What is the best way to fetch JSON from more than 100 URLs? My PHP script is too slow to do that.
I already call set_time_limit(0); at the top of the script.
I use this small piece of cURL code, but it is still slow:
$curl_connection = curl_init($jsonurl);
curl_setopt($curl_connection, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($curl_connection, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl_connection, CURLOPT_SSL_VERIFYPEER, false);
$data = json_decode(curl_exec($curl_connection), true);
curl_close($curl_connection);
What do you think about this?
This is almost impossible to answer without more context, but it sounds like a job for a job queue and a cron job to process the queue periodically.
You can investigate the use of curl_multi_* functionality. This will allow multiple parallel cURL requests.
Here is a simple PHP REST client I built that leverages curl_multi_*. Feel free to use it.
https://github.com/mikecbrant/php-rest-client
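As a rough illustration of the curl_multi_* approach mentioned above, here is a minimal sketch (not the linked client's code). The function name fetch_all_json and the batch size are assumptions made for this example; the URLs are fetched in batches so that not all 100+ connections are opened at once.

```php
<?php
// Hypothetical helper: fetch many JSON URLs in parallel, a batch at a time.
// Returns an array with the same keys as $urls; a failed request or an
// invalid JSON body yields null for that key.
function fetch_all_json(array $urls, int $batchSize = 10, int $timeout = 30): array
{
    $results = [];
    foreach (array_chunk($urls, max(1, $batchSize), true) as $batch) {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($batch as $key => $url) {
            $ch = curl_init($url);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
            curl_multi_add_handle($mh, $ch);
            $handles[$key] = $ch;
        }

        // Drive every transfer in the batch until none is still running.
        do {
            $status = curl_multi_exec($mh, $running);
            if ($running) {
                curl_multi_select($mh, 1.0); // wait for socket activity
            }
        } while ($running && $status === CURLM_OK);

        foreach ($handles as $key => $ch) {
            $body = curl_multi_getcontent($ch);
            $results[$key] = json_decode((string) $body, true);
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
    }
    return $results;
}
```

Calling fetch_all_json($jsonUrls) with an array of URLs would return the decoded documents under the same keys. Whether a batch size of 10 is sensible depends on the remote servers.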
Guys, I'm having serious trouble trying to solve this.
The scenario:
Here at work we use the Vulnerability Management tool QualysGuard.
Skipping all technical details, this tool basically detects vulnerabilities in all servers and for each vulnerability in each server it creates a Ticket Number.
From the UI I can access all these tickets and download a CSV file with all of them.
The other way of doing it is by using the API.
The API uses some cURL calls to access the database and retrieve the info that I specify in the parameters.
The method:
I'm using a script like this to get the data:
<?php
$username="myUserName";
$password="myPassword";
$proxy= "myProxy";
$proxyauth = 'myProxyUser:myProxyPassword';
$url="https://qualysapi.qualys.com/msp/ticket_list.php?"; //This is the official script, provided by Qualys, for doing this task.
$postdata = "show_vuln_details=0&SINCE_TICKET_NUMBER=1&CURRENT_STATE=Open&ASSET_GROUPS=All";
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_PROXY, $proxy);
curl_setopt($ch, CURLOPT_PROXYUSERPWD, $proxyauth);
curl_setopt ($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 0);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . $password);
curl_setopt ($ch, CURLOPT_POSTFIELDS, $postdata);
curl_setopt ($ch, CURLOPT_POST, 1);
$result = curl_exec ($ch);
$xml = simplexml_load_string($result);
?>
The script above works fine. It connects to the API, passes some parameters to it, and the ticket_list.php file generates an XML file with everything I need.
The Problems:
1-) This script only allows a limit of 1000 results in the XML file it returns.
If my request has generated more than 1000 results, the script creates a TAG like this, at the end of the XML:
<TRUNCATION last="5066">Truncated after 1000 records</TRUNCATION>
In this case, I would need to execute another cURL call, with the parameters below:
$postdata = "show_vuln_details=0&SINCE_TICKET_NUMBER=5066&CURRENT_STATE=Open&ASSET_GROUPS=All";
2-) There are approximately 300,000 tickets in Qualys' database (cloud), and I need to download all of them and insert in MY database, which is used by an application that I'm creating. This application has some forms, which are filled by the user and a bunch of queries are run against the database.
The doubt:
What would be the best way for me to do the task above?
I've got some ideas, but I'm at a complete loss.
I thought:
1-) Create a function that makes the call above, parses the XML and, if the TRUNCATION tag exists, gets its value and calls itself again, recursing until a result without the TRUNCATION tag comes back.
The problem with this one is that I wasn't able to merge the XML results of each call, and I'm not sure whether it would cause memory issues, since it would need nearly 300 cURL calls. This script would be executed automatically from the server's crontab outside business hours.
2-) Instead of retrieving all the data, I make the forms that I've mentioned post the data to the script and make the cURL calls with the parameters that the user POSTed. But again I'm not sure if that would be good, since I would still need to do multiple calls, depending on the parameters that the user sends.
3-) This is a crazy one: Use some sort of Macro software to record me while I log in the UI, go to the page where the tickets are located, click the download button, check the CSV option and click to download again. Then, export this script to some language like python or java, create a task in the cronTab and create a script that parses the CSV downloaded and inserts the data to the database. (Crazy or not? =P )
Any help is very welcome. Maybe the answer is right before my eyes and I haven't seen it yet.
Thanks in advance!
I believe the proper way would involve a queue worker. However, if I were you I'd make your script grab 5 of these XML files in one execution: grab 1, insert the rows, free the memory, repeat. Then I'd test it by running it a few times manually to see what sort of execution time and memory it requires. Once you've got a good idea of the execution time and you can see memory will not be a problem, schedule a cron job at an interval a little under double that time. If all goes well it should be about a minute between runs and you can have it all in your DB within an hour.
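The "grab 1, insert rows, remove from memory, repeat" idea can be sketched as a plain loop instead of recursion, so the pages never need to be merged. This is only a sketch under assumptions: fetch_all_pages, $fetchPage and $handlePage are hypothetical names, $fetchPage is expected to perform the cURL call from the question with the given SINCE_TICKET_NUMBER and return the XML string, and the only thing taken from the question about the XML layout is the TRUNCATION tag with its last attribute.

```php
<?php
// Hypothetical pagination loop: process one page at a time and follow the
// TRUNCATION tag until a page arrives without it.
// $fetchPage(int $since): string  returns the raw XML for that starting ticket.
// $handlePage(SimpleXMLElement $xml): void  e.g. inserts the rows into the DB.
function fetch_all_pages(callable $fetchPage, callable $handlePage, int $since = 1, int $maxPages = 1000): int
{
    $pages = 0;
    while ($pages < $maxPages) {
        $xml = simplexml_load_string($fetchPage($since));
        if ($xml === false) {
            throw new RuntimeException("Invalid XML for SINCE_TICKET_NUMBER=$since");
        }

        $handlePage($xml); // store this page now; nothing is accumulated
        $pages++;

        $truncation = $xml->xpath('//TRUNCATION');
        if (!$truncation) {
            break; // last page reached
        }
        $next = (int) $truncation[0]['last'];
        if ($next <= $since) {
            break; // guard against looping forever on an unexpected value
        }
        $since = $next;
    }
    return $pages;
}
```

Because each page is handed to $handlePage and then dropped, memory use stays at roughly one page regardless of how many calls are needed. The $maxPages cap also makes it easy to limit one cron run to a handful of pages, as suggested above.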
So,
I have an issue with needing to make multiple http requests (i mean 100's, even 1000's) using PHP, and these requests need to be run through a proxy to protect the server info/ip.
So, the problem is, to do these requests using php curl is perfectly fine, but takes a long time. If i use php curl_multi, it takes a lot less time. But if i introduce the PROXY to this, the curl_multi takes a significantly longer time than not using curl_multi (curl one after another).
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_URL, $_REQUEST['url']);
//Proxy
$proxyStr = 'user:pass#123.123.123.123:80'; // username:password#ip:port
curl_setopt($ch, CURLOPT_PROXY, $proxyStr);
curl_setopt($ch, CURLOPT_PROXYTYPE, 'HTTP');
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
$req = "something= somethingelse \r\n"; //request that needs to be sent
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $req); // Query
$result = curl_exec($ch);
//This is some sample code, obviously the curl_multi is using curl_multi_init() and curl_multi_exec()
//Here is a sample code for curl_multi, just introduce the proxy code in the middle of it: http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/
My question is, what am i doing wrong? The curl_multi with proxy takes a HUGE amount of time longer regardless of if i use 2 at a time or 5 at a time. But using multi with 1 at a time is quicker (not quick enough but a LOT quicker than 2 or more).
As a poor man's multithreading, i added a worker script to the server which pretty much takes a http post and does a single curl on the data (post contains the proxy, so this worker uses the proxy). And using the main server, i use curl_multi to POST details to the worker script (about 2-5 requests at a time). This method works "better" but not a huge amount better. Eg. if i use curl_multi to send 10 requests 1 at a time to the worker, it finishes in ~10 secs. If i send 10 requests 5 at a time, it finished in ~8-9 Secs. So its definitely faster, but not as fast as i need it to be.
Does anyone have any ideas/suggestions on why this is the case?
Is there something about PHP curl_multi and proxies that I'm missing? I assumed that multiple PHP requests coming INTO a server would execute in parallel (which is why I was referring to curl_multi'ing into the worker script, which does the proxying, as a poor man's multithreading). For anyone suggesting pthreads: will that actually solve the problem? (It is not practical for me to install pthreads, and from what I can see I doubt it would solve the problem.)
I am new to PHP. I have a block of code on my webpage which I want to execute asynchronously. This block has following :
1. A shell_exec command.
2. A ftp_get_content.
3. Two image resize.
4. One call to mysql for insert.
Is there a way to make this block async, so that the rest of the page loads quickly?
Please ask if any more details are required.
One possible solution is to use curl to do a pseudo async call. You can put the async part of your code in a separate php file and call it via curl. For example:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'YOUR_URL_WITH_ASYNC_CODE');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 1);
curl_exec($ch);
curl_close($ch);
You could put the 4 tasks into a queue, maybe something like Beanstalkd, then have a background worker process this queue.
was searching stackoverflow for a solution, but couldn't find anything even close to what I am trying to achieve. Perhaps I am just blissfully unaware of some magic PHP sauce everyone is doing tackling this problem... ;)
Basically I have an array with give or take a few hundred urls, pointing to different XML files on a remote server. I'm doing some magic file-checking to see if the content of the XML files have changed and if it did, I'll download newer XMLs to my server.
PHP code:
$urls = array(
'http://stackoverflow.com/a-really-nice-file.xml',
'http://stackoverflow.com/another-cool-file2.xml'
);
foreach($urls as $url){
set_time_limit(0);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, false);
$contents = curl_exec($ch);
curl_close($ch);
file_put_contents($filename, $contents);
}
Now, $filename is set somewhere else and gives each XML its own ID based on my logic.
So far this script is running OK and does what it should, but it is terribly slow. I know my server can handle a lot more and I suspect my foreach is slowing down the process.
Is there any way I can speed up the foreach? Currently I am thinking of raising the number of file_put_contents calls per loop iteration to 10 or 20, basically cutting my execution time 10- or 20-fold, but I can't think of the best and most performant way to approach this. Any help or pointers on how to proceed?
Your bottleneck (most likely) is your curl requests, you can only write to a file after each request is done, there is no way (in a single script) to speed up that process.
I don't know how it all works but you can execute curl requests in parallel: http://php.net/manual/en/function.curl-multi-exec.php.
Maybe you can fetch the data (if memory is available to store it) and then as they complete fill in the data.
Just run more scripts. Each script downloads a share of the URLs.
You can get more information about this pattern here: http://en.wikipedia.org/wiki/Thread_pool_pattern
The more scripts you run, the more parallelism you get.
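One way to split the work between several scripts, sketched under assumptions: every script receives the same URL list plus its own worker index and the total worker count (for example as command-line arguments), and takes every N-th URL. The name urls_for_worker is made up for this example.

```php
<?php
// Hypothetical helper: return the URLs that worker number $worker
// (0-based) out of $workers should download. URLs are dealt out
// round-robin so every worker gets a similar share.
function urls_for_worker(array $urls, int $worker, int $workers): array
{
    $share = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i % $workers === $worker) {
            $share[] = $url;
        }
    }
    return $share;
}

// Possible use inside each script, started as: php download.php 0 4
// $mine = urls_for_worker($urls, (int) $argv[1], (int) $argv[2]);
```

Each script would then run the existing foreach over its own share only.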
I use a Guzzle pool for parallel requests ;) (you can send x requests in parallel)
http://docs.guzzlephp.org/en/stable/quickstart.html
For some reason my curl call is very slow. Here is the code I used.
$postData = "test";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, false);
$result = curl_exec($ch);
Executing this code takes on average 250ms to finish.
However when I just open the url in a browser, firebug says it only takes about 80ms.
Is there something I am doing wrong? Or is this the overhead associated with PHP Curl.
It's the call to
curl_exec
That is taking up all the time.
UPDATE:
So I figured out right after I posted this that if I set the curl option
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
It significantly slows down
curl_exec
The post data could be anything and it will slow it down.
Even if I set
curl_setopt($ch, CURLOPT_POST, false);
It's slow.
I'll try to work around it by just adding the parameters to the URI as a query string.
SECOND UPDATE:
Confirmed that if I just call the URI using GET and passing parameters
as a query string it is much faster than using POST and putting the parameters in the body.
CURL has some problems with DNS look-ups. Try using IP address instead of domain name.
Curl has the ability to tell exactly how long each piece took and where the slowness is (name lookup, connect, transfer time). Use curl_getinfo (http://www.php.net/manual/en/function.curl-getinfo.php) after you run curl_exec.
If curl is slow, it is generally not the PHP code, it's almost always network related.
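A small sketch of how the curl_getinfo timings can be turned into per-phase durations. The helper name timing_breakdown is an assumption; the array keys it reads (namelookup_time, connect_time, pretransfer_time, starttransfer_time, total_time) are the ones curl_getinfo returns, each measured in seconds from the start of the request.

```php
<?php
// Hypothetical helper: convert curl_getinfo()'s cumulative timestamps
// into the time spent in each phase of the request.
function timing_breakdown(array $info): array
{
    return [
        'dns'      => $info['namelookup_time'],
        'connect'  => $info['connect_time'] - $info['namelookup_time'],
        'tls'      => $info['pretransfer_time'] - $info['connect_time'],
        'wait'     => $info['starttransfer_time'] - $info['pretransfer_time'],
        'transfer' => $info['total_time'] - $info['starttransfer_time'],
    ];
}

// Possible use right after curl_exec($ch):
// print_r(timing_breakdown(curl_getinfo($ch)));
```

A large dns value points at name resolution, while a large wait value means the remote server itself is slow to answer.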
try this
curl_setopt($ch, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4 );
Adding "curl_setopt($ch, CURLOPT_POSTREDIR, CURL_REDIR_POST_ALL);" solved it for me. Is there any problem with this solution?
I just resolved this exact problem by removing the following two options:
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
Somehow, on the site I was fetching, the POST request took over ten full seconds. If it's GET, it's less than a second.
So... in my wrapper function that does the Curl requests, it now only sets those two options when there is something in $postData
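The wrapper itself is not shown, so the following is only a guess at its shape: a hypothetical build_curl_options function that adds the two POST options only when $postData is non-empty. The result can be passed to curl_setopt_array.

```php
<?php
// Hypothetical sketch of such a wrapper: CURLOPT_POST and
// CURLOPT_POSTFIELDS are only set when there is something to post,
// so requests without a body stay plain GET requests.
function build_curl_options(string $url, ?string $postData = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HEADER         => false,
    ];
    if ($postData !== null && $postData !== '') {
        $options[CURLOPT_POST] = true;
        $options[CURLOPT_POSTFIELDS] = $postData;
    }
    return $options;
}

// Possible use:
// $ch = curl_init();
// curl_setopt_array($ch, build_curl_options($url, $postData));
// $result = curl_exec($ch);
```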
I just experienced a massive speed-up through compression. By setting the Accept-Encoding header to "gzip, deflate", or just to all formats which Curl supports, my ~200MB download took 6s instead of 20s:
curl_setopt($ch, CURLOPT_ENCODING, '');
Notes:
If an empty string, "", is set, a header containing all supported encoding types is sent.
You do not even have to care about decompression after the download, as this is done by Curl internally.
CURLOPT_ENCODING requires Curl 7.10+
The curl functions in PHP are bindings to libcurl, the same library that the curl command line tool is built on.
Therefore it really only depends on the network speed since in general curl itself is much faster than a webbrowser since it (by default) does not load any additional data like included pictures, stylesheets etc. of a website.
You may not be aware that the network performance of the server on which you tested your PHP script could be far worse than that of the local computer where you tested with the browser. In that case the two measurements are not really comparable.
Generally that's acceptable when you are loading content from, or posting to, a server at the slower end of the world. The duration of a curl call depends directly on your network speed and on the throughput of the web server.