I have a SilverStripe site that deals with very big data. I made an API that returns a very large dump, and I call that API on the front end via an AJAX GET request.
The AJAX call takes about 10 minutes for the data to return (it's very long JSON data, and the customer accepted that).
While they are waiting for the data to return, they open the same site in another tab to do other things, but the site is very slow until the previous AJAX request has finished.
Is there anything I can do to avoid everything going unresponsive while waiting for the big JSON data?
Here's the code and an explanation of what it does:
I created a method named geteverything that resides on the web server, shown below. It accesses another server (the data server) to get data via a streaming API sitting on that data server. There's a lot of data, and the data server is slow; my customer doesn't mind the request taking long, they mind how slow everything else becomes. Sessions are used to determine the particulars of the request.
protected function geteverything($http, $id) {
    if (($System = DataObject::get_by_id('ESM_System', $id))) {
        if (isset($_GET['AAA']) && isset($_GET['BBB']) && isset($_GET['CCC']) && isset($_GET['DDD'])) {
            /*
             * some condition checks and data formatting for AAA, BBB, CCC and DDD go here
             */
            $request = "http://dataserver/streaming?method=xxx";
            set_time_limit(120);
            $jsonstring = file_get_contents($request);
            echo($jsonstring);
        }
    }
}
How can I fix this, or what else would you need to know in order to help?
The reason it's taking so long is that you're downloading the entirety of the JSON to your server, THEN sending it all to the user. There's no need to wait until you have the whole file before you start sending it.
Rather than using file_get_contents, make the connection with cURL and write the output directly to php://output.
For example, this script will copy http://example.com/ exactly as is:
<?php
// Initialise cURL. You can specify the URL in curl_setopt instead if you prefer
$ch = curl_init("http://example.com/");
// Open a file handler to PHP's output stream
$fp = fopen('php://output', 'w');
// Turn off headers, we don't care about them
curl_setopt($ch, CURLOPT_HEADER, 0);
// Tell curl to write the response to the stream
curl_setopt($ch, CURLOPT_FILE, $fp);
// Make the request
curl_exec($ch);
// close resources
curl_close($ch);
fclose($fp);
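One caveat with this approach: if PHP output buffering is active, the proxied data can still sit in a buffer until the request ends, so the client gains nothing. Here is a sketch of flushing each chunk to the client as it arrives; the streamRemote() helper name and the commented-out URL are my own placeholders:

```php
<?php
// Stream a remote resource to the client chunk by chunk.
function streamRemote(string $url): void
{
    // Drop any active PHP output buffers so data isn't held back.
    while (ob_get_level()) {
        ob_end_flush();
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Hand each received chunk straight to the client and flush it.
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) {
        echo $chunk;
        flush();
        return strlen($chunk); // tell cURL how much we consumed
    });
    curl_exec($ch);
    curl_close($ch);
}

// streamRemote("http://dataserver/streaming?method=xxx");
```

With this, the browser starts receiving JSON as soon as the data server produces it, instead of after the full download completes.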
Related
I have an XML file locally. It contains data from a marketplace.
It roughly looks like this:
<offer id="2113">
<picture>https://anotherserver.com/image1.jpg</picture>
<picture>https://anotherserver.com/image2.jpg</picture>
</offer>
<offer id="2117">
<picture>https://anotherserver.com/image3.jpg</picture>
<picture>https://anotherserver.com/image4.jpg</picture>
</offer>
...
What I want is to save the images in those <picture> nodes locally.
There are about 9,000 offers and about 14,000 images.
When I iterate through them I see that the images are being copied from that other server, but at some point it gives a 504 Gateway Timeout.
The thing is that sometimes the error comes after 2,000 images, sometimes after many more or many fewer.
I tried getting only one image 12,000 times from that server (i.e. only https://anotherserver.com/image3.jpg), but it still gave the same error.
From what I've read, that other server is blocking my requests after some quantity.
I tried using PHP sleep(20) after every 100th image, but it still gave me the same error (sleep(180): same). When I tried a local image but with the full path, it didn't give any errors. I tried a second (non-local) server and the same thing occurred.
I use the PHP copy() function to copy the image from that server.
I've also used file_get_contents() for testing purposes, but got the same error.
I have
set_time_limit(300000);
ini_set('default_socket_timeout', 300000);
as well but no luck.
Is there any way to do this without chunking requests?
Does this error occur on some specific image? It would be great to catch this error, or to just keep track of the response delay and send the next request after some time, if that can be done.
Is there some constant time in seconds that I have to wait in order to keep those requests rolling?
And please give me non-cURL answers if possible.
UPDATE
cURL and exec('wget') didn't work either; they both ran into the same error.
Can the remote server be tweaked so it doesn't block me (if it does)?
p.s. If I do echo "<img src='https://anotherserver.com/image1.jpg' />"; in a loop for all 12,000 images, they show up just fine.
Since you're accessing content on a server you have no control over, only the server administrators know the blocking rules in place.
But you have a few options, as follows:
Run batches of 1000 or so, then sleep for a few hours.
Split the request up between computers that are requesting the information.
Maybe even something as simple as changing the requesting user agent info every 1000 or so images would be good enough to bypass the blocking mechanism.
Or some combination of all of the above.
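The user-agent idea above can be sketched like this; the batch size of 1000, the sleep length, and the agent strings are illustrative guesses, since only the remote admins know the real blocking rules:

```php
<?php
// Rotate the User-Agent string every $batchSize requests.
function pickUserAgent(array $agents, int $requestCount, int $batchSize = 1000): string
{
    $batch = intdiv($requestCount, $batchSize);
    return $agents[$batch % count($agents)];
}

$userAgents = array(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
    'Mozilla/5.0 (X11; Linux x86_64)',
);

// Usage inside a download loop (network calls left commented out):
// foreach ($imageURLs as $i => $url) {
//     curl_setopt($ch, CURLOPT_USERAGENT, pickUserAgent($userAgents, $i));
//     if ($i > 0 && $i % 1000 === 0) {
//         sleep(3600); // rest between batches
//     }
//     // ... perform the copy/download here ...
// }
```
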
I would suggest trying the following:
1. Reuse the previously opened connection using cURL:
$imageURLs = array('https://anotherserver.com/image1.jpg', 'https://anotherserver.com/image2.jpg', ...);
$notDownloaded = array();
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
foreach ($imageURLs as $URL) {
    $filepath = parse_url($URL, PHP_URL_PATH);
    $fp = fopen(basename($filepath), 'w');
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_URL, $URL);
    curl_exec($ch);
    fclose($fp);
    // CURLINFO_HTTP_CODE also works on PHP versions before 7.3
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) == 504) {
        $notDownloaded[] = $URL;
    }
}
curl_close($ch);
// check to see if $notDownloaded is empty
2. If the images are accessible via both https and http, try using http instead (this will at least speed up the downloading).
3. Check the response headers when the 504 is returned, as well as when you load the URL in your browser. Make sure there are no X-RateLimit-* headers. By the way, what are the response headers, actually?
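Since the asker prefers non-cURL answers, the header check can be done with PHP's built-in get_headers(). A sketch; the findRateLimitHeaders() helper is my own name, not a library function:

```php
<?php
// Scan an associative header array (as returned by get_headers($url, 1))
// for rate-limit hints such as X-RateLimit-* or Retry-After.
function findRateLimitHeaders(array $headers): array
{
    $hits = array();
    foreach ($headers as $name => $value) {
        // Integer keys hold status lines, not real headers; skip them.
        if (is_string($name)
            && (stripos($name, 'RateLimit') !== false
                || strcasecmp($name, 'Retry-After') === 0)) {
            $hits[$name] = $value;
        }
    }
    return $hits;
}

// Usage (network call, so left commented out):
// $headers = get_headers('https://anotherserver.com/image3.jpg', 1);
// print_r(findRateLimitHeaders($headers));
```

If a Retry-After header shows up, its value is the number of seconds the server wants you to wait.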
I am trying to integrate Amazon CloudSearch into SilverStripe. What I want is that when pages are published, a cURL request sends the data about the page as a JSON string to the search cloud.
I am using http://docs.aws.amazon.com/cloudsearch/latest/developerguide/uploading-data.html#uploading-data-api as a reference.
Every time I try to upload, it returns a 403. I have allowed the IP address in the access policies for the search domain as well.
I am using this as a code reference: https://github.com/markwilson/AwsCloudSearchPhp
I think the problem is that AWS is not authenticating correctly. How do I authenticate this correctly?
If you are getting the following error
403 Forbidden, Request forbidden by administrative rules.
and you are sure you have the appropriate rules in effect, I would check the API URL you are using. Make sure you are using the correct endpoint. If you are doing a batch upload, the API endpoint should look like below:
your-search-doc-endpoint/2013-01-01/documents/batch
Notice 2013-01-01: that is a required part of the URL, the API version you will be using. You cannot do the following, even though it might make sense:
your-search-doc-endpoint/documents/batch <- Won't work
To search, you would need to hit the following API:
your-search-endpoint/2013-01-01/search?your-search-params
After much searching and trial and error, I was able to put together a small code block, assembled from little pieces of code from everywhere, to upload a "file" to AWS CloudSearch using cURL and PHP.
The single most important thing is to make sure your data is prepared correctly to be sent in JSON format.
Note: for CloudSearch you're not uploading a file, you're posting a stream of JSON data. That is why many of us have trouble uploading the data.
So in my case I wanted to upload data for my search engine on CloudSearch. It seems simple, and it is, but example code for doing this is lacking; most people tell you to go to the documentation, which usually has examples, but only for the AWS CLI. The PHP SDK is its own learning curve: instead of keeping things simple, you do 20 steps for 1 task, and on top of that you're required to have all these other libraries that are just wrappers around native PHP functions, which sometimes complicates rather than simplifies.
So back to how I did it: first I pull the data from my database as an array and serialize it to save it to a file.
$rows = $database_data;
foreach ($rows as $key => $row) {
    $data['type'] = 'add';
    $data['id'] = $row->id;
    $data['fields']['title'] = $row->title;
    $data['fields']['content'] = $row->content;
    $data2[] = $data;
}
// now save your data to a file and make sure
// to serialize() it
$fp = fopen($path_to_file, $mode);
flock($fp, LOCK_EX);
fwrite($fp, serialize($data2));
flock($fp, LOCK_UN);
fclose($fp);
Now that you have your data saved, we can play with it:
$aws_doc_endpoint = '{Your AWS CloudSearch Document Endpoint URL}';
// Let's read the data
$data = file_get_contents($path_to_file);
// Now let's unserialize() it and encode it in JSON format
$data = json_encode(unserialize($data));
// Finally, let's use cURL
$ch = curl_init($aws_doc_endpoint);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
// Note: a second CURLOPT_HTTPHEADER call replaces the first,
// so both headers must be set in one array
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Content-Type: application/json',
    'Content-Length: ' . strlen($data)
));
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = curl_exec($ch);
curl_close($ch);
$response = json_decode($response);
if ($response->status == 'success')
{
    return TRUE;
}
return FALSE;
And like I said, there is nothing to it. Most answers I encountered were "use Guzzle, it's really easy". Well, yes it is, but for a simple task like this you don't need it.
Aside from that, if you still get an error, make sure to check the following:
Well-formatted JSON data.
Make sure you have access to the endpoint.
Well, I hope someone finds this code helpful.
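For the "well-formatted JSON" point, a quick sanity check before posting; the isValidBatchJson() helper is a made-up name, not part of any AWS SDK:

```php
<?php
// Return true when the string is syntactically valid JSON,
// using json_decode's error state rather than its return value
// (so a literal "null" or "false" document still counts as valid).
function isValidBatchJson(string $json): bool
{
    json_decode($json);
    return json_last_error() === JSON_ERROR_NONE;
}
```

Running this against the encoded batch before the cURL call saves a round trip when the data is malformed.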
To diagnose whether it's an access policy issue, have you tried a policy that allows all access to the upload? Something like the following opens it up to everything:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"AWS": "*"
},
"Action": "cloudsearch:*"
}
]
}
I noticed that if you just go to the document upload endpoint in a browser (mine looks like "doc-YOURDOMAIN-RANDOMID.REGION.cloudsearch.amazonaws.com") you'll get the 403 "Request forbidden by administrative rules" error, even with open access, so as @dminer said, you'll need to make sure you're posting to the correct full URL.
Have you considered using a PHP SDK? Like http://docs.aws.amazon.com/aws-sdk-php/guide/latest/service-cloudsearchdomain.html. It should take care of making correct requests, in which case you could rule out transport errors.
This never worked for me. I used the CloudSearch terminal to upload files, and PHP cURL to search files.
Try adding "cloudsearch:document" to CloudSearch's access policy under Actions
I had previously asked a question, and got the answer, but I think I've run into another problem.
The php script I'm using does this:
1 - transfers a file to my server from my backup server
2 - when it's done transferring, it sends some POST data to it using cURL, which creates a zip file
3 - when it's done, the result is echoed, and depending on what the result is, it transfers the file or does nothing
My problem is this:
When the file is small enough (under 500MB) it creates it and transfers it back, no problem. When it's larger, it times out; the zip finishes being created on the remote server, but because the request timed out, it doesn't get transferred.
I'm running this from a command line on the backup server. I have this in the php script:
set_time_limit(0); // ignore php timeout
ignore_user_abort(true); // keep on going even if user pulls the plug*
while(ob_get_level())ob_end_clean(); // remove output buffers
But it still times out when I run sudo php backup.php.
Is using cURL making it time out like a browser would on the other end where the zip is being made? I think the problem is that the response isn't being echoed out.
Edits:
(@symcbean)
I'm not seeing anything, which is why I'm struggling. When I run it from the browser, I see the loading thing in the address bar. After about 30 seconds it just stops. When I do it from the command line, same deal. 30 seconds and it just stops. This only happens when large zips need to be created.
It's being invoked via a file. The file loads a class, sends the connection information to the class. Which contacts the server to make the zip, transfers the zip back, does some stuff to it then transfers it to S3 for archiving.
It logs into the remote server and uploads a file with cURL. Upon a valid response, it cURLs again with the location of the file as a URL (I'll always know what it is), which fires up the PHP file I just transferred over. The zip ALWAYS gets created no problem, even up to 22GB; it just sometimes takes a long time, of course. After that it waits for a response of "created". Waiting for that response is where it dies.
So the zip always gets created, but the waiting time is what "I think" is making it die.
Second Edit:
I tried this from the command line:
$ftp_connect = ftp_connect('domain.com');
$ftp_login = ftp_login($ftp_connect, 'user', 'pass');
ftp_pasv($ftp_connect, true);
$upload = ftp_put($ftp_connect, 'filelist.php', 'filelist.php', FTP_ASCII);
ftp_close($ftp_connect);
$get_remote = 'filelist.php';
$post_data = array(
    'last_bu' => '0'
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'domain.com/'.$get_remote);
curl_setopt($ch, CURLOPT_HEADER, 0);
// adding the post variables to the request
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
// return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);
echo $response;
and got this:
<HTML>
<HEAD>
<TITLE>500 Internal Server Error</TITLE>
</HEAD><BODY>
<H1>Internal Server Error</H1>
The server encountered an internal error or
misconfiguration and was unable to complete
your request.<P>
Please contact the server administrator to inform of the time the error occurred
and of anything you might have done that may have
caused the error.<P>
More information about this error may be available
in the server error log.<P>
<HR>
<ADDRESS>
Web Server at domain.com
</ADDRESS>
</BODY>
</HTML>
Again, the error log is blank and the zip still gets created, but because of the timeout at around 650MB of creation, I can't get the response.
The problem is in the server code that generates the file to be returned.
Check the PHP error log.
It may be timing out for a few reasons, but the log should tell you why.
I fixed it, guys. Thank you so much to everyone who helped; you pointed me in the right direction.
In the end, the problem was on the remote server: it was timing out the cURL connection, which meant the result I needed never got sent back.
What I did to fix it was add a function to my class that (again using cURL) checks the HTTP status code of the zip file I know it's creating. When it finishes, the result is thrown locally; if it's not finished, it sleeps for a few seconds and checks again.
private function watchDog() {
    $curl = curl_init($this->host.'/'.$this->grab_file);
    // don't fetch the actual page, you only want to check the connection is ok
    curl_setopt($curl, CURLOPT_NOBODY, true);
    // do request
    $result = curl_exec($curl);
    // if request did not fail
    if ($result !== false) {
        // if request was ok, check response code
        $statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
        curl_close($curl);
        if ($statusCode == 404) {
            // zip not ready yet: wait, then check again
            sleep(7);
            return $this->watchDog();
        }
        return 'zip created';
    }
    curl_close($curl);
}
From within the HTML code in one of my server pages I need to perform a search for a specific item on a database hosted on another remote server that I don't own myself.
Example of the kind of search that performs my request: http://www.remoteserver.com/items/search.php?search_size=XXL
The remote server provides to me, as a client, the response: a page displaying several items that match my search criteria.
I don't want this page displayed. What I want is to collect into a string (or a local file) the full contents of the remote server's HTML response (the code you get with 'View Source' in my IE browser client).
If I collect that data (it could easily reach 50,000 bytes) I can then filter out the substrings I am interested in and assemble a new request to the remote server for just one of the specific items in the response provided.
Is there any way to get the HTML of the remote server's response with JavaScript or PHP, while avoiding the display of the response in the browser itself?
I hope I haven't confused you.
Thanks for any help you may provide.
As @mario mentioned, there are several different ways to do it.
Using file_get_contents():
$txt = file_get_contents('http://www.example.com/');
echo $txt;
Using php's curl functions:
$url = 'http://www.mysite.com';
$ch = curl_init($url);
// Tell curl_exec to return the text instead of sending it to STDOUT
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// Don't include return header in output
curl_setopt($ch, CURLOPT_HEADER, 0);
$txt = curl_exec($ch);
curl_close($ch);
echo $txt;
cURL is probably the most robust option, because you have more control over the exact request parameters and more possibilities for error handling when things don't go as planned.
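A sketch of the kind of error handling meant here; the fetchUrl() wrapper is a hypothetical helper, not a library function:

```php
<?php
// Fetch a URL and report transport errors and the HTTP status
// alongside the body, instead of silently returning false.
function fetchUrl(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return instead of printing
    curl_setopt($ch, CURLOPT_HEADER, 0);            // body only
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);          // don't hang forever
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects

    $body = curl_exec($ch);
    $result = array(
        'ok'     => $body !== false,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch), // empty string when nothing failed
        'body'   => $body === false ? '' : $body,
    );
    curl_close($ch);
    return $result;
}

// $page = fetchUrl('http://www.remoteserver.com/items/search.php?search_size=XXL');
// if (!$page['ok']) { error_log('Search failed: ' . $page['error']); }
```
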
I've got a simple php script to ping some of my domains using file_get_contents(), however I have checked my logs and they are not recording any get requests.
I have
$result = file_get_contents($url);
echo $url . " pinged ok\n";
where $url for each of the domains is just a simple string of the form http://mydomain.com/, and the echo verifies this. Manual requests made by myself do show up.
Why would the get requests not be showing in my logs?
Actually, I've got it to register the hit when I send $result to the browser. I guess this means the web server only records browser requests? Is there any way to mimic that in PHP?
OK, I tried PHP cURL:
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "getcorporate.co.nr");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
Same effect though: no hit registered in the logs. So far it only registers when I feed the HTTP response from my script back to the browser. Obviously this will only work for a single request and not a bunch, which is the purpose of my script.
If something else is going wrong, what debugging output can I look at?
Edit: D'oh! See comments below accepted answer for explanation of my erroneous thinking.
If the request is actually being made, it would be in the logs.
Your example code could be failing silently.
What happens if you do:
<?php
if ($result = file_get_contents($url)) {
    echo "Success";
} else {
    echo "Epic Fail!";
}
If that's failing, you'll want to turn on some error reporting or logging and try to figure out why.
Note: if you're in safe mode, or otherwise have fopen URL wrappers disabled, file_get_contents() will not fetch a remote page. This is the most likely reason things would be failing (assuming there's no typo in the contents of $url).
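That failure mode can be made visible with a small check; the fetchOrExplain() helper is a made-up name for illustration:

```php
<?php
// Try file_get_contents() and explain why it failed, rather than
// letting it fail silently.
function fetchOrExplain(string $url): string
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return "Invalid URL: $url"; // e.g. a typo in $url
    }
    if (!ini_get('allow_url_fopen')) {
        return 'allow_url_fopen is disabled; use cURL instead';
    }
    $result = @file_get_contents($url);
    if ($result === false) {
        $err = error_get_last();
        return 'Request failed: ' . ($err['message'] ?? 'unknown error');
    }
    return 'Success';
}
```
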
Use curl instead?
That's odd. Maybe there is some caching afoot? Have you tried changing the URL dynamically ($url = $url."?timestamp=".time() for example)?