I'm having a curiosity issue and can't seem to find the correct phrases to express what I mean in a successful Google search query.
Some sites (mostly ones that do price queries) make an AJAX request to something (let's assume it's a PHP script) with user-set criteria, and the data isn't displayed all at once when the query finishes; instead, you see some parts of the response displayed earlier (as I assume they become available earlier) and some later.
I'd imagine the AJAX request goes to a PHP script which in turn queries different sources and returns data as soon as possible, meaning quicker query responses get sent first.
Core question:
How would such a mechanism be built so that the PHP script can return data
multiple times and the AJAX script doesn't just wait for a single response?
I'm rather sure there's information about this available, but unfortunately I haven't been able to figure out even what terms to search for.
EDIT:
I thought of a good example: cheap flight ticket booking services, which query different sources and seem to output data as soon as it's available, meaning different offers from different airlines appear at different times.
Hope someone can relieve my curiosity.
Best,
Alari
On the client side you need the onprogress event. See the following example (copied from this answer):
var xhr = new XMLHttpRequest();
xhr.open("GET", "/test/chunked", true);
xhr.onprogress = function () {
    console.log("PROGRESS:", xhr.responseText);
};
xhr.send();
xhr.responseText keeps accumulating the response given by the server so far. The downside is that you receive the accumulated response every time, so you can use substring on it to extract only the newly arrived part.
On the server side, you can flush the output buffer to push the response out in chunks, for example:
<?php
header( 'Content-type: text/html; charset=utf-8' );
for ($i = 0; $i < 100; $i++) {
    echo "Current Response is {$i} \r\n";
    // flush PHP's output buffer, then ask the SAPI to send it to the client
    ob_flush();
    flush();
    // sleep for 2 seconds
    sleep(2);
}
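One caveat: whether those chunks actually reach the browser early depends on nothing in between buffering them up. As a hedged sketch (which of these settings matter depends entirely on your stack; the nginx header is only relevant if nginx sits in front of PHP), the top of the script might need:

<?php
header('Content-Type: text/html; charset=utf-8');
header('X-Accel-Buffering: no');           // ask nginx (if present) not to buffer
ini_set('zlib.output_compression', 'off'); // compression would hold chunks back
while (ob_get_level() > 0) {               // close any active output buffers;
    ob_end_flush();                        // with none left, flush() alone suffices
}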
Related
I have a PHP script making requests to some web site. I run this script from the command line, so no web server on my side is involved. Just pure PHP and a shell.
The response is split into pages, so I need to make multiple requests to gather all the data in one script run. Obviously, the request URLs are identical except for one parameter. Nothing complicated:
$base_url = '...';
$pages = ...; // a number I receive elsewhere
$delay = ...; // a delay to avoid too many requests

$p = 0;
while ($p < $pages) {
    $url = $base_url . "&some_param=$p";
    ... // Here cURL takes its turn because of cookies
    $p++;
    sleep($delay);
}
The pages I get this way all look the same, like the first one that was requested. (So I just get a repetitive list multiplied by the number of pages.)
I decided that this happens because of some caching on the web server's end which persists despite an additional random parameter I pass. Closing and reinitializing the cURL session doesn't help either.
I also noticed that if I quickly change the initial $p value manually (so the requests start from a different page) and then launch the script again, the result changes. I do this quicker than the $delay value.
It means that two different requests made from the same script run give the same result, while two different requests made from two different script runs give different results, regardless of the delay between the requests. So it can't just be caching on the responding side.
I tried to work around that by wrapping the actual request in a separate script which I run using exec() from the main script. So there is (or should be, I reckon) a separate shell instance for every single page request, and those requests should not share any kind of cache between them.
Despite that, I keep getting the same page again. The code looks something like this:
$pages = ...;
$delay = ...;
$p = 0;
$command_stub = 'php get_single_page.php';
while ($p < $pages) {
    $command = $command_stub . " $p";
    exec($command, $response);
    // $response is the same again for different $p's
    $p++;
    sleep($delay);
}
If I again change the starting page manually in the script, I get the result for that page over and over. Until I change it once more. And so on. Several minutes may pass between two runs of the main script, and it still yields an identical result until I switch the number by hand.
I can't comprehend why this is happening. Can somebody explain it?
The short answer is no: cURL certainly doesn't retain anything between executions unless configured to do so (e.g. by setting a cookie file).
I suspect the server is expecting a session token of some sort (a cookie or some other HTTP header is my guess). Without the session token it will just ignore the request for subsequent pages.
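If that's the diagnosis, persisting the session cookies across requests should fix it. Here is a minimal sketch, assuming the site hands out its session token as a cookie (the jar path is a placeholder, and $base_url, $pages, $delay are the variables from the question):

<?php
// Reuse one cookie jar across all page requests so the session token
// the site sets on the first response gets sent back on the later ones.
$jar = '/tmp/session_cookies.txt'; // placeholder path

$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // cookies are written here on curl_close()
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // and read back for every request

for ($p = 0; $p < $pages; $p++) {
    curl_setopt($ch, CURLOPT_URL, $base_url . "&some_param=$p");
    $response = curl_exec($ch);
    // ... process $response ...
    sleep($delay);
}
curl_close($ch);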
I have a WordPress website with a working order system. Now I want to make an Android app which displays every new order in a list view as soon as the order is made.
For the last two days I've been thinking about the following solutions:
1. Simple HTTP GET requests every 10 seconds
2. Websockets
3. MySQL binary log + Pusher Link
4. Server Sent Events
My thoughts (working with a LAMP stack):
1. Simple HTTP requests are obviously the most inefficient solution.
2. I figured out that websockets and Apache don't work well together.
3. Feels quite hacky, and I want to avoid any 3rd-party service if I can.
4. This looks like the optimal way for me; however, there are some problems with Apache/PHP and Server-Sent Events from what I've experienced.
I tried to implement a simple demo script, but I don't understand why some of them use an infinite while loop to keep the connection open and others don't.
Here is an example without a loop, and here one with an infinite loop; also here.
In addition, when I tested the variant with the infinite loop, my whole page wouldn't load because of that sleep() function. It looks like the whole server freezes whenever I use it.
Does anyone have an idea how to fix that? Or do you have other suggestions?
This is the code that causes trouble (copied from here, with a missing curly bracket added):
<?php
// make session read-only
session_start();
session_write_close();

// disable default disconnect checks
ignore_user_abort(true);

// set headers for stream
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");
header("Access-Control-Allow-Origin: *");

// Is this a new stream or an existing one?
$lastEventId = floatval(isset($_SERVER["HTTP_LAST_EVENT_ID"]) ? $_SERVER["HTTP_LAST_EVENT_ID"] : 0);
if ($lastEventId == 0) {
    $lastEventId = floatval(isset($_GET["lastEventId"]) ? $_GET["lastEventId"] : 0);
}

echo ":" . str_repeat(" ", 2048) . "\n"; // 2 kB padding for IE
echo "retry: 2000\n";

// start stream
while (true) {
    if (connection_aborted()) {
        exit();
    } else {
        // here you will want to get the latest event id you have created on the server,
        // but for now we will increment and force an update
        $latestEventId = $lastEventId + 1;

        if ($lastEventId < $latestEventId) {
            echo "id: " . $latestEventId . "\n";
            echo "data: Howdy (" . $latestEventId . ") \n\n";
            $lastEventId = $latestEventId;
            ob_flush();
            flush();
        } else {
            // no new data to send
            echo ": heartbeat\n\n";
            ob_flush();
            flush();
        }
    }
    // 2 second sleep then carry on
    sleep(2);
}
?>
I'm thankful for every advice I can get! :)
EDIT:
The main idea is to frequently check my MySQL database for new entries, and if a new order is present, format the data nicely and send the information over SSE to my Android application.
I have already found libraries to receive SSEs on Android; the main problem is on the server side.
Based on your question, I think you could implement SSE (Server-Sent Events), which is part of the HTML5 standard. It is one-way communication from server to client. It needs HTML/JavaScript and a backend language, e.g. PHP.
The client subscribes to events, and once the subscription is up and running the server sends an update whenever the input data changes. By default the browser retries a dropped connection after 3 seconds; this can be adjusted, though.
I would recommend first creating a basic, functioning web-browser client as a start. When and if it works as you expect, only then judge the effort of building the client as an app.
You would probably need to add functions on the client side, such as starting/stopping the subscription.
My understanding of why people don't recommend the combination of Server-Sent Events and Apache is the lack of control over how many connections are open and over the continuous closing of connections this requires. This could lead to severe server performance problems.
Using, for example, node.js would seemingly not cause that problem.
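For the server side of your order feed, the streaming loop can poll the database instead of blindly incrementing an id as the demo script above does. Below is a minimal sketch under assumed names: an orders table with id and customer columns, and placeholder connection credentials; none of this comes from your actual schema.

<?php
header("Content-Type: text/event-stream");
header("Cache-Control: no-cache");

$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass'); // placeholders

// resume from the last id the client saw, if it reconnects
$lastId = isset($_SERVER["HTTP_LAST_EVENT_ID"]) ? (int)$_SERVER["HTTP_LAST_EVENT_ID"] : 0;

$stmt = $pdo->prepare('SELECT id, customer FROM orders WHERE id > ? ORDER BY id');

while (!connection_aborted()) {
    $stmt->execute(array($lastId));
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $order) {
        echo "id: {$order['id']}\n";
        echo "data: " . json_encode($order) . "\n\n"; // one SSE event per new order
        $lastId = (int)$order['id'];
    }
    @ob_flush();
    flush();
    sleep(2); // polling interval
}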
Here are some links to get started:
MDN:
https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events
Stream Updates with Server-Sent Events:
https://www.html5rocks.com/en/tutorials/eventsource/basics/
I've got a simple PHP script that, once run, makes it impossible for me to access any other page on the server.
The script is as simple as this:
for ($league = 11387; $league <= 11407; $league++) {
    for ($i = 1; $i < 9; $i++) {
        // gets the team object here from external resource
        $team = $HT->getYouthTeam($HT->getTeam($HT->getLeague($league)->getTeam($i)->getTeamId())->getYouthTeamId());
        if ($team->getId() != 2286094) {
            $youthTeams[] = $team;
        }
        set_time_limit(10);
    }
}
Obviously, I'm supposed to get thousands of "teams" here (except the one with the ID of 2286094), but once I run this script I cannot open any other page on the server until it is over, and it takes a lot of time until the script fetches the results into the $youthTeams array.
My intent was to make a progress bar that would tell exactly (in %) where the script is at, but I can't, since this script makes it impossible for the server to display any other pages (any other page keeps "loading" but never loads while this script is running on the server).
Also, a sub-question: once all of this data is fetched, would it be smart to insert it all into the MySQL database in one single query?
I really want to learn more about this and want to get it finished, so please help me out on this one.
Maybe you can identify which one of your lookups eats the most time by timing them:
$t0 = microtime(true);
$teamId = $HT->getLeague($league)->getTeam($i)->getTeamId();
echo "lookup teamId: " . (($t1 = microtime(true)) - $t0) . "<br>";

$youthTeamId = $HT->getTeam($teamId)->getYouthTeamId();
echo "lookup youthTeamId: " . (($t2 = microtime(true)) - $t1) . "<br>";

$youthTeam = $HT->getYouthTeam($youthTeamId);
echo "lookup youthTeam: " . (($t3 = microtime(true)) - $t2) . "<br>total time: " . ($t3 - $t0) . "<br>";

if ($youthTeam->getId() != 2286094) {
    $youthTeams[] = $youthTeam;
}
I need a random string that refreshes every two seconds, so that you get the effect of a word that is scrambled every two seconds. This is my code:
function rand_string( $length ) {
    $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    $size = strlen( $chars );
    $str = "";
    for ( $i = 0; $i < $length; $i++ ) {
        $str .= $chars[ rand( 0, $size - 1 ) ];
    }
    return $str;
}
Then I want to repeat this a number of times, not unlimited, so I used this piece of code:
$random = rand_string( 9 );
for ($i = 0; $i < 5; $i++) {
    echo $random;
    flush();
    sleep(2);
}
Somehow the page waits 10 seconds and then shows my string five times the same, instead of a single string refreshing every two seconds. Could you please help me?
Try something like the example below. Substitute a URL that points at a page/service on your own server which returns the next string value (and nothing else). Be sure to set the content type in your response (from the server) to "text/plain".
As stated/indicated/hinted in other posts, the issue is that HTTP is a stateless protocol. The browser sends a request. The server sends a response. The end. :-) The PHP code executes exclusively on the server, where its job is only generating content for the web browser. But it does not interact with the browser in any way beyond that. Once all of the content generated by the PHP code is emitted to the browser, the PHP code is finished. You should read up a bit on concepts like output buffering. You can exercise a little bit of control over whether your PHP code buffers up all the output then sends it to the browser all-at-once, or trickles it out as it generates it. But you simply cannot use PHP code to interactively change anything on the web page. Once you send it to the browser, it's sent and that's it. You can't call it back and change it.
Now, having said that, you certainly can use PHP code to emit JavaScript code, which can then interact with the DOM in the browser, and also make AJAX calls back to different resources on the server, which can in turn be different PHP pages that do whatever you need them to and return results for display or for further processing in the browser (which could lead to additional AJAX calls to the server, although "chatty" does not generally equal "good").
AJAX (Asynchronous JavaScript and XML) is a technology that lets you make calls back to the web server from your web page without reloading the entire page. You can use JavaScript timer functions like setInterval() and setTimeout() to implement delays or to create recurring events, like your text update. Don't get too hung up on the "XML" in "AJAX." A newer data encapsulation standard called JSON has become very popular and is at least as usable via AJAX as XML is in virtually all cases. JSON is "JavaScript Object Notation," and the standard is basically just serialized JavaScript data structures, very natural to work with.
In fact, in the example I show below, neither XML nor JSON is utilized (in the interest of simplicity). But either XML or JSON could have easily been used and probably should be in a serious service implementation.
The XMLHttpRequest object is the magic bit that makes AJAX possible. XMLHttpRequest, setInterval(), setTimeout() and tons of other APIs utilize asynchronous callbacks. So that is another concept you will need to embrace. An asynchronous callback is just a function that you pass to, for example, setInterval() so that it will be able to "call you back" when the timer event occurs (you pass a reference to the function). In the meantime, your interface isn't locked up waiting for the callback. Thus it is asynchronous. My example below also uses inline (unnamed, anonymous) functions called closures, which is another concept that is very important for modern JavaScript programming.
Finally, I would heartily recommend using something like jQuery. Well, I'd recommend jQuery. There are other JavaScript frameworks, but I'm not entirely sure there is much point in looking at any of the others any more. The example below does not use jQuery.
The main thing you are accomplishing with your original example, since PHP executes exclusively on the server, is making your page take longer to finish rendering. That means your request takes longer to disconnect from the web server, which in turn ties up a connection resource on the server that no other browser instance can use until the request finishes, at least 10 seconds after it starts.
<html>
<head>
</head>
<body>
<div id="blah">Watch me change.</div>
<script type="text/javascript">
// set callback function, to be called every 2 seconds
setInterval( function() {
    var xmlhttp;
    if (window.XMLHttpRequest) {
        xmlhttp = new XMLHttpRequest();
    }
    else { // IE6, IE5
        xmlhttp = new ActiveXObject( "Microsoft.XMLHTTP" );
    }
    // callback function called for each state change, "4" means request finished
    xmlhttp.onreadystatechange = function() {
        if( 4 == xmlhttp.readyState && 200 == xmlhttp.status ) {
            document.getElementById("blah").innerHTML = xmlhttp.responseText;
        }
    };
    xmlhttp.open( "GET", "http://source-of-new-string...", true );
    xmlhttp.send();
}, 2000 );
</script>
</body>
</html>
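For completeness, the server-side resource that the URL above points at could be as simple as the following sketch; the endpoint itself is hypothetical, and it reuses the question's rand_string() helper with the text/plain content type mentioned at the top:

<?php
// Hypothetical endpoint: returns one fresh random string as plain text
// for the AJAX poller above to display.
header('Content-Type: text/plain; charset=utf-8');

function rand_string($length) {
    $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    $str = "";
    for ($i = 0; $i < $length; $i++) {
        $str .= $chars[rand(0, strlen($chars) - 1)];
    }
    return $str;
}

echo rand_string(9);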
You do not understand the fundamentals of web development yet. The single PHP script you wrote first generates a string, then loops 5 times, outputting the generated string and sleeping for 2 seconds; only when the script is finished is the output flushed to the browser. So it's no surprise that, since you only called the function once, you see 5 identical strings after 10 seconds.
Having a new string appear every 2 seconds is just not possible within the stateless thing that is an HTTP request. You would need AJAX callbacks, invoked from the client side every 2 seconds, to achieve that effect.
<?php
header('Content-Type: text/html; charset=UTF-8');

function rand_string( $length ) {
    $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    $size = strlen( $chars );
    $str = "";
    for ( $i = 0; $i < $length; $i++ ) {
        $str .= $chars[ rand( 0, $size - 1 ) ];
    }
    return $str;
}

for ($i = 0; $i < 5; $i++) {
    $random = rand_string(9); // generate a NEW string on every iteration
    echo $random;
    ob_flush();
    flush();
    sleep(2);
}
You need to define the variable $random anew within the loop.
Somehow the page waits 10 seconds and then shows my string
First, PHP can't do what you wish. PHP sends the web page content to the browser, which in turn displays it.
The browser may choose how much data it needs to receive before it even starts to display the page.
There is your delay: five times sleep(2);. The browser waits for the end of the connection before displaying the data.
My string shows up five times the same
Secondly, the code below only prints the content of the variable $random five times, and it is never changed inside the for loop. So: five times the same content.
$random = rand_string( 9 );
for ($i = 0; $i < 5; $i++) {
    echo $random;
    flush();
    sleep(2);
}
There is your repetition: five times echo $random;.
Notes
$random is a variable. For the rand() function, see here.
For random strings, see:
PHP random string generator
PHP may not be suited for what you want to do: "I need a random string that refreshes every two seconds." The refreshing part could be obtained through JavaScript or, at worst, by refreshing the page every two seconds and displaying a new string along with the old ones. This will leave the illusion that a new string is added (with the ugly page-loading moment in between).
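As a minimal sketch of that last-resort page-refresh variant (the session key name is an arbitrary choice, and rand_string() is the question's helper), the page can reload itself every two seconds and append a fresh string to the ones collected so far:

<?php
// Reload the page every 2 seconds and accumulate strings in the session,
// faking "a new string every 2 seconds" without any JavaScript.
session_start();
header('Content-Type: text/html; charset=UTF-8');

echo '<meta http-equiv="refresh" content="2">';

$_SESSION['strings'][] = rand_string(9); // rand_string() as defined in the question

foreach ($_SESSION['strings'] as $s) {
    echo $s . '<br>';
}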
I have to scrape a web site where I need to fetch multiple URLs and then process them one by one. The current process goes somewhat like this:
I fetch a base URL and get all secondary URLs from that page; then for each secondary URL I fetch it, process the page I find, download some photos (which takes quite a long time) and store the data in the database; then I fetch the next URL and repeat the process.
In this process, I think I'm wasting some time fetching the secondary URL at the start of each iteration. So I'm trying to fetch the next URLs in parallel while processing the first iteration.
The solution in my mind is to call, from the main process, a PHP script, say downloader, which will download all the URLs (with curl_multi or wget) and store them in some database.
My questions are:
How do I call such a downloader asynchronously? I don't want my main script to wait until the downloader completes.
Is there any place to store the downloaded data other than the database, such as shared memory?
Is there any chance the data gets corrupted while storing and retrieving it, and how do I avoid that?
Also, please let me know if anyone has a better plan.
When I hear that someone uses curl_multi_exec, it usually turns out they just load it with, say, 100 URLs, then wait until all complete, then process them all, and then start over with the next 100 URLs... Blame me, I was doing so too, but then I found out that it's possible to remove/add handles to curl_multi while something is still in progress, and it really saves a lot of time, especially if you reuse already-open connections. I wrote a small library to handle a queue of requests with callbacks; I'm not posting the full version here, of course ("small" is still quite a bit of code), but here's a simplified version of the main part to give you the general idea:
public function launch() {
    $channels = $freeChannels = array_fill(0, $this->maxConnections, NULL);
    $activeJobs = array();
    $running = 0;
    do {
        // pick jobs for free channels:
        while ( !(empty($freeChannels) || empty($this->jobQueue)) ) {
            // take a free channel, (re)init the curl handle and let
            // the queued object set its options
            $chId = key($freeChannels);
            if (empty($channels[$chId])) {
                $channels[$chId] = curl_init();
            }
            $job = array_pop($this->jobQueue);
            $job->init($channels[$chId]);
            curl_multi_add_handle($this->master, $channels[$chId]);
            $activeJobs[$chId] = $job;
            unset($freeChannels[$chId]);
        }
        $pending = count($activeJobs);

        // launch them:
        if ($pending > 0) {
            // poke it while it wants
            while (($mrc = curl_multi_exec($this->master, $running)) == CURLM_CALL_MULTI_PERFORM);
            // wait for some activity, don't eat CPU
            curl_multi_select($this->master);
            while ($running < $pending && ($info = curl_multi_info_read($this->master))) {
                // some connection(s) finished; locate that job and run its response handler:
                $pending--;
                $chId = array_search($info['handle'], $channels);
                $content = curl_multi_getcontent($channels[$chId]);
                curl_multi_remove_handle($this->master, $channels[$chId]);
                // free up this channel
                $freeChannels[$chId] = NULL;
                if ( !array_key_exists($chId, $activeJobs) ) {
                    // impossible, but...
                    continue;
                }
                $activeJobs[$chId]->onComplete($content);
                unset($activeJobs[$chId]);
            }
        }
    } while ( ($running > 0 && $mrc == CURLM_OK) || !empty($this->jobQueue) );
}
In my version the $jobs are actually instances of a separate class, not of controllers or models. They just handle setting the cURL options, parsing the response, and calling a given onComplete callback.
With this structure, new requests start as soon as something in the pool finishes.
Of course, it doesn't really save you if it's not just the retrieving that takes time but the processing as well... and it isn't true parallel handling. But I still hope it helps. :)
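To make the shape of such a job class concrete, here is a hedged sketch of the interface the launch() loop above relies on (the class name and fields are my guesses, not the library's actual code):

// Hypothetical job object matching the $job->init() / $job->onComplete() calls above.
class FetchJob {
    private $url;
    private $callback;

    public function __construct($url, $callback) {
        $this->url = $url;
        $this->callback = $callback;
    }

    // Called by launch() with a (possibly reused) cURL handle: set the options.
    public function init($ch) {
        curl_setopt($ch, CURLOPT_URL, $this->url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    }

    // Called by launch() with the finished response body.
    public function onComplete($content) {
        call_user_func($this->callback, $this->url, $content);
    }
}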
P.S. It did the trick for me. :) A job that once took 8 hours now completes in 3-4 minutes using a pool of 50 connections. Can't describe that feeling. :) I didn't really expect it to work as planned, because with PHP it rarely works exactly as it's supposed to... That was like "OK, hope it finishes in at least an hour... Wha... wait... Already?! 8-O"
You can use curl_multi: http://www.somacon.com/p537.php (a minimal batch-style sketch follows below).
You may also want to consider doing this client side and using Javascript.
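In its most basic, batch-style form (the wait-for-all variant that the previous answer improves on), curl_multi usage looks roughly like this sketch; the URLs are placeholders:

<?php
// Fetch several URLs in parallel with curl_multi, batch style.
$urls = array('http://example.com/a', 'http://example.com/b'); // placeholders

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// drive all transfers until none are still running
do {
    curl_multi_exec($mh, $running);
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // select failed; back off briefly instead of spinning
    }
} while ($running > 0);

foreach ($handles as $ch) {
    $content = curl_multi_getcontent($ch);
    // ... process $content ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);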
Another solution is to write a hunter/gatherer that you submit an array of URLs to; it does the parallel work and returns a JSON array after it has completed.
Put another way: if you had 100 URLs you could POST that array (probably as JSON as well) to mysite.tld/huntergatherer - it does whatever it wants in whatever language you want and just returns JSON.
Aside from the curl_multi solution, another one is just having a batch of Gearman workers. If you go this route, I've found supervisord a nice way to start a load of daemon workers.
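A rough sketch of the Gearman route with the pecl/gearman extension (the function name fetch_url and the default localhost job server are assumptions):

<?php
// worker.php: run a handful of these under supervisord.
$worker = new GearmanWorker();
$worker->addServer(); // defaults to 127.0.0.1:4730
$worker->addFunction('fetch_url', function (GearmanJob $job) {
    $body = file_get_contents($job->workload()); // real code: cURL + error handling
    // ... store $body in your database here ...
});
while ($worker->work());

The main script then just queues the URLs as background jobs and carries on immediately:

<?php
$client = new GearmanClient();
$client->addServer();
foreach ($urls as $url) {
    $client->doBackground('fetch_url', $url);
}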
Things you should look at in addition to CURL multi:
Non-blocking streams (example: PHP-MIO)
ZeroMQ for spawning off many workers that do requests asynchronously
While node.js, Ruby's EventMachine and similar tools are quite great for doing this stuff, the things I mentioned make it fairly easy in PHP too.
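To give a taste of the non-blocking streams approach in plain PHP (this is the raw primitive that libraries like PHP-MIO wrap more comfortably; the host names are placeholders):

<?php
// Read from several HTTP connections in parallel using stream_select().
$hosts = array('example.com', 'example.org'); // placeholders
$streams = array();
$responses = array();

foreach ($hosts as $host) {
    $s = stream_socket_client("tcp://$host:80", $errno, $errstr, 5);
    if (!$s) {
        continue; // connection failed
    }
    fwrite($s, "GET / HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    stream_set_blocking($s, false);
    $streams[(int)$s] = $s;
    $responses[(int)$s] = '';
}

while ($streams) {
    $read = $streams;
    $write = $except = null;
    // block until at least one connection has data (or 5 s pass)
    if (stream_select($read, $write, $except, 5) === false) {
        break;
    }
    foreach ($read as $s) {
        $chunk = fread($s, 8192);
        if ($chunk === '' || $chunk === false) { // remote side finished
            fclose($s);
            unset($streams[(int)$s]);
        } else {
            $responses[(int)$s] .= $chunk;
        }
    }
}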
Try executing python-pycurl scripts from PHP. It's easier and faster than PHP's cURL.