If HTTP requests come to a web server from many clients, the requests are handled in order of arrival.
For all these HTTP requests I want to use a token bucket system.
So on the first request I write a number to a file, increment that number for the next request, and so on.
I don't want to do this in the DB, since the DB size keeps increasing.
Is this the right way to do this? Please suggest.
Edit: If a user posts a comment, the comment should be stored in a file instead of the DB. To keep track of the files, there is a variable that is incremented for every request; this number is used in the file name, so the comment can be referred to later. If there are many requests, is this the right way to do it?
Thanks.
Why not lock (http://php.net/manual/en/function.flock.php) files in a folder?
First call locks 01,
second call locks 02,
third call locks 03,
01 gets unlocked,
fourth call locks 01.
Basically, each PHP script tries to lock the first file it can, and when it's done it unlocks/erases the file.
I use this in a system with 250+ child processes spawned by a "process manager". I tried a database first, but it slowed everything down.
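Here is a minimal sketch of that lock-file pool; the pool size (99) and the locks/ directory are my assumptions, not fixed requirements:

$lockDir = __DIR__.'/locks/';
if (!is_dir($lockDir)) { mkdir($lockDir, 0777, true); }

$lock = null;
for ($i = 1; $i <= 99; $i++) {
    $name = sprintf('%02d', $i);
    $fp = fopen($lockDir.$name, 'c');          // create if missing, don't truncate
    if ($fp !== false && flock($fp, LOCK_EX | LOCK_NB)) {
        $lock = $name;                         // this request now owns slot $name
        break;
    }
    if ($fp !== false) { fclose($fp); }
}
if ($lock === null) { exit('All lock slots are busy.'); }

// ... do the work, using $lock in generated file names ...

flock($fp, LOCK_UN);                           // done: release the slot
fclose($fp);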
If you want to keep incrementing the file number for some content, I would suggest using mktime() or time():

$now = time();
$suffix = 0;
while (is_file($dir.$now.'_'.$suffix)) {
    $suffix++;
}

But again, depending on how you want to read the data or use it, there are many options. Could you provide more details?
-----EDIT 1-----
Each request has a "lock file", and the lock id (number) is stored in $lock.
Say three visitors post at the same time holding the lock ids 01, 02 and 03 (the last step in the situation described above):
$now = time();
$suffix = 0;
$post_id = 30;
$dir = 'posts/'.$post_id.'/';
if (!is_dir($dir)) { mkdir($dir, 0777, true); }
while (is_file($dir.$now.'_'.$lock.'_'.$suffix.'.txt')) {
    $suffix++;
}

The while should not be needed, but I usually keep it anyway, just in case :).
That should create the text files 30/69848968695_01_0.txt, ..._02_0.txt and ..._03_0.txt.
When you want to show the comments, you just sort them by filename.
The database size need not increase. All you need is a single row. In concept the logic goes:
Read row, taking lock, getting the current count
Write row with count incremented, releasing lock
Note that you're using the database locks to deal with the possibility that multiple requests are being processed at the same time.
So I'm suggesting you use the database as the place to manage your count. You can still write your other data to files if you wish. However, you'll still need housekeeping for the files; is that much harder with a database?
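A minimal sketch of that single-row counter, assuming a counter table with one row and made-up PDO credentials:

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->beginTransaction();

// Read the row, taking a lock; concurrent requests block here until commit.
$count = $pdo->query('SELECT current_count FROM counter WHERE id = 1 FOR UPDATE')
             ->fetchColumn();

// Write the row with the count incremented; the commit releases the lock.
$pdo->prepare('UPDATE counter SET current_count = ? WHERE id = 1')
    ->execute([$count + 1]);
$pdo->commit();

// $count + 1 is now this request's unique number for the file name.
$filename = 'comments/'.($count + 1).'.txt';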
I agree with some of the other commenters that, regardless of whichever problem you are trying to solve, you may be making it more difficult than it needs to be.
Your example is mentioning putting comments in a file and keeping them outside the database.
What is the purpose of your count, exactly? Do you want to count the number of comments a user has made, or the total number of comment requests?
If you don't have to update the count anywhere in real time, you could write a simple script that reads your server access logs and adds up the total.
Also, as Matthew points out above, if you want requests to be handled in a particular order, you will rapidly be heading for strange concurrency bugs and performance issues.
If you update your post to include details more explicitly, we should be able to help you further.
Hope this helps.
I'm building a PHP application which has a database containing approximately 140 URLs.
The goal is to download a copy of the contents of these web pages.
I've already written code which reads the URLs from my database and then uses curl to grab a copy of each page. It then takes everything between <body> and </body> and writes it to a file. It also takes redirects into account, e.g. if I go to a URL and the response code is 302, it follows the appropriate link. So far so good.
This all works fine for a number of URLs (maybe 20 or so), but then my script times out because max_execution_time is set to 30 seconds. I don't want to override or increase this, as I feel that's a poor solution.
I've thought of 2 work arounds but would like to know if these are a good/bad approach, or if there are better ways.
The first approach is to use a LIMIT on the database query to split the task up into 20 rows at a time (i.e. run the script 7 separate times if there were 140 rows). With this approach the script, download.php, still needs to be called 7 separate times, so I would need to pass in the LIMIT figures.
The second is to have a script where I pass in the ID of each individual database record I want the URL for (e.g. download.php?id=2) and then do multiple Ajax requests to it (download.php?id=2, download.php?id=3, download.php?id=4, etc.). Based on $_GET['id'] it could do a query to find the URL in the database, etc. In theory I'd be doing 140 separate requests, as it's a one-request-per-URL setup.
I've read some other posts which pointed to queueing systems, but these are beyond my knowledge. If this is the best way, is there a particular system worth taking a look at?
Any help would be appreciated.
Edit: There are 140 URLs at the moment, and this number is likely to increase over time, so I'm looking for a solution that will scale without hitting any timeout limits.
I don't agree with your logic: if the script runs fine and simply needs more time to finish, giving it more time is not a poor solution. What you are suggesting makes things more complicated and will not scale well as your URLs increase.
I would suggest moving your script to the command line, where there is no time limit, instead of executing it through the browser.
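For example (the two helper functions below are hypothetical stand-ins for your existing database and curl code):

// run with: php download.php
set_time_limit(0);                       // the CLI already defaults to no limit
foreach (get_urls_from_db() as $url) {   // hypothetical: read all URLs from the DB
    save_page_body($url);                // hypothetical: your curl + <body> logic
}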
When you have a list of unknown length which will take an unknown amount of time, asynchronous calls are the way to go.
Split your script into a single page download (like you proposed, download.php?id=X).
From the "main" script, get the list from the database, iterate over it and send an Ajax call to the script for each entry. Since all the calls will be fired at once, watch your bandwidth and CPU time; you could limit it to "X active tasks" using the success callback.
You can either have the download.php file return success data or save the result to a database along with the id of the website. I recommend the latter, because you can then just leave the main script and grab the results at a later time.
You can't increase the time limit indefinitely, and you can't wait indefinitely for the request to complete, so you need a "fire and forget" approach, and that's what asynchronous calls do best.
As @apokryfos pointed out, depending on the timing of this sort of "backup" you could fit it into a task scheduler (like cron). If you call it on demand, put it behind a GUI; if you call it every X amount of time, point a cron task at the main script. It will do the same.
What you are describing sounds like a job for the console. The browser is for the users to see, but your task is something that the programmer will run, so use the console. Or schedule the file to run with a cron job or anything similar that is handled by the developer.
Execute all the requests simultaneously using stream_socket_client(). Save all the socket handles in an array,
then loop through that array with stream_select() to read the responses.
It's almost like multi-tasking within PHP.
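A rough sketch of the idea, assuming plain HTTP on port 80 and placeholder host names; real code would also need error handling and HTTPS support:

$hosts = ['example.com', 'example.org'];
$sockets = [];
$responses = [];

foreach ($hosts as $host) {
    $s = stream_socket_client("tcp://{$host}:80", $errno, $errstr, 5);
    if ($s === false) { continue; }
    stream_set_blocking($s, false);
    fwrite($s, "GET / HTTP/1.0\r\nHost: {$host}\r\nConnection: close\r\n\r\n");
    $sockets[$host] = $s;
    $responses[$host] = '';
}

while ($sockets) {
    $read = $sockets; $write = null; $except = null;
    if (stream_select($read, $write, $except, 30) === false) { break; }
    foreach ($read as $s) {
        $host = array_search($s, $sockets, true);
        $chunk = fread($s, 8192);
        if ($chunk !== false && $chunk !== '') {
            $responses[$host] .= $chunk;       // collect this socket's data
        }
        if (feof($s)) {                        // remote side finished
            fclose($s);
            unset($sockets[$host]);
        }
    }
}
// $responses now holds the raw HTTP responses, fetched in parallel.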
I'd like to have a location-based data caching system (on the server) for supplying data to a mobile application. I.e., if a user requests data for a location (which is common to all users from the same area), I'll fetch the values from the DB and show them. But if a second user requests the same page from the same location within the next 5 minutes, I don't want to query the millions of records in the DB; I can just take the result from a file cache if it is there. Is anything like that available in PHP?
I am not aware of any such built-in thing in PHP, but it's not too hard to make your own caching engine with PHP. You need to make a cache directory and, based on the requests you get, check whether a file corresponding to each request exists in your cache directory.
E.g. your main parameters are lat and long.
Suppose you get a request with lat = 123 and long = 234 (taking some random values): you check whether your cache folder contains a file named 123_234.data. If it is present, instead of querying the database you read the file and send its content as the output; otherwise you read from the database and, before sending the response, write that response to a file cache/123_234.data. This way you can serve later requests without querying the database again.
Challenges:
Time: The cache will expire at some point or other. So while checking whether the file exists, you also need to check the last-modified timestamp to ensure the cache has not expired. It depends on your application requirements whether the cache expires in a minute, 10 minutes, hours, days or months.
Making intelligent cache file names is going to be challenging, since even for a distance of 100 m the lat/long combination will be different. One option might be to choose the file names by reducing the precision. E.g. a real lat/long combination has the form 28.631541,76.945281; you may want to name the cache file 28.63154_76.94528.data (reducing the precision to 5 places after the decimal). It again depends on whether you want to cache just a single point on the globe or a geographical region, and if a region, its radius.
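A minimal sketch combining the file cache and the reduced-precision naming; the 5-minute TTL and the fetch_from_db() helper are my assumptions:

function get_location_data(float $lat, float $long): string
{
    $key  = sprintf('%.5f_%.5f', $lat, $long);    // precision-reduced file name
    $file = __DIR__.'/cache/'.$key.'.data';

    if (is_file($file) && (time() - filemtime($file)) < 300) {
        return file_get_contents($file);          // cache hit: skip the DB
    }

    $data = fetch_from_db($lat, $long);           // hypothetical DB query
    file_put_contents($file, $data, LOCK_EX);     // refresh the cache
    return $data;
}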
I don't know why someone downvoted the question; I believe it is a very good and intelligent question. There goes my upvote :)
If all you are concerned about is the queries, one approach might be a DB table that stores query results as JSON or serialized PHP objects, along with whatever fields you need to match locations.
A cron job running at whatever interval suits best would clear out expired results.
I am developing a website with a database where people can insert data (votes). I want to keep a counter in the header, like "x votes have been cast". But there may soon be a lot of traffic on the website. Right now I can do it with the query

SELECT COUNT(*) FROM `tblvotes`

and then display the number in the header, but that re-runs the query every time a user changes page. So I am thinking it may be better to run the query once every 30 seconds (much less load on the MySQL server), but then I need to save its output somewhere (this shouldn't be so hard; I can write it to a text file?). But how can I have my website automatically run the query every 30 seconds and put the number in the file? I have no SSH access to the server, so I can't crontab it.
If there is something you don't understand, feel free to ask!
Simplest approach: write the result into a local text file. On every request, check the text file's modification time; if it is more than 30 seconds old, update the file. To update, you should lock the file. While the file is being updated, other users for whom the 30-second condition is met should only read the currently existing file, to avoid race conditions.
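A minimal sketch of that approach; the file name and the connection details are assumptions:

$cacheFile = __DIR__.'/votecount.txt';

if (!is_file($cacheFile) || (time() - filemtime($cacheFile)) > 30) {
    $fp = fopen($cacheFile, 'c');
    if ($fp !== false && flock($fp, LOCK_EX | LOCK_NB)) {
        // This request won the lock and refreshes the cache;
        // everyone else falls through and reads the existing file.
        $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
        $count = $pdo->query('SELECT COUNT(*) FROM tblvotes')->fetchColumn();
        ftruncate($fp, 0);
        fwrite($fp, (string)$count);
        fflush($fp);
        flock($fp, LOCK_UN);
    }
    if ($fp !== false) { fclose($fp); }
}

echo file_get_contents($cacheFile).' votes have been cast';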
Hope that helps,
Stefan
Crontabs can only run every minute, at their fastest.
I think there is a better solution for this: make an aggregate table in which the statistical information is stored.
With a trigger on the votes table, you can do 'something' every time the table receives an INSERT statement.
The aggregate table will then hold the most accurate information, which you can query to display the count.
A better solution would be to use some cache mechanism (e.g. APC) instead of files, if your server allows it.
If you can, you may want to look into using memcached. It allows you to set an expiry time for any data you add to it.
When you first do the query, write the result to memcached, keyed by the md5 of the query text. On subsequent requests, look for the data in memcached first; if it has expired, redo the SQL query and write the result to memcached again.
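A rough sketch with the Memcached extension; the 30-second expiry and the connection details are assumptions:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$sql = 'SELECT COUNT(*) FROM tblvotes';
$key = md5($sql);                       // key derived from the query text

$count = $mc->get($key);
if ($count === false) {                 // missing or expired: hit the DB
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $count = $pdo->query($sql)->fetchColumn();
    $mc->set($key, $count, 30);         // cache for 30 seconds
}
echo $count.' votes have been cast';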
Okay, so the first part of your question is basically about caching the result of the total votes to be included in the header of your page. It's a very good idea; here is one way to implement it...
If you cannot enable a crontab (even without SSH access you might be able to set this up using your hosting's control panel), you might be able to get away with using an external third-party cronjob service (Google has many results for this)...
Every time your cronjob runs, you can create/update a file that simply contains some PHP arrays:

$fileOutput = "<"."?php\n\n";
$fileOutput .= '$'.$arrayName.'=';
$fileOutput .= var_export($yourData, true);
$fileOutput .= ";\n\n?".">";

$handle = fopen($pathToCacheFile, 'w+'); // path to your cache file
fwrite($handle, $fileOutput);
fclose($handle);
That will give you a PHP file that you can simply include() in your header markup, after which you'll have access to the $yourData value under the name given in $arrayName.
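For example (the file path and variable name here are placeholders):

// if the cronjob ran the snippet above with $arrayName = 'voteCount':
include '/path/to/cache_file.php';   // the file the cronjob wrote
echo $voteCount.' votes have been cast';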
I am currently working on a web application where I have encountered a little problem. In this system, multiple users can log onto the same page and update the data (a series of checkboxes, dropdowns, and text fields).
The issue is that data might get overwritten if one user loaded the page while the data was old, someone else has since updated it, and the first user then submits their changes, which overwrite everything.
Any suggestions on how to solve this problem? I am currently just working with plain-text files.
I am currently just working with plain-text files.
Suggestion 1. Use a database.
Suggestion 2. Use a lock file. Use OS-level API calls to open a file with an exclusive lock. The first user to acquire this file has exclusive access to the data. When that user finishes their transaction, close the file, releasing the OS-level lock.
Suggestion 3. Don't "update" the file. Log the history of changes. You can then read usernames and timestamps from the log to find the latest version.
If you do this, each request needs to do something like the following.
When getting the current state, read the last line from the file. Also, get the file size and last modification time. Keep the size and last modified time in the session. Display the current state in the form.
When the user's change is being processed, check the file size and last modification time. If the file differs from what was recorded in the session, this user is attempting to update data which was changed by someone else in the meantime. In that case, read the last line from the file again, get the new file size and last modification time, keep them in the session, and redisplay the current state in the form.
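A minimal sketch of that check; the file name and session keys are assumptions:

session_start();
$file = __DIR__.'/data.log';

clearstatcache();                       // make sure the stat values are fresh
$size  = filesize($file);
$mtime = filemtime($file);

if ($size !== $_SESSION['seen_size'] || $mtime !== $_SESSION['seen_mtime']) {
    // Someone else wrote to the file since this user loaded the form:
    // re-read the current state and redisplay instead of saving.
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    $current = end($lines);             // last line = current state
    $_SESSION['seen_size']  = $size;
    $_SESSION['seen_mtime'] = $mtime;
    // ... redisplay $current in the form with a conflict warning ...
} else {
    // No concurrent change: append this user's update to the log.
    file_put_contents($file, json_encode($_POST).PHP_EOL, FILE_APPEND | LOCK_EX);
}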
In addition, you might want to have two files. One with "current" data, the other with the history of changes. This can make it faster to find the current data, since it's the only record in the current state file.
Another choice is to have a "header" in your file that is a fixed-size block of text. Each time you append, you also seek(0,0) and refresh the header with the offset of the last record as well as the timestamp of the last change.
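A rough sketch of that header idea; the 64-byte header size is an assumption, and the header must be written once when the file is first created so that the records start after it:

$fp = fopen($file, 'c+');
flock($fp, LOCK_EX);

fseek($fp, 0, SEEK_END);
$offset = ftell($fp);                             // where the new record starts
fwrite($fp, $record.PHP_EOL);                     // append the record

fseek($fp, 0, SEEK_SET);                          // seek(0,0) back to the header
fwrite($fp, str_pad($offset.' '.time(), 64));     // rewrite the fixed-size header

flock($fp, LOCK_UN);
fclose($fp);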
When saving new data, you could compare the time the data was last modified with the time the user started editing.
If there have been modifications while the user was making changes, you could then show a message to the user and ask them which version to keep, or allow them to merge the two versions.
This problem is addressed in the very same fashion by revision control systems like svn, git, etc.
You can add an extra table and store all the information there, including the userID, so you will be able to access all the data users inserted by using joins.
Bit of an odd question but I'm hoping someone can point me in the right direction. Basically I have two scenarios and I'd like to know which one is the best for my situation (a user checking a scoreboard on a high traffic site).
Top 10 is regenerated every time a user hits the page - increased load on the server, especially with high traffic, but the user will see his/her correct standing as soon as possible.
Top 10 is regenerated at a set interval, e.g. every 10 minutes - only one set of results is generated, causing one spike every 10 minutes rather than potentially one every x seconds, but if a user hits the page between refreshes they won't see their updated score.
Each one has its pros and cons. In your experience, which one would be best to use, or are there any magical alternatives?
EDIT - An update: after taking on board what everyone has said, I've decided to rebuild this part of the application. Rather than dealing with the individual scores, I'm dealing with the totals; these are then saved out to a separate table which acts like a cached data source.
Thank you all for the great input.
Adding to Marcel's answer, I would suggest only updating the scoreboard on write events (like a new score or a deleted score). This way you can keep static answers for popular queries like the Top 10. Use something like memcached to keep the data cached for requests; if you don't have or can't install something like memcached on your server, serialize common requests and write them to flat files, then delete/update them on write events. Have your code look for the cached result (or file) first, and only if it's missing do the query and create the data.
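A rough sketch of refreshing the cache only on writes; the table layout and function name are my assumptions:

function add_score(PDO $pdo, Memcached $mc, int $userId, int $score): void
{
    $pdo->prepare('INSERT INTO scores (user_id, score) VALUES (?, ?)')
        ->execute([$userId, $score]);

    // Rebuild the Top 10 once per write event...
    $top10 = $pdo->query(
        'SELECT user_id, SUM(score) AS total
           FROM scores
          GROUP BY user_id
          ORDER BY total DESC
          LIMIT 10'
    )->fetchAll(PDO::FETCH_ASSOC);

    // ...so page views only ever read the cached copy.
    $mc->set('top10', $top10);
}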
Nothing on the web ever really needs to be real time. I would go with option 2; users will not notice that their score isn't changing instantly. You can use some JS to refresh the Top 10 every time the cache has been cleared.
To add to Jordan's suggestion: I'd put the scoreboard in a separate (HTML-formatted) file that is produced every time new data arrives, and only then. You can include this file in the PHP page containing the scoreboard, or even let a visitor's browser fetch it periodically using XMLHttpRequest (to save bandwidth). Users with JavaScript disabled, or using a browser that doesn't support XMLHttpRequest (rare these days, but possible), will just see a static page.
The Drupal voting module will handle this for you, giving you an option of when to recalculate. If you're implementing it yourself, then caching the top 10 somewhere is a good idea - you can either regenerate it at regular intervals or you can invalidate the cache at certain points. You'd need to look at how often people are voting, how often that will cause the top 10 to change, how often the top 10 page is being viewed and the performance hit that regenerating it involves.
If you're not set on Drupal/MySQL, then CouchDB would be useful here. You can create a view which calculates the top 10 data, and it'll be cached until something happens which makes recalculation necessary. You can also put an HTTP caching proxy inline to cache results for a set number of minutes.