Long polling with PHP and jQuery - issue with update and delete - php

I wrote a small script which uses the concept of long polling.
It works as follows:
jQuery sends the request with some parameters (say lastId) to php
PHP gets the latest id from database and compares with the lastId.
If the lastId is smaller than the newly fetched Id, then it kills the
script and echoes the new records.
From jQuery, i display this output.
I have taken care of all security checks. The problem is when a record is deleted or updated, there is no way to know this.
The nearest solution i can get is to count the number of rows and match it with some saved row count variable. But then, if i have 1000 records, i have to echo out all the 1000 records which can be a big performance issue.
The CRUD functionality of this application is completely separated and runs in a different server. So i dont get to know which record was deleted.
I don't need any help coding wise, but i am looking for some suggestion to make this work while updating and deleting.
Please note, websockets(my fav) and node.js is not an option for me.

Instead of using a certain ID from your table, you could also check when the table itself was modified the last time.
SQL:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
If successful, the statement should return something like
UPDATE_TIME
2014-04-02 11:12:15
Then use the resulting timestamp instead of the lastid. I am using a very similar technique to display and auto-refresh logs, works like a charm.
You have to adjust the statement to your needs, and replace yourdb and yourtable with the values needed for your application. It also requires you to have access to information_schema.tables, so check if this is available, too.
Two alternative solutions:
If the solution described above is too imprecise for your purpose (it might lead to issues when the table is changed multiple times per second), you might combine that timestamp with your current mechanism with lastid to cover new inserts.
Another way would be to implement a table, in which the current state is logged. This is where your ajax requests check the current state. Then generade triggers in your data tables, which update this table.

You can get the highest ID by
SELECT id FROM table ORDER BY id DESC LIMIT 1
but this is not reliable in my opinion, because you can have ID's of 1, 2, 3, 7 and you insert a new row having the ID 5.
Keep in mind: the highest ID, is not necessarily the most recent row.
The current auto increment value can be obtained by
SELECT AUTO_INCREMENT FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
Maybe a timestamp + microtime is an option for you?

Related

checking number of records in mySQL table

I am looking for a way to check if there are certain number of records within mysql table. For example: After POST request been before putting data to dabase, it checks first how many records there are. If lets say there are 24 records, then it will delete record with latest date based on timestamp and then inster new value from POST request. Has anyone got idea on how to do it? Looking forward fpr your answers. Below I attached simple code i wrote to insert data from post request into table.
<?php
include("connect.php");
$link=Connection();
$temp1=$_POST["temp1"];
$hum1=$_POST["hum1"];
$query = "INSERT INTO `tempLog` (`temperature`, `humidity`)
VALUES ('".$temp1."','".$hum1."')";
mysql_query($query,$link);
mysql_close($link);
header("Location: index.php");
?>
When you say delete with the latest date I have to assume you mean the oldest record? Your description doesnt tell me the name of you date field so Lets assume its onDate. You also didnt mention what your primary key is so lets assume that is just id. if you run the below query before inserting it will purge all the oldest records leaving only the newest 23 in the database.
delete from templog where id in (
select id from (
select #rownum:=#rownum+1 'rowid', t.id from templog t, (select #rownum:=0)r order by t.onDate
)v where v.rowid > 23
);
Of course you should test on data you don't mind losing.
It is best to do a cleanup purge each time instead of removing a single row before adding a new one because in the event of exceptions it will never clean itself down to the 24 rows you wish to truly have.
I also want to note that you may want to reconsider this method all together. Instead leave the data there, and only query the most recent 24 when displaying the log. Since you are going through the trouble of collecting the data you might as well keep it for future reporting. Then later down the road if your table gets to large run a simple daily purge query to delete anything older than a certain threshold.
Hope this helps.

Get next MySQL row ID when any number of previous rows have been deleted

I have the following call to my database to retrieve the last row ID from an AUTO_INCREMENT column, which I use to find the next row ID:
$result = $mysqli->query("SELECT articleid FROM article WHERE articleid=(SELECT MAX(articleid) FROM article)");
$row = $result->fetch_assoc();
$last_article_id = $row["articleid"];
$last_article_id = $last_article_id + 1;
$result->close();
I then use $last_article_id as part of a filename system.
This is working perfectly....until I delete a row meaning the call retrieves an ID further down the order than the one I want.
A example would be:
ID
0
1
2
3
4-(deleted row)
5-(deleted row)
6-(next ID to be used for INSERT call)
I'd like the filename to be something like 6-0.jpg, however the filename ends up being 4-0.jpg as it targets ID 3 + 1 etc...etc...
Any thoughts on how I get the next MySQL row ID when any number of previous rows have been deleted??
You are making a significant error by trying to predict the next auto-increment value. You do not have a choice, if you want your system to scale... you have to either insert the row first, or rename the file later.
This is a classic oversight I see developers make -- you are coding this as if there would only ever be a single user on your site. It is extremely likely that at some point two articles will be created at almost the same time. Both queries will "predict" the same id, both will use the same filename, and one of the files will disappear, one of the table entries may point to the wrong file, and the other entry will reference a file that does not exist. And you'll be scratching your head asking "how did this happen?!"
Predicting auto-increment values is bad practice. Don't do it. Plan for concurrency.
Also, the information_schema tables are not really tables... they are server internals exposed to the SQL interface. Calls to the "tables" table, and show table status are expensive calls that you do not want to make in production... so don't be tempted to use something you find there.
You can use mysql_insert_id() after you insert the new row to retrieve the new key:
$mysqli->query($yourQueryHere);
$newId = $mysqli->insert_id();
That requires the id field to be a primary key, though (I believe).
As for the filename, you could store it in a variable, then do the query, then change the name and then write the file.

Server-side Pagination: total row count for expensive query?

I have a simple query using server-side pagination. The issue is the WHERE Clause makes a call to an expensive function and the functions argument is the user input, eg. what the user is searching for.
SELECT
*
FROM
( SELECT /*+ FIRST_ROWS(numberOfRows) */
query.*,
ROWNUM rn FROM
(SELECT
myColumns
FROM
myTable
WHERE expensiveFunction(:userInput)=1
ORDER BY id ASC
) query
)
WHERE rn >= :startIndex
AND ROWNUM <= :numberOfRows
This works and is quick assuming numberOfRows is small. However I would also like to have the total row count of the query. Depending on the user input and database size the query can take up to minutes. My current approach is to cache this value but that still means the user needs to wait minutes to see first result.
The results should be displayed in the Jquery datatables plugin which greatly helps with things like serer-side paging. It however requires the server to return a value for the total records to correctly display paging controls.
What would be the best approach? (Note: PHP)
I thought if returning first page immediately with a fake (better would be estimated) row count. After the page is loaded do an ajax call to a method that determines total row count of the query (what happens if the user pages during that time?) and then update the faked/estimated total row count.
However I have no clue how to do an estimate. I tried count(*) * 1000 with SAMPLE (0.1) but for whatever reason that actually takes longer than the full count query. Also just returning a fake/random value seems a bit hacky too. It would need to be bigger than 1 page size so that the "Next" button is enabled.
Other ideas?
One way to do it is as I said in the comments, to use a 'countless' approach. Modify the client side script in such a way that the Next button is always enabled and fetch the rows until there are none, then disable the Next button. You can always add a notification message to say that there are no more rows so it will be more user friendly.
Considering that you are expecting a significant amount of records, I doubt that the user will paginate through all the results.
Another way is to schedule a cron job that will do the counting of the records in the background and store that result in a table called totals. The running intervals of the job should be set up based on the frequency of the inserts / deletetions.
Then in the frontend, just use the count previously stored in totals. It should make a decent aproximation of the amount.
Depends on your DB engine.
In mysql, solution looks like this :
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
Basically, you add another attribute on your select (SQL_CALC_FOUND_ROWS) which tells mysql to count the rows as if limit clause was not present, while executing the query, while FOUND_ROWS actually retrieves that number.
For oracle, see this article :
How can I perform this query in oracle
Other DBMS might have something similar, but I don't know.

Pagination Strategies for Complex (slow) Datasets

What are some of the strategies being used for pagination of data sets that involve complex queries? count(*) takes ~1.5 sec so we don't want to hit the DB for every page view. Currently there are ~45k rows returned by this query.
Here are some of the approaches I've considered:
Cache the row count and update it every X minutes
Limit (and offset) the rows counted to 41 (for example) and display the page picker as "1 2 3 4 ..."; then recompute if anyone actually goes to page 4 and display "... 3 4 5 6 7 ..."
Get the row count once and store it in the user's session
Get rid of the page picker and just have a "Next Page" link
I've had to engineer a few pagination strategies using PHP and MySQL for a site that does over a million page views a day. I persued the strategy in stages:
Multi-column indexes I should have done this first before attempting a materialized view.
Generating a materialized view. I created a cron job that did a common denormalization of the document tables I was using. I would SELECT ... INTO OUTFILE ... and then create the new table, and rotate it in:
SELECT ... INTO OUTFILE '/tmp/ondeck.txt' FROM mytable ...;
CREATE TABLE ondeck_mytable LIKE mytable;
LOAD DATA INFILE '/tmp/ondeck.txt' INTO TABLE ondeck_mytable...;
DROP TABLE IF EXISTS dugout_mytable;
RENAME TABLE atbat_mytable TO dugout_mytable, ondeck_mytable TO atbat_mytable;
This kept the lock time on the write contended mytable down to a minimum and the pagination queries could hammer away on the atbat materialized view. I've simplified the above, leaving out the actual manipulation, which are unimportant.
Memcache I then created a wrapper about my database connection to cache these paginated results into memcache. This was a huge performance win. However, it was still not good enough.
Batch generation I wrote a PHP daemon and extracted the pagination logic into it. It would detect changes mytable and periodically regenerate the from the oldest changed record to the most recent record all the pages to the webserver's filesystem. With a bit of mod_rewrite, I could check to see if the page existed on disk, and serve it up. This also allowed me to take effective advantage of reverse proxying by letting Apache detect If-Modified-Since headers, and respond with 304 response codes. (Obviously, I removed any option of allowing users to select the number of results per page, an unimportant feature.)
Updated:
RE count(*): When using MyISAM tables, COUNT didn't create a problem when I was able to reduce the amount of read-write contention on the table. If I were doing InnoDB, I would create a trigger that updated an adjacent table with the row count. That trigger would just +1 or -1 depending on INSERT or DELETE statements.
RE page-pickers (thumbwheels) When I moved to agressive query caching, thumb wheel queries were also cached, and when it came to batch generating the pages, I was using temporary tables--so computing the thumbwheel was no problem. A lot of thumbwheel calculation simplified because it became a predictable filesystem pattern that actually only needed the largest page numer. The smallest page number was always 1.
Windowed thumbweel The example you give above for a windowed thumbwheel (<< 4 [5] 6 >>) should be pretty easy to do without any queries at all so long as you know your maximum number of pages.
My suggestion is ask MySQL for 1 row more than you need in each query, and decide based on the number of rows in the result set whether or not to show the next page-link.
MySQL has a specific mechanism to compute an approximated count of a result set without the LIMIT clause: FOUND_ROWS().
MySQL is quite good in optimizing LIMIT queries.
That means it picks appropriate join buffer, filesort buffer etc just enough to satisfy LIMIT clause.
Also note that with 45k rows you probably don't need exact count. Approximate counts can be figured out using separate queries on the indexed fields. Say, this query:
SELECT COUNT(*)
FROM mytable
WHERE col1 = :myvalue
AND col2 = :othervalue
can be approximated by this one:
SELECT COUNT(*) *
(
SELECT COUNT(*)
FROM mytable
) / 1000
FROM (
SELECT 1
FROM mytable
WHERE col1 = :myvalue
AND col2 = :othervalue
LIMIT 1000
)
, which is much more efficient in MyISAM.
If you give an example of your complex query, probably I can say something more definite on how to improve its pagination.
I'm by no means a MySQL expert, but perhaps giving up the COUNT(*) and going ahead with COUNT(id)?

Best way to update user rankings without killing the server

I have a website that has user ranking as a central part, but the user count has grown to over 50,000 and it is putting a strain on the server to loop through all of those to update the rank every 5 minutes. Is there a better method that can be used to easily update the ranks at least every 5 minutes? It doesn't have to be with php, it could be something that is run like a perl script or something if something like that would be able to do the job better (though I'm not sure why that would be, just leaving my options open here).
This is what I currently do to update ranks:
$get_users = mysql_query("SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
$i=0;
while ($a = mysql_fetch_array($get_users)) {
$i++;
mysql_query("UPDATE users SET month_rank = '$i' WHERE id = '$a[id]'");
}
UPDATE (solution):
Here is the solution code, which takes less than 1/2 of a second to execute and update all 50,000 rows (make rank the primary key as suggested by Tom Haigh).
mysql_query("TRUNCATE TABLE userRanks");
mysql_query("INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
mysql_query("UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.id");
Make userRanks.rank an autoincrementing primary key. If you then insert userids into userRanks in descending rank order it will increment the rank column on every row. This should be extremely fast.
TRUNCATE TABLE userRanks;
INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC;
UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.id;
My first question would be: why are you doing this polling-type operation every five minutes?
Surely rank changes will be in response to some event and you can localize the changes to a few rows in the database at the time when that event occurs. I'm pretty certain the entire user base of 50,000 doesn't change rankings every five minutes.
I'm assuming the "status = '1'" indicates that a user's rank has changed so, rather than setting this when the user triggers a rank change, why don't you calculate the rank at that time?
That would seem to be a better solution as the cost of re-ranking would be amortized over all the operations.
Now I may have misunderstood what you meant by ranking in which case feel free to set me straight.
A simple alternative for bulk update might be something like:
set #rnk = 0;
update users
set month_rank = (#rnk := #rnk + 1)
order by month_score DESC
This code uses a local variable (#rnk) that is incremented on each update. Because the update is done over the ordered list of rows, the month_rank column will be set to the incremented value for each row.
Updating the users table row by row will be a time consuming task. It would be better if you could re-organise your query so that row by row updates are not required.
I'm not 100% sure of the syntax (as I've never used MySQL before) but here's a sample of the syntax used in MS SQL Server 2000
DECLARE #tmp TABLE
(
[MonthRank] [INT] NOT NULL,
[UserId] [INT] NOT NULL,
)
INSERT INTO #tmp ([UserId])
SELECT [id]
FROM [users]
WHERE [status] = '1'
ORDER BY [month_score] DESC
UPDATE users
SET month_rank = [tmp].[MonthRank]
FROM #tmp AS [tmp], [users]
WHERE [users].[Id] = [tmp].[UserId]
In MS SQL Server 2005/2008 you would probably use a CTE.
Any time you have a loop of any significant size that executes queries inside, you've got a very likely antipattern. We could look at the schema and processing requirement with more info, and see if we can do the whole job without a loop.
How much time does it spend calculating the scores, compared with assigning the rankings?
Your problem can be handled in a number of ways. Honestly more details from your server may point you in a totally different direction. But doing it that way you are causing 50,000 little locks on a heavily read table. You might get better performance with a staging table and then some sort of transition. Inserts into a table no one is reading from are probably going to be better.
Consider
mysql_query("delete from month_rank_staging;");
while(bla){
mysql_query("insert into month_rank_staging values ('$id', '$i');");
}
mysql_query("update month_rank_staging src, users set users.month_rank=src.month_rank where src.id=users.id;");
That'll cause one (bigger) lock on the table, but might improve your situation. But again, that may be way off base depending on the true source of your performance problem. You should probably look deeper at your logs, mysql config, database connections, etc.
Possibly you could use shards by time or other category. But read this carefully before...
You can split up the rank processing and the updating execution. So, run through all the data and process the query. Add each update statement to a cache. When the processing is complete, run the updates. You should have the WHERE portion of the UPDATE reference a primary key set to auto_increment, as mentioned in other posts. This will prevent the updates from interfering with the performance of the processing. It will also prevent users later in the processing queue from wrongfully taking advantage of the values from the users who were processed before them (if one user's rank affects that of another). It also prevents the database from clearing out its table caches from the SELECTS your processing code does.

Categories