We have groups on our site where members can post messages. I want to notify the members of each group when there are new entries.
What's the best way to do this?
If you want to notify them immediately, you could simply send an email (or whatever channel you use) as you store the message.
Otherwise you could store the highest 'read' message id against each user; then you can fairly easily fetch any subsequent messages.
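A minimal sketch of that fetch (the table and column names here are assumptions, not from the original):

SELECT id, body
FROM messages
WHERE group_id = :group_id
  AND id > :last_read_id   -- the user's stored highest read message id
ORDER BY id;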
If there are likely to be many messages per group, it is worth having a group-level field such as LastNotificationTime, then scheduling a task that looks for messages created after the current LastNotificationTime value (and sends a report about them to users) and finally updates LastNotificationTime to the current time.
Even better is a user-level LastActivityTimeInGroup field: compare each message's creation date against it (perhaps adding a few minutes for a smoother experience), so that only users who were inactive at the time receive notifications, and never more than one per task interval.
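A sketch of the scheduled task's queries for the group-level variant (table and column names are assumptions):

-- find everything posted since the last notification run
SELECT * FROM messages
WHERE group_id = :group_id
  AND created_at > :last_notification_time;

-- after the report has been sent:
UPDATE groups SET LastNotificationTime = NOW() WHERE id = :group_id;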
I currently use a key store for this purpose (memcachedb). What you can do is add a key such as 'messageread_{message.id}_{user.id}' set to whatever value you choose (I just use the int 1), then use a memcached client to do a getMulti for all the messages the user has (as an array of keys in the format above). Any key that comes back is read :)
I use this method, with a 60-second cache of the query itself, to give users an unread message count, with great results.
http://memcachedb.org/
I originally used a highest-id check and soon after switched to a multi-level comment system, which made the highest id moot in that context (you could see a comment yet miss an earlier one that is a reply). So far this method has been very fast.
To do something like this you need to install the memcache or memcached client from PECL, here http://pecl.php.net/package/memcache and here http://pecl.php.net/package/memcached respectively. Then install memcachedb itself (you will need a box with shell access) and start the service.
Client pseudo-code would run something like this:
// get the messages a user is part of (query your own storage here; assume
// the result set $mresult holds rows with a 'message_id' column)
// build an array of memcachedb keys from that list, using message_id as the array key so we can reference it later
$keys = array();
foreach($mresult as $current){
    $keys[$current['message_id']] = 'messageread_'.$current['message_id'].'_'.$myid;
}
// using the "memcached" php client
$result = $memcacheobj->getMulti($keys);
// getMulti returns an associative array in the form 'memcachekey' => 'memcachevalue'
// loop over result, removing found results from keys
foreach($keys as $key => $val){
    if(isset($result[$val])) unset($keys[$key]);
}
// cleanup: whatever is left in $keys is unread, and its array keys are the message ids
$unread = array_keys($keys);
Now you have an array of unread message ids in this structure:
array(
0 => 12,
1 => 23,
etc...
)
This is just a rough mock-up of the process; I'm sure some cleanup could be done to optimize it :)
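For completeness, the write side of this scheme is a single set when the user views a message (same hypothetical key format as above):

// mark message $mid as read for user $myid
$memcacheobj->set('messageread_'.$mid.'_'.$myid, 1);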
I am trying to do the following.
I am querying an external database through a web service. The web service returns all the products from an ERP system my client uses. Since the server and the connection are not very fast, I decided to synchronize the database onto my web server and handle most operations there, so that the website runs smoothly.
Everything works fine; I just need one last step to guarantee that the inventory on the website matches the one in the ERP. The only issue comes when the client deletes something in the ERP system.
At the moment I am weighing what the ideal strategy would be (least resource- and time-consuming) to remove products from my Products table when I don't receive them in the web service result.
So I basically have the following process:
I query the web service for all the products, format them a little, and store them in an array. The final size is about 600 entries.
Then I run a foreach loop with the following sub-process:
I query my database to check if product_id is present.
If the product is present, I just update it with the latest info and stock data.
If the product is not present, I just insert it.
So, I was thinking of doing the following, but I don't think it's the ideal way:
Do a SELECT * FROM Products and generate an array containing all the products.
Do a foreach over the resulting array, and in each iteration scan the ERP array to check whether the specific product exists. If not, delete it; if it does, continue with the next product.
Considering that after all the previous steps this would involve a couple of nested foreach loops, I am a little worried that it might consume too much memory and take too long to process.
I was thinking that something like array_diff or array_map might solve the issue, but I am not really experienced with these functions, and the structures of the two arrays differ a lot, so I am not sure it would work that easily.
What would you guys recommend?
It's actually quite simple:
SELECT id FROM Products
Then you have an array of your product IDs, for example:
[123,5679,345]
Then as you go and do your updates or inserts, remove the id from the array.
"I query my database to check if product_id is present" (for updates) - this check is now redundant.
There are a few ways to remove the value from the array when you do an update; this is how I would probably do it:
if(false !== ($index = array_search($data['product_id'], $myids))){
    // note the !== comparison: array_search can return 0 for the first index,
    // so we must check for boolean false
    // $index is the position of the product id in our list of ids from the local DB
    unset($myids[$index]);
    // the incoming product_id is in the local list, so Do Update
}else{
    // otherwise Do Insert
}
As I mentioned above, when doing your updates/inserts you no longer have to check whether the ID exists, because you already have the array of IDs from the database. This alone saves you n queries (approximately 600).
Then it's very simple if you have IDs left over:
// I wouldn't normally concatenate variables into SQL, but in this case it's a list of int IDs that came from the database.
// You can of course build a prepared statement with a loop of placeholders if you wish; for the sake of simplicity, I'll leave that as an exercise for another day.
'DELETE FROM Products WHERE id IN('.implode(',', $myids).')'
And because you unset these when updating, the only rows left over are products that no longer exist.
Conclusion:
You have no choice (other than doing an ON DUPLICATE KEY query, or ignoring exceptions) but to pull out the product IDs. You're already doing this on a row-by-row basis, so we can effectively kill two birds with one stone.
If you need more data than just the ID (for example, to check whether the product was changed before doing an update), then pull that data out as well; in that case I would recommend using PDO and the FETCH_GROUP option. I won't go into the specifics, except to say that it lets you easily build your array this way:
[{product_id} => [ {product_name}, {product_price} etc..]]
Basically product_id is the key, with a nested array of the row data; this will make lookups easier.
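For illustration, a minimal sketch of that fetch, assuming a PDO connection in $pdo (the connection and column names are not from the original):

$stmt = $pdo->query('SELECT product_id, product_name, product_price FROM Products');
// FETCH_GROUP keys the result by the first selected column (product_id) and groups
// the remaining columns beneath it; with unique ids, PDO::FETCH_UNIQUE instead maps
// each id straight to its single row
$myids = $stmt->fetchAll(PDO::FETCH_GROUP | PDO::FETCH_ASSOC);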
This way you can look it up like this.
// then, instead of array_search:
// if(false !== ($index = array_search($data['product_id'], $myids))){
if(isset($myids[$data['product_id']])){
    unset($myids[$data['product_id']]);
    // do your checks, then your update
}else{
    // do inserts
}
References:
http://php.net/manual/en/function.array-search.php
array_search — Searches the array for a given value and returns the first corresponding key if successful
Warning: this function may return boolean FALSE, but may also return a non-boolean value which evaluates to FALSE. Please read the section on Booleans for more information. Use the === operator for testing the return value of this function.
UPDATE
There is one other really good way to do this: add a field called sync_date, and when you do your insert or update set sync_date to the current date.
This way, when you are done, any product with a sync date older than this run's can be deleted. It's best to cache the time at the start of the run so you use the exact same value throughout.
$time = date('Y-m-d H:i:s'); // or time() if you prefer a timestamp
// use this same variable for the whole course of the script
Then you can do:
'DELETE FROM Products WHERE sync_date != "'.$time.'"'
This may actually be a bit better because it has more utility: now you also know when the sync was last run.
The title of this question may sound strange, but with my English I could not come up with a better one.
I have created a chat application on my website. Now I want to add notifications.
When a new message is submitted, I check whether the other user has read the previous message in the given conversation. If he has, I write a new notification; if he has not seen the previous one, I don't.
I use the MySQL COUNT() function to count rows and then do the logic in PHP. In CodeIgniter it looks like this:
public function ifUnreadMsgs($con_id, $sender_id)
{
    $this->db->where('conversation_id', $con_id);
    $this->db->where('sender_id', $sender_id);
    $this->db->where('seen IS NULL', null, false);
    $this->db->from('messages');
    return $this->db->count_all_results() > 0;
}
My question is about optimization. I know that with time I will have a lot of messages. Let's say I have 1,000,000 messages stored in the database, and I know the row with a possible NULL in seen will have a msg_id of approximately 999,995. I have to run this query often, and the user is waiting on the ajax response, so I want to reduce the query time as much as possible.
Is it possible to run the query backwards and stop as soon as I hit the value I'm looking for? I thought about using DISTINCT or LIMIT to stop early, but how do I run it backwards?
EDIT:
Actually I need to start looping through the messages table from the last row, stop at the given conversation_id, and check whether seen is NULL or not.
You can use ORDER BY for both ascending and descending order. By default ORDER BY sorts ascending; add DESC to reverse it, and combine it with LIMIT 1 so the query stops at the first match.
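Applied to the CodeIgniter method from the question, a sketch might look like this (using msg_id as the ordering column is an assumption based on the question):

// scan newest-first and stop at the first unread hit instead of counting them all
$this->db->where('conversation_id', $con_id);
$this->db->where('sender_id', $sender_id);
$this->db->where('seen IS NULL', null, false);
$this->db->order_by('msg_id', 'DESC');
$this->db->limit(1);
return $this->db->get('messages')->num_rows() > 0;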
I use the sorted set type in Redis.
For each user I create a separate key and put the data there.
Example keys:
FEED:USER:1, FEED:USER:2, FEED:USER:3
I want to select the data from Redis for user keys 1, 2, 3, each sorted by score (timestamp).
Put simply, I need to select a time range from each key and then combine all the results, sorted by score.
There are a couple of ways to do this but the right one depends on what you're trying to do. For example:
You can use ZRANGEBYSCORE (or ZREVRANGEBYSCORE) in your code for each FEED:USER:n key and "merge" the replies in the client
You can do a ZUNIONSTORE on the relevant keys and then do the ZRANGEBYSCORE on the result from the client.
However, if your "feeds" are large, #2's flow should be reversed - first range and then union.
You could also do similar types of processing entirely server-side with some Lua scripting.
EDIT: further clarifications
Re. 1 - merging can be done client-side on the results you get from ZRANGEBYSCORE, or you can use server-side Lua scripts to do it. Use WITHSCORES to get the timestamp and merge/sort on it. Regardless of where you choose to run this code (I'd probably use Lua for data locality), the implementation is up to you - let me know if you need help with that :)
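A sketch of option #2 with the Predis PHP client (the temporary key name and the client choice are assumptions, not from the original):

require 'vendor/autoload.php';

$redis = new Predis\Client();
$keys  = array('FEED:USER:1', 'FEED:USER:2', 'FEED:USER:3');
$tmp   = 'FEED:MERGED:' . uniqid(); // throwaway destination for the union

// union the per-user feeds into one sorted set, then range it by timestamp
$redis->zunionstore($tmp, $keys);
$items = $redis->zrangebyscore($tmp, 0, time(), array('withscores' => true));
$redis->del($tmp);

// $items is now array(member => timestamp), already ordered by score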
I want to use Redis basically like this, if it (hypothetically) accepted SQL:
SELECT id, data, processing_due FROM qtable WHERE processing_due < NOW()
where processing_due is an integer timestamp of some sort.
The idea is then to also remove completed "jobs" with something like:
DELETE from qtable WHERE id = $someid
Which Redis commands would I use on the producing ("insert") and consuming ("select, delete from") end?
I see that Redis can be used as a queue, but I don't want the items strictly in insertion order; I want them based on whether "now" is past processing_due.
I imagine this is almost the same problem as a leaderboard?
(I am trying to wrap my head around how Redis works; it looks simple enough from the documentation, but I just don't get it.)
Would a decent solution be to do ZADD qtable <timestamp> <UUID> and then use the UUID as a key to store the (json) value under it?
You can use a sorted set, in which the score is your time (an integer, as you suggested), and then query it with ZRANGEBYSCORE. Each member would be a JSON representation of your "fields", for example: {"id":"1","data":"bla","processing_due":"3198382"}
Regarding delete: just use ZREM when you find the relevant member to delete; pass your JSON string as the parameter and you're OK.
A possibly better variant is to hold just a generated ID as the member, and save the JSON representation of your data under a separate String-type key per ID. Just remember to keep the two structures in sync.
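A sketch of that last variant using the Predis client (key names and the ID helper are assumptions):

$redis = new Predis\Client();

// producer: schedule a job due in 60 seconds
$id = uniqid('job', true); // stand-in for a real UUID
$redis->zadd('qtable', array($id => time() + 60)); // score = processing_due
$redis->set('qtable:data:' . $id, json_encode(array('data' => 'bla')));

// consumer: fetch everything whose processing_due has passed
foreach ($redis->zrangebyscore('qtable', '-inf', time()) as $id) {
    $job = json_decode($redis->get('qtable:data:' . $id), true);
    // ...process $job, then remove both entries
    $redis->zrem('qtable', $id);
    $redis->del('qtable:data:' . $id);
}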
I have a web app with tons of documents. The user can enter an id (a valid MongoId / ObjectId), but if they don't, I have to retrieve the object with the latest id.
I'm concerned about the speed of finding that latest object. I currently do it like this:
db.docs.find({"status": 1}).sort({"_id": -1}).limit(1);
//Or in php:
$docs->find(array('status' => 1))->sort(array('_id' => -1))->limit(1)->getNext();
Isn't this a bit slow? First it looks for all docs with status 1, then sorts them, then applies the limit. Is there a better way to get the latest document with status 1?
To make this performant you'd likely need to add an index on { status: 1, _id: -1 }.
Note that findOne returns a single document rather than a cursor, so you can't chain sort onto it; keep find with sort and limit(1), which with the index above should only need to examine one document:
db.docs.find({"status": 1}).sort({"_id": -1}).limit(1);
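With the legacy PHP driver from the question, that would look roughly like this (ensureIndex was later renamed createIndex):

// create the compound index once, e.g. at deploy time
$docs->ensureIndex(array('status' => 1, '_id' => -1));

// fetch the newest status=1 document
$latest = $docs->find(array('status' => 1))
               ->sort(array('_id' => -1))
               ->limit(1)
               ->getNext();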
Perhaps just store another value in another table, or in an in-memory cache, holding the highest id the system has seen with status=1. It requires a small bit of logic when inserting/updating objects: compare the id of each object with status=1 against the currently cached id, and update the cache if the new one is higher. You can then fetch the latest document directly using this cached value.
It is a little clunky, but it should perform much better than the find.sort.limit operation you are currently doing as your number of objects grows.
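A rough sketch of that maintenance logic, assuming Memcached as the example store (ObjectId hex strings sort roughly chronologically because their leading bytes are a timestamp, so a string comparison works):

// call this whenever a document is saved with status = 1
function rememberLatestId(Memcached $cache, $id) {
    $id = (string) $id; // the MongoId's 24-char hex form
    $current = $cache->get('latest_status1_id');
    if ($current === false || strcmp($id, $current) > 0) {
        $cache->set('latest_status1_id', $id);
    }
}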