Apply SQL Query to in-memory PHP Object or Array

Use Case:
I'm building a site where users can search records - with SQL. BUT - they should also be able to save their search and be notified when a new submitted record meets the criteria.
It's not a car buying site, but for example: The user searches for a 1967 Ford Mustang with a 289 V8 engine, within the 90291 ZIP code. Can't find the right one, but they want to be notified if a matching car is submitted 2 weeks later.
So of course, every time a new car is added to the DB, I can retrieve all the user search queries, and run all of them over all the cars in the DB. But that is not scalable.
Rather than search the entire "car" table with every "search" query every time a new car is submitted, I would like to just check that single "car" object/array in memory, with the existing user queries.
I'm doing this in PHP with Laravel and Eloquent, but I am implementation agnostic and welcome any theoretical approaches.
Thanks,
Chris
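
For illustration, checking one in-memory record generally requires saving each search as structured criteria rather than a raw SQL string, so it can be evaluated against a single PHP array. A hypothetical sketch, where the field names, operators and storage format are all assumptions:

function matchesSearch(array $car, array $criteria)
{
    // $criteria: field => array(operator, value)
    foreach ($criteria as $field => $cond) {
        list($op, $value) = $cond;
        $actual = isset($car[$field]) ? $car[$field] : null;
        switch ($op) {
            case '=':  $ok = ($actual == $value); break;
            case '>=': $ok = ($actual >= $value); break;
            case '<=': $ok = ($actual <= $value); break;
            case 'in': $ok = in_array($actual, $value); break;
            default:   $ok = false;
        }
        if (!$ok) {
            return false;
        }
    }
    return true;
}

// When a new car is submitted, test it against every saved search:
$car = array('make' => 'Ford', 'model' => 'Mustang', 'year' => 1967, 'zip' => '90291');
$criteria = array('make' => array('=', 'Ford'), 'year' => array('>=', 1965));
var_dump(matchesSearch($car, $criteria)); // bool(true)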

I would rather run the saved searches in batches at scheduled intervals and not run them every time a record is appended to the tables.

It comes down to how you structure your in-memory cache.
Whatever cache it is, it usually relies on key/value pairs. It will be the same for the cache you are using:
http://laravel.com/docs/4.2/cache
So in the end it is all about using the right key. If you want to update the cached objects based on a car, then you would need to build the key in a way that lets you retrieve all objects from the cache using the car as (part of) the key. Usually you would concatenate multiple things for the key, like userId+carId+xyz, and then make an MD5 checksum of that.
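For example, a sketch against the Laravel 4.2 cache API (the key layout and lifetime are assumptions, not a prescription):

// Deterministic composite key, hashed so it stays short and uniform.
$key = md5($userId . ':' . $carId . ':' . $searchId);

// Store the result for 60 minutes, read it back later by the same key.
Cache::put($key, $searchResult, 60);
$cached = Cache::get($key);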
So that would be the answer to your question. However, generally I would not recommend this approach. It sounds like your search results are more like persisted, long-term-available results. So you would probably want to store them somewhere more permanent, like a simple table. Then you can use standard SQL tools to join the table and find out what is needed.

My approach would be to use a MySQL stored procedure, and use https://dev.mysql.com/doc/refman/5.1/en/event-scheduler.html to review the configs for possible changes and then flag them, storing some kind of dirty indicator which is then checked by a PHP script executed on demand, from cron, or periodically in some other way.
You could use the trigger to simply flag that the event scheduler has work to do. However you approach it, there are a number of state variables involved, which starts to get ugly; still, this use case doesn't seem to map neatly onto a queuing architecture as far as I can see.
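The PHP side of that could be as simple as the following sketch (the table and column names, and the notify helper, are hypothetical):

// Run periodically from cron: pick up flagged searches, re-run them,
// then clear the dirty indicator.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

foreach ($pdo->query("SELECT id, criteria FROM saved_searches WHERE is_dirty = 1") as $search) {
    notifyUserOfMatches($search); // hypothetical: re-run the search, send the notification

    $clear = $pdo->prepare("UPDATE saved_searches SET is_dirty = 0 WHERE id = ?");
    $clear->execute(array($search['id']));
}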

A possible approach would be to use a trigger in SQL to send a notification. Here is something related to it: 1st link or 2nd link.

Related

How to build datastore indexes (PHP GAE)

I am using Tom Walder's Google Datastore Library for PHP to insert data into my Google App Engine Datastore.
$obj_schema = (new GDS\Schema('Add Log'))
->addString('name', TRUE)
->addDatetime('time', TRUE);
$obj_store = new GDS\Store($obj_gateway, $obj_schema);
$obj_store->upsert($obj_store->createEntity(['name' => "test",'time' => date('Y-m-d H:i:s', time())]));
When I insert data like the above code, everything seems to be importing properly (each property says it is indexed).
But when I go to do a query with multiple selectors it says "You need an index to execute this query".
My query
The error message
Does anyone know what I need to do to make sure my queries are being indexed? This is what my dashboard shows, with plenty of data, using the code above.
As Alex Martelli mentioned in a comment, most of the time your indexes are built when you run your app on the devserver and have your datastore queried there (this adds the required indexes for any query to your index.yaml file).
So you have two ways you can go at it.
1- Run your app on the local devserver, and go to your dev "Developer Console" to add one or two entities to your datastore. Run your queries; that'll populate your index.yaml with all the required indexes. You can then run appcfg.py update_indexes to deploy just your index.yaml (bottom of this page).
2- Your other solution would be to read this, a page on how datastore indexes work. Then read this advanced article on indexes. You should also watch the following presentation that will give you a better insight into indexes and the datastore. Once that's all done, figure out which queries you want, and flesh out the required indexes in your index.yaml, then deploy with the same method as in 1.
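For reference, a composite index covering the two indexed properties from the question's schema might look roughly like this in index.yaml (the sort direction is an assumption):

indexes:
- kind: Add Log
  properties:
  - name: name
  - name: time
    direction: desc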
Quick summary of how indexes work
So you can think of the datastore as a pure READER. Unlike normal relational databases, it doesn't do any kind of computation as it reads your data and returns it. Therefore, to be able to run a given query (say "all client orders placed before Christmas 2013"), you need a table where all your client orders are ordered by date (so the system doesn't have to check every row's date to see if it matches; it just takes the first "chunk" of your data, up to the date you're looking for, and returns it).
Therefore, you need to have those indexes built, and they will determine which queries you can run. By default, every attribute is indexed by itself, in both ascending and descending order. For any query on more than one attribute (or with a different sort order), you need to have the index (in that case called a composite index) built by the datastore, so you need to declare it in your index.yaml.
In recent years, Google added the zigzag merge join algorithm, which is basically a way to take two composite indexes (that start with the same attributes, so there is common ground between the two sub-queries), run a sub-query on each, and then have the algorithm join the results of both sub-queries.

Which one makes the script slower? (I have two choices)

I have a table in my database named ads; this table contains data about each ad.
I want to get that data from the table to display each ad.
Now, I have two choices:
Either get all the data from the table and store it in an array, and then work with this array to display each ad in its position using loops.
Or access the table directly and fetch each ad's data to display it; note that this way will use more queries to the database.
Which one is the best way, and which will not make the script slower?
In most cases #1 is better.
If you can select the data (the smallest needed set) in one query, you have fewer roundtrips to the database server. Accessing array or object properties (from memory) is usually faster than DB queries.
You could also consider preparing your data first and not mixing fetching with view output.
The second option, "select on demand", could make sense if you need to lazy-load, maybe because you can or want to recognize client properties, like the viewport.
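To make the contrast concrete, a small sketch (table and column names are assumptions):

// Option #1, one roundtrip: select only the rows and columns needed.
$stmt = $pdo->prepare("SELECT id, title, image_url FROM ads WHERE active = 1 LIMIT 10");
$stmt->execute();
$ads = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($ads as $ad) {
    // render $ad in its position
}

// Option #2 would instead run one "SELECT ... WHERE id = ?" per ad,
// paying one network roundtrip each time.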
I'd like to highlight the following part:
get all the data from the table and store it in an array
You do not need to store all rows in an array. You could also take an iterator that represents the result set and use that.
Depending on the database object you use, this is often the less memory-intensive variant. You would also run only one query here, which is preferable.
Iterators are actually common with modern database result objects.
Additionally, this helps to decouple the view code from the actual database interaction, and it lets you defer the SQL query.
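With PDO, for example, the statement object itself is traversable, so a sketch of that style could look like this (the query is an assumption):

$stmt = $pdo->query("SELECT id, title FROM ads");

// No fetchAll(): rows are pulled from the result set one at a time
// as the loop consumes them.
foreach ($stmt as $ad) {
    echo $ad['title'], "\n";
}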
You should minimize the number of queries, but you should also try to minimize the amount of data you actually get from the database.
So: get only those ads that you are actually displaying. You could for example use columnPK IN (1, 2, 3, 4) to get those ads.
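With prepared statements, that IN (...) list needs one placeholder per value; a hedged sketch:

$ids = array(1, 2, 3, 4);

// Build "?, ?, ?, ?" dynamically so every id is bound safely.
$placeholders = implode(', ', array_fill(0, count($ids), '?'));

$stmt = $pdo->prepare("SELECT * FROM ads WHERE id IN ($placeholders)");
$stmt->execute($ids);
$ads = $stmt->fetchAll(PDO::FETCH_ASSOC);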
A notable exception: if your application is centered around ads and you need them pretty much everywhere, and/or they don't consume much memory, and/or there aren't too many ads, it might be better performance-wise to store all (or a subset) of your ads in an array.
Above all: Measure, measure, measure!
It is very, very hard to predict which algorithm will be most efficient. Often you implement something "because it will be more efficient" only to find out later that your optimization is actually slowing down your application.
You should always try to run a PHP script with the least number of database queries possible. Whenever you query the database, a request must be sent to the database (usually) over the network, and your script will idle until the response comes back.
You should, however, make sure not to request any more data from the database than necessary. So try to filter as much as possible in the WHERE clause instead of requesting the whole table and then picking individual rows on the PHP layer.
We could help with writing that SQL query if you tell us how your table looks and how you want to select which ads to display.

AJAX-like Interaction With Stored Procedure?

I think I'm probably looking at this the complete wrong way. I have a stored procedure that returns a (potentially large, but usually not) result set. That set gets put into a table on the web via PHP. I'm going to implement some AJAX for stuff like dynamic reordering and things. The stored procedure takes one to two seconds to run, so it would be nice if I could store that final table somewhere that I can access it faster once it's been run. More specifically, the SP is a search function; so I want the user to be able to do the search, but then run an ORDER BY on the returned data without having to redo the whole search to get that data again.
What comes to mind is if there is a way to get results from the stored procedure without it terminating, so I can use a temp table. I know I could use a permanent table, but then I'd run into trouble if two people were trying to use it at the same time.
A short and simple answer to the question 'is there a way to get results from the stored procedure without it terminating?': No, there isn't. How else would the SP return the result set?
Two seconds does sound like an awfully long time; perhaps you could post the SP code, so we can look at ways to speed up the queries you use. It might also prove useful to give some more info on your tables (indices, primary keys...).
If all else fails, you might consider looking into JavaScript table sorters... but again: some code might help here
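If re-running the two-second search only to re-sort it is the real pain point, one option in the spirit of the question's "store that final table somewhere" idea is to cache the result set in the session and sort the cached copy in PHP. A sketch; the procedure name and sort column are hypothetical:

session_start();

// First request: run the stored procedure once and cache the rows.
$stmt = $pdo->query("CALL search_items('mustang')"); // hypothetical SP
$_SESSION['search_results'] = $stmt->fetchAll(PDO::FETCH_ASSOC);
$stmt->closeCursor(); // needed before further queries after a CALL

// Later AJAX "reorder" request: sort the cached copy, no re-search.
$rows = $_SESSION['search_results'];
usort($rows, function ($a, $b) {
    return strcmp($a['name'], $b['name']); // sort column from the request
});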

Where to store search matches in cakephp?

I am writing an app in CakePHP that will perform scheduled searches for users and store the search results in a matches table. My question is: do I really need this matches model in CakePHP to store the results? If the answer is no, how should I store the results?
Happy new year.
There are many ways to store data and the one you choose will depend on the data itself and the use to which it will be put (and when it will be used). Because you are doing scheduled searches, I assume that the user may not be around when the search is done, in which case the result needs to be stored.
In this case, I'd use the database. If you need to keep historical results this is definitely the way to go. If the results can be overwritten, you could use a text file per user, but that might get messy.
You don't need to use the main database - you could have another MySQL database, for example, or even a totally different one such as a flat-file DB.
What would I do? I'd use a table in the main database and get on with something else.
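For what it's worth, a minimal sketch of such a matches table (the column names are assumptions; created follows the CakePHP convention):

CREATE TABLE matches (
    id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,      -- who saved the search
    search_id INT NOT NULL,    -- which saved search produced the hit
    result_id INT NOT NULL,    -- the record that matched
    created DATETIME NOT NULL  -- when the scheduled run found it
);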

Tracking the views of a given row

I have a site where the users can view quite a large number of posts. Every time this is done I run a query similar to UPDATE table SET views=views+1 WHERE id = ?. However, there are a number of disadvantages to this approach:
There is no way of tracking when the pageviews occur - they are simply incremented.
Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower.
Therefore I am considering an approach where I create a table, say:
object_views { object_id, year, month, day, views }, so that each object has one row per day in this table. I would then periodically update the views column in the objects table so that I wouldn't have to do expensive joins all the time.
This is the simplest solution I can think of, and it seems that it is also the one with the least performance impact. Do you agree?
(The site is built on PHP 5.2, Symfony 1.4 and Doctrine 1.2, in case you wonder.)
Edit:
The purpose is not web analytics - I know how to do that, and that is already in place. There are two purposes:
Allow the user to see how many times a given object has been shown, for example today or yesterday.
Allow the moderators of the site to see simple view statistics without going into Google Analytics, Omniture or whatever solution. Furthermore, the results in the backend must be realtime, a feature which GA cannot offer at this time. I do not wish to use the Analytics API to retrieve the usage data (not realtime, GA requires JavaScript).
Quote: Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower.
There is much more to it than this. This is a database killer.
I suggest you make a table like this:
object_views { object_id, timestamp }
This way you can aggregate on object_id (with the COUNT() function).
So every time someone views the page you will INSERT a record into the table.
Once in a while you must clean the old records out of the table. The UPDATE statement is EVIL :)
On most platforms it will basically mark the row as deleted and insert a new one, thus making the table fragmented. Not to mention locking issues.
Hope that helps
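A sketch of that pattern in PHP (the aggregation query is an assumption):

// One INSERT per view; no UPDATE, so no row rewrite or lock contention.
$log = $pdo->prepare("INSERT INTO object_views (object_id, `timestamp`) VALUES (?, NOW())");
$log->execute(array($objectId));

// Aggregate on demand, e.g. today's views for one object:
$count = $pdo->prepare(
    "SELECT COUNT(*) FROM object_views
     WHERE object_id = ? AND DATE(`timestamp`) = CURDATE()"
);
$count->execute(array($objectId));
echo $count->fetchColumn();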
Along the same lines as Rage, you simply are not going to get the same results doing it yourself when there are a million third-party log tools out there. If you are tracking on a daily basis, then a basic program such as WebTrends is perfectly capable of tracking the hits, especially if your URL contains the IDs of the items you want to track... I can't stress this enough: it's all about the URL when it comes to these tools (WordPress, for example, allows lots of different URL constructs).
Now, if you are looking into "impression" tracking, then it's another ball game, because you are probably tracking each object, the page, the user, and possibly a weighted value based upon location on the page. If this is the case, you can keep your performance up by hosting the tracking on another server where you can fire and forget. In the past I handled this using SQL, updating against the ID and a string version of the date... that way, when the date changes from 20091125 to 20091126, it's a simple query without the overhead of, say, a DATEDIFF function.
First, just a quick remark: why not aggregate the year, month and day into a DATETIME column? It would make more sense in my mind.
Also, I am not really sure of the exact reason you are doing this; if it's for marketing/web-stats purposes, you would be better off using a tool made for that purpose.
There are two big families of tools capable of giving you an idea of your website's access statistics: log-based ones (AWStats is probably the most popular) and AJAX/1-pixel-image-based ones (Google Analytics would be the most popular).
If you prefer to build your own stats database, you can probably manage to build a log parser easily using PHP. If you find parsing Apache logs (or IIS logs) too much of a burden, you could make your application output some custom logs formatted in a simpler way.
One other possible option is to use memcached; the daemon provides a kind of counter that you can increment. You can log views there and have a script collect the results every day.
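A sketch with the pecl/memcached extension (the key scheme is an assumption):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// One counter per object per day.
$key = 'views:' . $objectId . ':' . date('Ymd');

// increment() returns false if the key doesn't exist yet,
// so seed it on the first view (good enough for a sketch).
if ($mc->increment($key) === false) {
    $mc->add($key, 1);
}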
If you're going to do that, why not just log each access? MySQL can cache inserts in continuous tables quite well, so there shouldn't be a notable slowdown due to the insert. You can always run SHOW PROFILES to see what the performance penalty actually is.
On the datetime issue, you can always use GROUP BY MONTH(accessed_at), YEAR(accessed_at) or WHERE MONTH(accessed_at) = 11 AND YEAR(accessed_at) = 2009.
