I am working on a project that is being built with a standard LAMP stack. Currently, I merely output the results of the query onto the page - the results are not being stored in objects at all.
In the future, I would like to edit the results of the query. I imagine that this would be much easier if the results were stored in PHP objects.
Would it be more beneficial to store the objects themselves in a DB (via serialization/deserialization), or to create the objects when need be (after executing the query) and then destroying them when they are no longer needed?
You'd be better off storing a copy of the results directly in your object, rather than a serialized result handle. Serializing the result handle will NOT preserve locks, server-side variables, table state, transactions, or the data in the result set. MySQL has no provision for storing a connection handle in this fashion, so it'd be seen as a regular disconnect, resulting in outstanding queries being cleaned up, variables destroyed, transactions rolled back (or committed), etc...
As well, the data retrieved by the query is not actually fetched across the connection until you do a fetch_row()-type call, so you'd not even have that data in your serialized handle.
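For illustration, here's a minimal sketch of that approach using mysqli; the AdList class, the table, and the column names are made up for this example:

```php
<?php
// A minimal sketch of "store a copy of the results in your object" rather
// than serializing the result handle; table and column names are made up.
class AdList
{
    /** @var array<int, array<string, mixed>> */
    private $rows;

    public function __construct(mysqli_result $result)
    {
        // Copy the data out of the handle while the connection is alive;
        // after this, the object no longer depends on the connection.
        $this->rows = $result->fetch_all(MYSQLI_ASSOC);
    }

    public function rows(): array
    {
        return $this->rows;
    }
}

$db  = new mysqli('localhost', 'user', 'pass', 'mydb');
$ads = new AdList($db->query('SELECT id, title FROM ads'));
$db->close(); // the AdList keeps working after the disconnect
```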
Always create the objects in PHP and destroy them later. To serialize, you would need a LONGTEXT or similar field, which is slow and cannot be indexed. If you are only ever doing a SELECT *, then go ahead, but if you ever use conditions or advanced queries, you should keep the data in separate columns.
It depends on many factors. If you are running the exact same queries again and again, then yes, store the results in your database. But why serialise them? If you tried object-relational mapping, you could have a much easier-to-maintain query object that you could store in a well-organised relational database.
If you are not running the same queries very often, I would recommend caching the output in another way.
Would it be more beneficial to store the objects themselves in a DB (via serialization/deserialization), or to create the objects when need be (after executing the query) and then destroying them when they are no longer needed?
No. Somebody somewhere has done this for you. What would be beneficial is for you to use an existing ORM. It doesn't matter which one; just pick one and use it. You'll be light-years ahead and get your project out the door in a fraction of the time.
You should use a PHP framework while you're at it, many of which come coupled to an ORM.
Related
For large arrays, is it better to save the data to global variables or query the database each time I need them? In my situation, keeping them in local scope and passing them to functions isn't an option.
I'm using WordPress, and on most pages I get every user and all the metadata attached to them. Often I use these variables in multiple places on the same page. Unfortunately, WordPress won't let me pass variables between templates, so I'm stuck either using global variables or calling the database each time. Eventually, this will be hundreds of users with a lot of metadata attached to each. Should I call the database each time to keep the variables local, or should I save them to global variables to save on database queries? What are the considerations? Should I worry about performance, overhead, and/or other issues?
Thanks so much!
The only real solution to your problem is using some kind of cache system (Memcache and Redis are your best options). Fortunately, there are plenty of WordPress plugins that make the integration easy. For instance:
Redis: https://wordpress.org/plugins/redis-object-cache/
Memcache: https://wordpress.org/plugins/memcached/
EDIT
If you only want to cache a few database calls, you can forget about WordPress plugins and start coding a bit. Let's say you only want to cache the call that retrieves the list of users from the database, and let's assume you are using Memcache to accomplish this task (Memcache stores key-value pairs and allows super-fast access to a value given a key).
1. Query Memcache asking for the key "users".
2. Memcache doesn't have that key yet, so you'll get a cache miss; after that, you'll query your database to retrieve the user list. Now serialize the database response (serialize() and json_encode() are two different ways to do this) and store the key "users" along with this serialized value in your Memcache.
3. The next time you query your Memcache asking for "users", you'll get a hit. At that point you just have to unserialize the value and work with your user list.
And that's all. Now you just have to decide what you want to cache and apply this procedure to those elements.
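As a concrete illustration, here is a minimal sketch of that get/miss/set cycle using the Memcached extension; the connection details, the TTL, and the wp_users query are assumptions:

```php
<?php
// A minimal sketch of the procedure above using the Memcached extension;
// host, credentials, and the wp_users query are placeholder assumptions.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$users = $cache->get('users');
if ($users === false) {
    // Cache miss: hit the database and store the result under "users".
    $pdo   = new PDO('mysql:host=localhost;dbname=wordpress', 'user', 'pass');
    $users = $pdo->query('SELECT ID, user_login FROM wp_users')
                 ->fetchAll(PDO::FETCH_ASSOC);
    // Memcached serializes the array for us; 300 seconds is an arbitrary TTL.
    $cache->set('users', $users, 300);
}

// From here on, $users is the user list, cached or fresh.
```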
You shouldn't have to perform the calls more than once per page, though you may have to execute them once for every page. So I would suggest creating some sort of class to interact with your database that you can call on to get the data you need. I would also recommend using stored procedures and functions in your database instead of straight queries, since this helps both with security and with separating application logic from data functionality.
DATABASE
I have a normalized Postgres 9.1 database, and in it I have written some functions. One function in particular, "fn_SuperQuery"(param, param, ...), returns SETOF RECORD and should be thought of as a view (that accepts parameters). This function has a lot of overhead because it actually creates several temporary tables while calculating its own results, in order to gain performance with large data sets.
On a side note, I used to use WITH queries (CTEs) exclusively for this query, but I needed the ability to add indexes on some columns for more efficient joins.
PHP
I use PHP strictly to connect to the database, run a query, and return the results as JSON. Each query starts with a connection string and then finishes with a call to pg_close.
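For reference, a minimal sketch of such a PHP endpoint might look like this; the credentials and the column definition list for the SETOF RECORD call are placeholders, not the actual schema:

```php
<?php
// A minimal sketch of the PHP layer described above: connect, run the
// function, return JSON. Credentials and the column definition list for
// the SETOF RECORD call are placeholder assumptions.
$conn = pg_connect('host=localhost dbname=mydb user=www password=secret');

$result = pg_query_params(
    $conn,
    'SELECT * FROM "fn_SuperQuery"($1, $2) AS t(id int, label text)',
    [$_GET['p1'], $_GET['p2']]
);

header('Content-Type: application/json');
echo json_encode(pg_fetch_all($result) ?: []);

pg_close($conn);
```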
FRONTEND
I am using jQuery's .ajax function to call the PHP file and accept the results.
My problem is this:
"fn_SuperQuery"(param,param, ...)" is actually the foundation for several other queries. There are some parts of this application that need to run several queries at once to generate all the necessary information for the end user. Many of these queries rely on the output of "fn_SuperQuery"(param,param, ...)" The overhead in running this query is pretty steep, and the fact that it would return the same data if given the same parameters makes me think that it's dumb to make the user wait for it to run twice.
What I want to do is return the results of "fn_SuperQuery"(param, param, ...) into a temporary table, then run the other queries that require its data, and then discard the temporary table.
I understand that PostgreSQL ... requires each session to issue its own CREATE TEMPORARY TABLE command for each temporary table to be used. If I could get two PHP files to connect to the same database session then they should both be able to see the temporary table.
Any idea on how to do this? ... or maybe a different approach I have yet to consider?
It may be better to use normal tables; there will not be much difference. You can speed them up by using unlogged tables.
In 9.3, the better choice would probably be materialized views.
Temporary tables are session-private. If you want to share across different sessions, use normal tables (probably unlogged).
If you are worried about denormalization, the first thing I would look at doing is just storing these "temporary" normal tables ;-) in a separate schema. This allows you to keep the denormalized (and working-set) data separate for analysis and such, and avoids polluting the rest of your dataset with the denormalized tables.
Alternatively, you could look at other means short of denormalization. For example, if data isn't going to change after a while, you could periodically insert summary entries for the unchangeable data. This is not denormalization, since it allows you to purge old detail records down the line if you need to, while continuing to keep certain forms of reporting open.
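To make the unlogged-table suggestion concrete, here is a hedged sketch from PHP; the work schema, the parameters, and the column definition list are made-up assumptions:

```php
<?php
// A sketch of the shared-table approach under the assumptions above: a
// pre-created scratch schema named "work", made-up parameters, and a
// made-up column definition list for the SETOF RECORD function.
$conn = pg_connect('host=localhost dbname=mydb user=www password=secret');

// Materialize the expensive function's output into an unlogged table,
// which (unlike a temporary table) is visible to every session.
pg_query($conn, 'CREATE UNLOGGED TABLE work.super_results AS
                 SELECT * FROM "fn_SuperQuery"(42, \'foo\')
                          AS t(id int, label text)');

// ...run the dependent queries against work.super_results here, possibly
// from other PHP requests and therefore other database sessions...

// Discard the table once every dependent query has finished.
pg_query($conn, 'DROP TABLE work.super_results');
pg_close($conn);
```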
I have a table in my database named ads; this table contains data about each ad.
I want to get that data from the table in order to display the ads.
Now, I have two choices:
Either get all the data from the table and store it in an array, then work with this array to display each ad in its position using loops.
Or access the table directly and get each ad's data to display it; note this way will require more queries to the database.
Which one is the best way that does not make the script slower?
In most cases #1 is better, because if you can select the data (the smallest needed set) in one query, you have fewer roundtrips to the database server. Accessing array or object properties (from memory) is usually faster than DB queries.
You could also consider preparing your data first and not mixing fetching with view output (see the sketch below).
The second option, "select on demand", could make sense if you need to "lazy load", maybe because you can or want to take client properties, like the viewport, into account.
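To illustrate option #1 with fetching kept separate from view output, here is a minimal PDO sketch; the table and column names are assumptions:

```php
<?php
// A minimal sketch of option #1 with fetching kept separate from view
// output; the table and column names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Fetch step: one roundtrip, only the columns we need.
$ads = $pdo->query('SELECT id, position, html FROM ads')
           ->fetchAll(PDO::FETCH_ASSOC);

// View step: loop over the in-memory array to place each ad.
foreach ($ads as $ad) {
    echo '<div class="ad-', htmlspecialchars($ad['position']), '">',
         $ad['html'], "</div>\n";
}
```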
I'd like to highlight the following part:
get all data from table and store it in array
You do not need to store all rows in an array. You could also take an iterator that represents the result set and use it.
Depending on the database object you use, this is often the less memory-intensive variant. Also, you would run only one query here, which is preferable.
The iterator is actually common with modern database result objects.
Additionally, this helps decouple the view code from the actual database interaction, and you can also defer executing the SQL query.
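A small sketch of that iterator variant with PDO (connection details and table names are assumptions):

```php
<?php
// A small sketch of the iterator variant: PDOStatement is Traversable, so
// you can loop over the result set without copying all rows into an array.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->query('SELECT id, title FROM ads');

// Rows are fetched one at a time as the loop advances, which keeps memory
// usage flat even for large result sets.
foreach ($stmt as $ad) {
    echo $ad['title'], "\n";
}
```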
You should minimize the number of queries, but you should also try to minimize the amount of data you actually get from the database.
So: Get only those ads that you are actually displaying. You could for example use columnPK IN (1, 2, 3, 4) to get those ads.
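For illustration, a hedged sketch of such a filtered fetch with PDO, assuming an ads table with an integer primary key id:

```php
<?php
// A hedged sketch of fetching only the ads being displayed, assuming an
// `ads` table with an integer primary key `id`.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$idsToShow    = [1, 2, 3, 4];
$placeholders = implode(',', array_fill(0, count($idsToShow), '?'));

$stmt = $pdo->prepare("SELECT id, title FROM ads WHERE id IN ($placeholders)");
$stmt->execute($idsToShow);

$ads = $stmt->fetchAll(PDO::FETCH_ASSOC); // only the rows actually needed
```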
A notable exception: if your application is centered around "ads" and you need them pretty much everywhere, and/or they don't consume much memory, and/or there aren't too many ads, it might be better performance-wise to store all (or a subset) of your ads in an array.
Above all: Measure, measure, measure!
It is very, very hard to predict which algorithm will be most efficient. Often you implement something "because it will be more efficient" only to find out later that your optimization is actually slowing down your application.
You should always try to run a PHP script with the smallest number of database queries possible. Whenever you query the database, a request must be sent to the database (usually) over the network, and your script will idle until the response comes back.
You should, however, make sure not to request any more data from the database than necessary. So try to filter as much in the WHERE clause as possible instead of requesting the whole table and then picking individual rows on the PHP layer.
We could help with writing that SQL query when you tell us how your table looks and how you want to select which ads to display.
I've implemented an Access Control List using 2 static arrays (for the roles and the resources), but I added a new table in my database for the permissions.
The idea of using a static array for the roles is that we won't create new roles all the time, so the data won't change all the time. I thought the same for the resources, also because I think the resources are something that only the developers should deal with, since they're more related to the code than to the data. Do you know of any reasons to use a static array instead of a database table? When, and why?
The problem with hardcoding values into your code is that compared with a database change, code changes are much more expensive:
You usually need to create a new package to deploy. That package needs to be regression tested to verify that no bugs have been introduced. Hint: even if you only change one line of code, regression tests are necessary to verify that nothing went wrong in the build process (e.g. a library isn't correctly packaged, causing a module to fail).
Updating code can mean downtime, which also increases risk: what if the update fails? There is always a risk of this.
In an enterprise environment it is usually a lot quicker to get DB updates approved than code changes.
All that costs time, effort, and money. Note that, in my opinion, holding reference or static data in a database does not mean a performance hit, because the data can always be cached.
Your static array is an example of 'hard-coding' your data into your program, which is fine if you never ever want to change it.
In my experience, for your use case, this is never going to be true, and hard-coding your data into your source will result in you being constantly asked to update those things you assumed would never change.
Protip: to a project manager and/or client, nothing is immutable.
I think this just boils down to how you think the database will be used in the future. If you leave the data in arrays, and then later want to create another application that interacts with this database, you will start to have to maintain the roles/resources data in both code bases. But, if you put the roles/resources into the database, the database will be the one authority on them.
I would recommend putting them in the database. You could read the tables into arrays at startup, and you'll have the same performance benefits and the flexibility to have other applications able to get this information.
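A sketch of that startup load with PDO; the roles/resources table and column names are assumptions:

```php
<?php
// A sketch of "read the tables into arrays at startup", assuming `roles`
// and `resources` tables with `id` and `name` columns.
$pdo = new PDO('mysql:host=localhost;dbname=acl', 'user', 'pass');

// The database remains the single authority; these arrays are just a
// per-request cache, as fast to read as hardcoded arrays.
$roles     = $pdo->query('SELECT id, name FROM roles')
                 ->fetchAll(PDO::FETCH_KEY_PAIR);
$resources = $pdo->query('SELECT id, name FROM resources')
                 ->fetchAll(PDO::FETCH_KEY_PAIR);
```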
Also, when/if you get to writing a user management system, it is easier to display the roles/resources of a user by joining the tables than it is to get back the roles/resources IDs and have to look up the pretty names in your arrays.
Using static arrays you gain performance, since you do not need to access the database all the time, but safety is more important than performance, so I suggest you handle permission checks in the database.
Read up on RBAC (role-based access control).
Things considered static should be coded statically. That is, if you really consider them static.
But I suggest using class constants instead of static array values.
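For example (a tiny sketch; the role names are made up):

```php
<?php
// A tiny sketch of the class-constants suggestion; the role names here
// are made up for illustration.
class Role
{
    const ADMIN  = 'admin';
    const EDITOR = 'editor';
    const GUEST  = 'guest';
}

$currentRole = Role::EDITOR; // would normally come from the logged-in user

// Constants are self-documenting and typo-safe compared with bare strings
// scattered through the code.
if ($currentRole === Role::ADMIN) {
    echo 'full access';
}
```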
On the Facebook FQL pages it shows the FQL table structure; here is a screenshot below showing some of it (screenshot gone).
You will notice that some items are an array, such as meeting_sex, meeting_for, and current_location. I am just curious: do you think they are storing this as an array in MySQL or just returning it as one? From this data it really makes me think it is stored as an array. If you think it is, or if you have done something similar, what is a good way to store these items as an array in one table field and then retrieve it as an array on a PHP page?
The correct way to store an array in a database is by storing it as a table, where each element of the array is a row in the table.
Everything else is a hack, and will eventually make you regret your decision to try to avoid an extra table.
There are two options for storing this as an array:
The first, which you mentioned, is to make one or several tables and enumerate each possible key you intend to store. This is best for searching and for having data that makes sense.
However, for what you want to do, use serialize(). Note: DO NOT EVER EVER EVER try to search against this data in its native string form. It is much faster (and saner) to just reload it, call unserialize(), and then search for your criteria than to develop some crazy search pattern to do your bidding.
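To illustrate, a quick sketch of that serialize()/unserialize() round trip with PDO, assuming a profiles table with a TEXT column meeting_for:

```php
<?php
// A quick sketch of the serialize()/unserialize() round trip, assuming a
// `profiles` table with a TEXT column `meeting_for`.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Store: flatten the PHP array into a single text field.
$stmt = $pdo->prepare('UPDATE profiles SET meeting_for = ? WHERE id = ?');
$stmt->execute([serialize(['Friendship', 'Networking']), 1]);

// Retrieve: unserialize back into a PHP array. As said above, never try
// to search against the serialized string itself.
$stmt = $pdo->prepare('SELECT meeting_for FROM profiles WHERE id = ?');
$stmt->execute([1]);
$meetingFor = unserialize($stmt->fetchColumn());
```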
EDIT: If it were me, and this were something I was seriously developing for others to use (or even for myself, to be completely honest), I would probably create a second lookup table to store all the keys as columns. Heck, if you did that, mysql_fetch_assoc() could give you the array you wanted just by running a quick second query (or you could extract them via a JOINed query). However, if this is just quick-and-dirty to get whatever job done, then a serialized array may be for you. Unless you really, really don't care about ever searching that data, the proper column-to-key relationship is, I think most would agree, superior.
I guarantee you that Facebook is not storing that data in arrays inside their database.
The thing you have to realize about FQL is that you are not querying Facebook's main data servers directly. FQL is a shell, designed to provide you access to basic social data without letting you run crazy queries on real servers that have performance requirements. Arbitrary user-created queries on the main database would be functional suicide.
FQL provides a well-designed data return structure that is convenient for the type of data that you are querying, so as such, any piece of data that can have multiple associations (such as "meeting_for") gets packaged into an array before it gets returned as an API result.
As other posters have mentioned, the only way to store a programming-language structure (such as an array or an object) inside a database (which has no concept of these things) is to serialize it. Serializing is expensive, and as soon as you serialize something, you effectively make it unusable for indexing and searching. Being a social network, Facebook needs to index and search almost everything, so this data would never exist in array form inside their main schemas.
Usually the only time you ever want to store serialized data inside a database is if it's temporary, such as session data, or where you have a valid performance requirement to do so. Otherwise, your data quickly becomes useless.
Split it out into other tables. You could serialize it, but that practically guarantees you will want to query against that data later. Save yourself the frustration and split it out now.
You can serialize the array, insert it, and then unserialize it when you retrieve it.
They might be using multiple tables with many-to-many relationships, using joins and MySQL's GROUP_CONCAT function to return the values as an array for those columns in one query.
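For illustration, a hedged sketch of that approach, assuming a made-up many-to-many schema (users, meeting_for, and a user_meeting_for join table):

```php
<?php
// A hedged sketch of the GROUP_CONCAT approach, assuming a many-to-many
// schema: users, meeting_for, and a user_meeting_for join table.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$sql = 'SELECT u.id, GROUP_CONCAT(m.label) AS meeting_for
        FROM users u
        JOIN user_meeting_for um ON um.user_id = u.id
        JOIN meeting_for m ON m.id = um.meeting_for_id
        GROUP BY u.id';

foreach ($pdo->query($sql) as $row) {
    // Turn the comma-separated string back into a PHP array per user.
    $meetingFor = explode(',', $row['meeting_for']);
}
```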