I need to render a threaded view of a two-level hierarchical commenting system. The comments are stored in a database. The hierarchy is given by the field 'parent_id' (which is 0 for top-level comments). I cannot change the structure of the database.
The present solution uses multiple SQL queries:
1) one SQL query fetches all top-level comments;
2) the code loops through the top-level comments and, for each of them, performs an SQL query to fetch its children.
Now I wonder whether a solution with only one SQL query that fetches all the comments at once, followed by the code suggested here to sort them into threads, could be more efficient.
Any recommendations?
Thanks,
Luciano
I've written similar scripts, and from my point of view it's better to run a first query that fetches all the 'parents' (parent_id == 0) and then, for each of them, run another query to get all its 'sons'.
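A minimal sketch of this "parents first, then one query per parent" approach, using Python's built-in `sqlite3` module as a stand-in for PHP/MySQL. Only `parent_id` comes from the question; the `comments` table name and the `id`/`body` columns are assumptions for illustration.

```python
import sqlite3


def make_demo_db():
    """Hypothetical schema: only parent_id comes from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT)")
    conn.executemany(
        "INSERT INTO comments (id, parent_id, body) VALUES (?, ?, ?)",
        [(1, 0, "first"), (2, 1, "reply to first"), (3, 0, "second"), (4, 1, "another reply")],
    )
    return conn


def fetch_threads_per_parent(conn):
    """One query for the top-level comments, then one query per parent for its replies."""
    threads = []
    parents = conn.execute(
        "SELECT id, body FROM comments WHERE parent_id = 0 ORDER BY id"
    ).fetchall()
    for parent_id, body in parents:
        # Each parent costs one extra round trip to the database.
        replies = conn.execute(
            "SELECT id, body FROM comments WHERE parent_id = ? ORDER BY id", (parent_id,)
        ).fetchall()
        threads.append({
            "id": parent_id,
            "body": body,
            "children": [{"id": rid, "body": rbody} for rid, rbody in replies],
        })
    return threads


if __name__ == "__main__":
    for thread in fetch_threads_per_parent(make_demo_db()):
        print(thread)
```

Each parent can be formatted and printed as soon as its replies arrive, which is the advantage the answer describes; the cost is one round trip per top-level comment.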
If you have to retrieve a HUGE amount of threads using a single query, you have to wait for the query to complete before you can work with the data. If you divide the search into several smaller queries, you can start formatting and printing the parents before looking for the 'sons'. Also, doing it in a single query could make the query slower, since using more tables in the same query could make it halt on a table lock when someone is creating a new thread.
Another solution, which I would only recommend if the query is very slow because it is forced to use multiple JOINs or a WHERE on non-indexed fields (you should never do that, but if you can't change the database...), is to retrieve ALL the threads in a single query (both parents and sons, without any of the WHERE or JOIN clauses that make the query slower) and then organize them using PHP. This is by no means practical, and you should never use this method unless the time to complete the query is very long.
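For comparison, a minimal sketch of the single-query alternative just described, again with Python's `sqlite3` standing in for PHP/MySQL and the same hypothetical `comments` table. It makes one round trip, then builds the two-level tree in memory with a dictionary keyed by `parent_id`; it assumes the whole comment set fits comfortably in memory.

```python
import sqlite3
from collections import defaultdict


def make_demo_db():
    """Hypothetical schema: only parent_id comes from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT)")
    conn.executemany(
        "INSERT INTO comments (id, parent_id, body) VALUES (?, ?, ?)",
        [(1, 0, "first"), (2, 1, "reply to first"), (3, 0, "second"), (4, 1, "another reply")],
    )
    return conn


def fetch_threads_single_query(conn):
    """Fetch every comment in one query, then group replies under their parents in memory."""
    rows = conn.execute("SELECT id, parent_id, body FROM comments ORDER BY id").fetchall()
    parents = []
    replies_by_parent = defaultdict(list)
    for comment_id, parent_id, body in rows:
        node = {"id": comment_id, "body": body}
        if parent_id == 0:
            parents.append(node)
        else:
            replies_by_parent[parent_id].append(node)
    # Two levels only: attach each parent's replies (empty list if it has none).
    for parent in parents:
        parent["children"] = replies_by_parent.get(parent["id"], [])
    return parents


if __name__ == "__main__":
    for thread in fetch_threads_single_query(make_demo_db()):
        print(thread)
```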
Let's say I have a web page with some classes. One of them loads MySQLi, connects to the DB at the beginning and keeps the connection open. Now my question is:
Is it a good solution to create (for example, in a settings class) a prepared statement for reading values from the DB table 'settings' and keep that statement open until the end (closing the statement and the connection in the footer)? Or should I just load all data from the 'settings' table into a PHP array() and read values from that array instead of binding them from the DB?
My second question: if I have a statement open, may I open another statement for another class (for example, a class that reads text from the DB) and do the same as in the previous example? And then, of course, close it at the end of the page.
Is there any performance or security problem you can see there?
As far as I know, nobody does it this way, mostly because the real benefit of multiple executions is not as great as some imagine, and it just isn't worth the trouble. For short primary-key lookups run in small numbers (several dozen at most), you'll hardly be able to tell the difference.
(However, there are no arguments against such a practice either: you can do it this way, with a single statement prepared once and executed multiple times, if you wish.)
Yet a single query that fetches no more than a couple hundred records would still be faster than separate queries (even prepared ones) that get the same amount. So, as long as your settings stay at a moderate size, it's better to get them all at once.
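A minimal sketch of the "get them all at once" option, using Python's `sqlite3` in place of PHP/MySQLi. The `settings` table layout (one row per `name`/`value` pair) is an assumption for illustration.

```python
import sqlite3


def make_demo_db():
    # Hypothetical layout for the 'settings' table: one row per name/value pair.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO settings (name, value) VALUES (?, ?)",
        [("site_title", "My site"), ("items_per_page", "20")],
    )
    return conn


def load_settings(conn):
    """Read the whole (small) settings table with one query and keep it in a dict."""
    return dict(conn.execute("SELECT name, value FROM settings"))


if __name__ == "__main__":
    settings = load_settings(make_demo_db())
    # Later lookups are plain dictionary accesses, with no further queries.
    print(settings["site_title"], int(settings["items_per_page"]))
```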
Yes, of course you can have as many prepared statements as you need.
(The only problem could be with fetching results. You must always call get_result()/store_result() to make sure no results are left pending, which would prevent other queries, either regular or prepared, from running.)
The statement executes as one SQL statement over your DB connection. It's not going to keep going back to the database and grabbing a single row one at a time, so don't worry about that.
In general, you should be loading everything into some data structure. If your query is returning more data than you need, then that's something you need to fix in your query. Don't run SQL that returns a huge set of data, then rely on PHP to go through it row by row and perform some hefty operations on it. Just write SQL that gets what you need in the first place. I realize this isn't always possible, but when people talk about optimizing their website, query optimization is usually at/near the top of that list, so it's pretty important.
You're definitely supposed to execute multiple statements. It's silly to keep opening and closing entire db connections before getting any data.
I'm using PHP with the PDO library to work with a MySQL database, and I am fetching several thousand rows from a database based on two of the fields. Specifically, I need rows corresponding to certain latitude-longitude pairs, if such points exist. Currently, I am preparing a statement and executing it once per point/row. What is the fastest way to fetch many rows like this? Just use a bunch of ORs?
It's impossible to answer such a broad question.
The most general answer would be like this:
If you have to do this regularly - say, on a live site on each user's click - then, you have to reduce the number of parameters queried.
If it's just occasional operation - just leave it as is.
In other words, you don't need "fastest". You need "fast". Or even "reliable".
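If you do decide to cut the number of round trips, one option is the "bunch of ORs" from the question, sent in chunks. A minimal sketch with Python's `sqlite3` standing in for PDO/MySQL; the `places` table, its `id`/`lat`/`lng` columns and the chunk size of 200 are assumptions (200 points means 400 parameters, below SQLite's historical default limit of 999). Exact equality only works if coordinates are stored and passed with identical precision.

```python
import sqlite3


def make_demo_db():
    # Hypothetical table of known points.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (id INTEGER PRIMARY KEY, lat REAL, lng REAL)")
    conn.executemany(
        "INSERT INTO places (id, lat, lng) VALUES (?, ?, ?)",
        [(1, 52.52, 13.405), (2, 48.8566, 2.3522), (3, 40.7128, -74.006)],
    )
    return conn


def fetch_points(conn, points, chunk_size=200):
    """Look up many (lat, lng) pairs with one query per chunk instead of one per point."""
    found = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]
        # Builds "(lat = ? AND lng = ?) OR (lat = ? AND lng = ?) OR ..." for this chunk.
        condition = " OR ".join(["(lat = ? AND lng = ?)"] * len(chunk))
        params = [coord for point in chunk for coord in point]
        found.extend(
            conn.execute(f"SELECT id, lat, lng FROM places WHERE {condition}", params).fetchall()
        )
    return found


if __name__ == "__main__":
    print(fetch_points(make_demo_db(), [(52.52, 13.405), (0.0, 0.0)]))
```

Whether this is actually faster than the per-point prepared statement is exactly the kind of thing to measure on the real setup.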
So, the context is: I have a site in which many pages may need the information about one table, say for instance, 'films'. This table has many fields, like title, language, year, description, director... And perhaps in one page I need only the title and the id of some rows and in another I also need the description.
So the question is: should I code a database manager (I am using MySQL) that retrieves all the fields of the rows that satisfy a condition (I guess the WHERE clause should be passed as a parameter)? Or should I be able to specify which fields are needed? I think this cannot be done easily with mysqli (because prepared statements require specifying the number of fetched fields beforehand), so for this to work I would need to use PDO instead, which I haven't used yet. Is this last approach worth it? Or is there not really a big difference in performance if I retrieve all the information about those rows?
Thank you in advance.
Based upon the comments above, my answer to your questions is:
Retrieving some fields vs all fields isn't a real performance consideration until you are dealing with one or more CLOB/TEXT columns which have a lot of text in them. Good database practice indicates you should always specify which fields are returned from a query.
Any query against any table should have a where clause to restrict the number of rows returned. Especially if you are looking to query exactly one row.
Your question implies you are writing a wrapper layer around the queries to hide this complexity. Don't do this. Get an existing PHP library that does this work for you. See for example: Good PHP ORM Library? . There are a number of subtle issues, like security, which you will overlook.
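To illustrate the "specify which fields are returned" point and one of the subtle security issues such libraries handle for you, here is a minimal sketch (not a replacement for an ORM) using Python's `sqlite3` in place of PDO/mysqli. Column names cannot be bound as parameters, so they are checked against a whitelist; the `films` columns come from the question, while the `year >= ?` filter is an illustrative assumption.

```python
import sqlite3

# Column names cannot be bound as parameters, so only names from this fixed
# whitelist are ever interpolated into the SQL text.
FILM_COLUMNS = {"id", "title", "language", "year", "description", "director"}


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE films (id INTEGER PRIMARY KEY, title TEXT, language TEXT,"
        " year INTEGER, description TEXT, director TEXT)"
    )
    conn.executemany(
        "INSERT INTO films VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Metropolis", "de", 1927, "Silent science fiction", "Fritz Lang"),
            (2, "Rashomon", "ja", 1950, "Four accounts of one crime", "Akira Kurosawa"),
        ],
    )
    return conn


def fetch_films(conn, columns, min_year):
    """Select only the requested columns; the filter value is still a bound parameter."""
    if not columns:
        raise ValueError("at least one column is required")
    unknown = set(columns) - FILM_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    cursor = conn.execute(
        f"SELECT {', '.join(columns)} FROM films WHERE year >= ? ORDER BY id", (min_year,)
    )
    names = [desc[0] for desc in cursor.description]
    return [dict(zip(names, row)) for row in cursor]


if __name__ == "__main__":
    print(fetch_films(make_demo_db(), ["id", "title"], 1940))
```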
I have a table in my database named ads; this table contains data about each ad.
I want to get that data from the table to display the ads.
Now, I have two choices:
1) Get all the data from the table and store it in an array, and then work with this array to display each ad in its position by using loops.
2) Or access the table directly and get each ad's data when displaying it; note that this way will make more queries to the database.
Which one is the better way that won't make the script slower?
In most cases #1 is better, because if you can select the data (the smallest needed set) in one query, you have fewer round trips to the database server. Accessing array elements or object properties (in memory) is usually faster than running DB queries.
You could also consider preparing your data up front and not mixing fetching with view output.
The second option, "select on demand", could make sense if you need to lazy-load, maybe because you can or want to detect client properties, like the viewport.
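A minimal sketch of separating fetching from view output, with Python's `sqlite3` standing in for PHP/MySQL. The `ads` layout, where each ad has a named `position` slot on the page, is an assumption for illustration.

```python
import sqlite3


def make_demo_db():
    # Hypothetical 'ads' layout: each ad is assigned to a named slot on the page.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, position TEXT, title TEXT)")
    conn.executemany(
        "INSERT INTO ads (position, title) VALUES (?, ?)",
        [("header", "Spring sale"), ("sidebar", "New laptops")],
    )
    return conn


def load_ads(conn):
    """Fetch phase: a single query, results keyed by the slot each ad belongs to."""
    rows = conn.execute("SELECT position, title FROM ads")
    return {position: title for position, title in rows}


def render_page(ads_by_position, slots):
    """View phase: works only on in-memory data and never touches the database."""
    return "\n".join(f"[{slot}] {ads_by_position.get(slot, '(no ad)')}" for slot in slots)


if __name__ == "__main__":
    print(render_page(load_ads(make_demo_db()), ["header", "sidebar", "footer"]))
```

Because `render_page` only sees a dictionary, it can be tested without a database, and the fetch phase can later be swapped for a cache.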
I'd like to highlight the following part:
get all data from table and store it in array
You do not need to store all rows in an array. You could instead use an iterator that represents the result set.
Depending on the database object you use this is often the less memory-intensive variant. Also you would run only one query here which is preferable.
The iterator is actually common with modern database result objects.
Additionally, this helps decouple the view code from the actual database interaction, and you can also defer running the SQL query.
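A minimal sketch of the iterator idea, using a Python generator over a `sqlite3` cursor in place of a PHP result object; the `ads` table and its columns are hypothetical. Because a generator body only runs when iteration starts, the query is deferred until the view actually consumes the rows.

```python
import sqlite3


def make_demo_db():
    # Hypothetical 'ads' table for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO ads (title) VALUES (?)", [("Ad one",), ("Ad two",)])
    return conn


def iter_ads(conn):
    """Generator over the result set.

    The query is not sent until the caller starts iterating, and rows are
    pulled from the cursor one at a time rather than copied into a list first.
    """
    for row in conn.execute("SELECT id, title FROM ads ORDER BY id"):
        yield row


def render(rows):
    """View code only sees an iterable of (id, title) rows, not the database."""
    return [f"#{ad_id}: {title}" for ad_id, title in rows]


if __name__ == "__main__":
    print(render(iter_ads(make_demo_db())))
```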
You should minimize the amount of queries but you should also try to minimize the amount of data you actually get from the database.
So: Get only those ads that you are actually displaying. You could for example use columnPK IN (1, 2, 3, 4) to get those ads.
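A minimal sketch of the `IN (...)` lookup with one bound placeholder per id, using Python's `sqlite3` in place of PHP/MySQL; the hypothetical `id` column plays the role of `columnPK`.

```python
import sqlite3


def make_demo_db():
    # Hypothetical 'ads' table; 'id' plays the role of columnPK.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO ads (id, title) VALUES (?, ?)",
                     [(i, f"Ad {i}") for i in range(1, 6)])
    return conn


def fetch_ads_by_ids(conn, ad_ids):
    """Fetch only the ads that will be displayed, using one placeholder per id."""
    if not ad_ids:
        # An empty "IN ()" list is rejected by MySQL and standard SQL, so skip the query.
        return []
    placeholders = ", ".join("?" * len(ad_ids))  # e.g. "?, ?, ?, ?"
    sql = f"SELECT id, title FROM ads WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, list(ad_ids)).fetchall()


if __name__ == "__main__":
    print(fetch_ads_by_ids(make_demo_db(), [1, 2, 3, 4]))
```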
A notable exception: If your application is centered around "ads" and you need them pretty much everywhere, and/or they don't consume much memory, and/or there aren't too many ads, it might be better performance-wise to store all (or a subset) of your ads in an array.
Above all: Measure, measure, measure!
It is very, very hard to predict which algorithm will be most efficient. Often you implement something "because it will be more efficient" only to find out later that your optimization is actually slowing down your application.
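In that spirit, a minimal timing harness (Python's `sqlite3` and `time.perf_counter()`, hypothetical `comments` table) that first checks both strategies return the same rows and then compares them. An in-memory SQLite database has no network round trip, so its numbers say little about PHP talking to a MySQL server; the point is to measure on your own stack.

```python
import sqlite3
import time


def make_demo_db(n_parents=200, replies_each=5):
    # Hypothetical comments table: n_parents top-level rows, each with replies_each replies.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER)")
    conn.executemany("INSERT INTO comments (id, parent_id) VALUES (?, 0)",
                     [(i,) for i in range(1, n_parents + 1)])
    conn.executemany("INSERT INTO comments (parent_id) VALUES (?)",
                     [(p,) for p in range(1, n_parents + 1) for _ in range(replies_each)])
    return conn


def one_query_per_parent(conn):
    reply_ids = []
    for (parent_id,) in conn.execute("SELECT id FROM comments WHERE parent_id = 0").fetchall():
        rows = conn.execute("SELECT id FROM comments WHERE parent_id = ?", (parent_id,))
        reply_ids.extend(row[0] for row in rows)
    return reply_ids


def single_query(conn):
    return [row[0] for row in conn.execute("SELECT id FROM comments WHERE parent_id <> 0")]


def best_time(fn, conn, repeat=5):
    """Best-of-N wall-clock time in seconds, using time.perf_counter()."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(conn)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    conn = make_demo_db()
    assert sorted(one_query_per_parent(conn)) == sorted(single_query(conn))
    for fn in (one_query_per_parent, single_query):
        print(f"{fn.__name__}: {best_time(fn, conn) * 1000:.2f} ms")
```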
You should always try to run a PHP script with the fewest database queries possible. Whenever you query the database, a request must be sent to the database (usually) over the network, and your script will sit idle until the response comes back.
You should, however, make sure not to request any more data from the database than necessary. So try to filter as much in the WHERE clause as possible instead of requesting the whole table and then picking individual rows on the PHP layer.
We could help with writing that SQL query when you tell us how your table looks and how you want to select which ads to display.
In PHP I'm using mysqli_fetch_assoc() in a while-loop to get every record in a certain query.
I'm wondering what happens if the data is changed while running the loop (by another process or server), so that the record doesn't match the query any more. Will it still be fetched?
In other words, is the array of records that are fetched fixed, when you do query()? Or is it not?
Update:
I understand that it's a feature that the resultset is not changed when the data is changed, but what if you actually WANT that? In my loop I'm not interested in records that are already updated by another server. How do I check for that, without doing a new query for each record that I fetch??
UPDATE:
Detailed explanation:
I'm working on a kind of search-engine scraper that searches for values in a database. This is done by a few servers at the same time. Items that have been scraped shouldn't be searched anymore. I can't really control which server searches which item; I was hoping I could check the status of an item while fetching the record set. Since it's a big dataset, I don't transfer the entire result set before searching; I fetch each record when I need it...
Introduction
I'm wondering what happens if the data is changed while running the loop (by another process or server), so that the record doesn't match the query any more. Will it still be fetched?
Yes.
In other words, is the array of records that are fetched fixed, when you do query()? Or is it not?
Yes.
A DBMS would not be worth its salt were it vulnerable to race conditions between table updates and query resultset iteration.
Certainly, as far as the database itself is concerned, your SELECT query has completed before any data can be changed; the resultset is cached somewhere in the layers between your database and your PHP script.
In-depth
With respect to the ACID principle *:
In the context of databases, a single logical operation on the data is called a transaction.
User-instigated TRANSACTIONs can encompass several consecutive queries, but 4.33.4 and 4.33.5 in ISO/IEC 9075-2 describe how this takes place implicitly on the per-query level:
The following SQL-statements are transaction-initiating SQL-statements, i.e., if there is no current SQL-transaction, and an SQL-statement of this class is executed, then an SQL-transaction is initiated, usually before execution of that SQL-statement proceeds:
All SQL-schema statements
The following SQL-transaction statements:
<start transaction statement>.
<savepoint statement>.
<commit statement>.
<rollback statement>.
The following SQL-data statements:
[..]
<select statement: single row>.
<direct select statement: multiple rows>.
<dynamic single row select statement>.
[..]
[..]
In addition, 4.35.6:
Effects of SQL-statements in an SQL-transaction
The execution of an SQL-statement within an SQL-transaction has no effect on SQL-data or schemas [..]. Together with serializable execution, this implies that all read operations are repeatable within an SQL-transaction at isolation level SERIALIZABLE, except for:
1) The effects of changes to SQL-data or schemas and its contents made explicitly by the SQL-transaction itself.
2) The effects of differences in SQL parameter values supplied to externally-invoked procedures.
3) The effects of references to time-varying system variables such as CURRENT_DATE and CURRENT_USER.
Your wider requirement
I understand that it's a feature that the resultset is not changed when the data is changed, but what if you actually WANT that? In my loop I'm not interested in records that are already updated by another server. How do I check for that, without doing a new query for each record that I fetch??
You may not.
Although you can control the type of buffering performed by your connector (in this case, MySQLi), you cannot override the above-explained low-level fact of SQL: no INSERT or UPDATE or DELETE will have an effect on a SELECT in progress.
Once the SELECT has completed, the results are independent; it is the buffering of transport of this independent data that you can control, but that doesn't really help you to do what it sounds like you want to do.
This is rather fortunate, frankly, because what you want to do sounds rather bizarre!
* Strictly speaking, MySQL has only partial ACID compliance for tables other than those using the InnoDB, BDB and Cluster storage engines (InnoDB has been MySQL's default engine since version 5.5), and MyISAM does not support [user-instigated] transactions. Still, it seems the "I" should remain applicable here; MyISAM would be essentially useless otherwise.