Objects versus Arrays - php

I am working on a site at the moment, and there is a concentrated focus on efficiency and speed in loading, processing and such like.
I'm using the mysqli extension to get my database bits and bobs, but I'm wondering what's the best / most efficient way of outputting my dataset?
At the moment I'm using $mysqli->fetch_assoc() and a foreach(). Having read http://www.phpbench.com I know that counting my data first makes a difference. (I'm going to optimise after build)
My question is, which is quicker for getting a resultset into a php data thing. Creating an object? A numerical array? An associative array? My thoughts are an object, but I'm unsure.
Just curious, as I'm not familiar with the PHP internals :)

There is actually a small benchmark in PHP documentation under mysql_fetch_object's comments:
SELECT * FROM bench... (mysql_fetch_object)
Query time: 5.40725040436
Fetching time: 16.2730708122 (avg: 1.32130565643E-5)
Total time: 21.6803212166
SELECT * FROM bench... (mysql_fetch_array)
Query time: 5.37693023682
Fetching time: 10.3851644993 (avg: 7.48886537552E-6)
Total time: 15.7620947361
SELECT * FROM bench... (mysql_fetch_assoc)
Query time: 5.345921278
Fetching time: 10.6170959473 (avg: 7.64049530029E-6)
Total time: 15.9630172253
Fetching an object is slowest, fetching a numeric array is probably a bit faster than using mysql_fetch_array or mysql_fetch_assoc, but the difference is negligible. In fact, mysql_fetch_array fetches both assoc and numeric, and it's faster than mysql_fetch_assoc, go figure.. But if you're after performance, just don't use mysql_fetch_object.

From the manual page of mysql_fetch_object()
Note: Performance
Speed-wise, the function is identical to mysql_fetch_array(), and almost as quick as
mysql_fetch_row() (the difference is insignificant)
As the benchmark given by Tatu suggests, there is a slight difference, but keep in mind that the numbers were accumulated over 100 consecutive queries. I'd say your strategy of not bothering now and optimizing later is a good choice.

I believe a numerical array is likely to be the most lightweight, followed by associative array, and then an object. I don't think the differences will add up to much, so whichever syntax you're most comfortable with is best.
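If you would rather measure than guess, a rough micro-benchmark along these lines can compare the three row shapes. This is only a sketch: the function name benchRowShapes() and the sample row are made up for illustration, it times building rows in plain PHP rather than fetching them through mysqli, and the numbers will vary by PHP version and machine.

```php
<?php
// Hypothetical micro-benchmark (not from the original thread): build the same
// three-column row as a numeric array, an associative array and a stdClass
// object, and time each loop. It does not touch the database.
function benchRowShapes(int $iterations = 100000): array
{
    $timings = [];

    $start = hrtime(true); // monotonic clock, nanoseconds (PHP 7.3+)
    for ($i = 0; $i < $iterations; $i++) {
        $row = [$i, 'name', 1.5];
    }
    $timings['numeric'] = (hrtime(true) - $start) / 1e9;

    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $row = ['id' => $i, 'name' => 'name', 'value' => 1.5];
    }
    $timings['assoc'] = (hrtime(true) - $start) / 1e9;

    $start = hrtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $row = new stdClass();
        $row->id = $i;
        $row->name = 'name';
        $row->value = 1.5;
    }
    $timings['object'] = (hrtime(true) - $start) / 1e9;

    return $timings; // seconds per loop, keyed by row shape
}

foreach (benchRowShapes() as $shape => $seconds) {
    printf("%-8s %.6f s\n", $shape, $seconds);
}
```

Whatever the output, compare it against the time your query itself takes before deciding the row shape matters.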

Related

MySQLi query vs PHP Array, which is faster?

I'm developing an algorithm for intense calculations on multiple huge arrays. Right now I use PHP arrays to do the job, but it seems slower than I need it to be. I was thinking of using MySQL tables, converting the PHP arrays into database rows, and then running the calculations there to solve the speed issue.
At the very first step, converting a 20*10 PHP array into 200 database rows containing zeros took a long time. Here is the code (basically, it generates a zero matrix, if you're interested):
$stmt = $mysqli->prepare("INSERT INTO `table` (`Row`, `Col`, `Value`) VALUES (?, ?, '0')");
for ($i = 0; $i < $rowsNo; $i++) {
    for ($j = 0; $j < $colsNo; $j++) {
        //$myArray[$j]=array_fill(0,$colsNo,0);
        $stmt->bind_param("ii", $i, $j);
        $stmt->execute();
    }
}
$stmt->close();
The commented-out line "$myArray[$j]=array_fill(0,$colsNo,0);" generates the array very quickly, while filling the table with the next two lines takes much longer.
Array time: 0.00068 seconds
MySQLi time: 25.76 seconds
There is a lot more calculating remaining, and I'm worried that even after modifying numerous parts it may get worse. I searched a lot but couldn't find any answer on whether arrays or MySQL tables are the better choice. Has anybody done, or does anybody know of, any benchmarking on this?
I really appreciate any help.
Thanks in advance
UPDATE:
I did the following test for a 273*273 matrix. I created two versions of the same data: first a two-dimensional PHP array, and second a table with 273*273=74529 rows, both containing the same data. These are the speed test results for retrieving the same data from both [here, finding out which column(s) of a certain row have a value equal to 1; the other columns are zero]:
It took 0.00021 seconds for the array.
It took 0.0026 seconds for mysqli table. (more than 10 times slower)
My conclusion is sticking to the arrays instead of converting them into database tables.
Last thing to say, in case the mentioned data is stored in the database table in the first place, generating an array and then using it would be much much slower as shown below (slower due to data retrieval from database):
It took 0.9 seconds for the array. (more than 400 times slower)
It took 0.0021 seconds for mysqli table.
The main reason is not that the database itself is slower. The main reason is that the database accesses the hard drive to store data, while PHP array functions work only in RAM, which is faster than the hard drive.
Although there is a way to speed up your insert queries (most likely you are using an InnoDB table without a transaction), the very premise of the question is wrong.
A database is intended, in the first place, to store data. To store it permanently. It does that well. It can do calculations too, but again, before doing any calculations there is one necessary step: storing the data.
If you want to do your calculations on stored data, it's OK to use a database.
If you want to push your data into a database only to calculate on it, that doesn't make much sense.
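For reference, the transaction hint could look roughly like the following. This is a sketch under assumptions, not a measured fix: the function name insertZeroMatrix() is invented here, the table and column names are taken from the question, and it assumes an InnoDB table and an already-open mysqli connection. With autocommit on, each execute() is committed separately, which is the likely cause of the slow inserts; one surrounding transaction avoids that.

```php
<?php
// Sketch only: the zero-matrix insert from the question, wrapped in a single
// transaction. insertZeroMatrix() is a hypothetical helper name.
function insertZeroMatrix(mysqli $mysqli, int $rowsNo, int $colsNo): void
{
    $stmt = $mysqli->prepare(
        "INSERT INTO `table` (`Row`, `Col`, `Value`) VALUES (?, ?, '0')"
    );
    if ($stmt === false) {
        throw new RuntimeException('prepare failed: ' . $mysqli->error);
    }

    $i = 0;
    $j = 0;
    // bind_param() binds by reference, so binding once before the loop is enough.
    $stmt->bind_param('ii', $i, $j);

    $mysqli->begin_transaction();
    try {
        for ($i = 0; $i < $rowsNo; $i++) {
            for ($j = 0; $j < $colsNo; $j++) {
                if (!$stmt->execute()) {
                    throw new RuntimeException('insert failed: ' . $stmt->error);
                }
            }
        }
        $mysqli->commit(); // one commit for all rows instead of one per row
    } catch (Throwable $e) {
        $mysqli->rollback();
        throw $e;
    } finally {
        $stmt->close();
    }
}
```

Even with this change, the point above still stands: inserting data only to calculate on it adds work that a plain PHP array does not have.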
In my case, as shown on the update part of the question, I think arrays have better performance than mysql databases.
Array usage showed 10 times faster response even when I search through the cells to find desired values in a row. Even good indexing of the table couldn't beat the array functionality and speed.

mysql_num_rows() php - is it efficient?

I have several SELECT statements on a PHP page, and I used Dreamweaver to generate those.
After going through the code it generated, there seemed to be a lot of fluff which I could cut out under most circumstances, a mysql_num_rows() line for each statement being an example.
So I'm wondering if anyone can tell me whether or not this actually saves resources - considering the query is being run regardless, is there any actual overhead for this?
UPDATE:
After following Chriszuma's suggestion about microtime, here are my results:
//time before running the query
1: 0.46837500 1316102620
//time after the query ran
2: 0.53913800 1316102620
//time before calling mysql_num_rows()
3: 0.53914200 1316102620
//time after mysql_num_rows()
4: 0.53914500 1316102620
So not much overhead at all, it seems
mysql_num_rows() counts rows after they have been fetched. It's like you fetched all rows and stored them in a PHP array, and then ran count($array). But mysql_num_rows() is implemented in C within the MySQL client library, so it should be a bit more efficient than the equivalent PHP code.
Note that in order for mysql_num_rows() to work, you do have to have the complete result of your query in PHP's memory space. So there is overhead in the sense that a query result set could be large, and take up a lot of memory.
I would expect that such a call would have an extremely minimal impact on performance. It is just counting the rows of its internally-stored query result. The SQL query itself is going to take the vast majority of processing time.
If you want to know for sure, you can execute microtime() before and after the call to see exactly how long it is taking.
$startTime = microtime(true);
$rowCount = mysql_num_rows($result); // $result is the resource returned by mysql_query()
$time = microtime(true) - $startTime;
echo("mysql_num_rows() execution: $time seconds\n");
My suspicion is that you will see something in the microseconds range.

Multiple Queries to a Large MySQL Table

I have a table with columns ID(int), Number(decimal), and Date(int only timestamp). There are millions of rows. There are indexes on ID and Date.
On many of my pages I am querying this four or five times for a list of Numbers in a specified date range (the range being different each query).
Like:
select number,date where date < 111111111 and date >111111100000
I'm querying these sets of data to be placed on several different charts. "Today vs Yesterday", "This Month vs Last Month", "This Year vs Last Year".
Would querying the largest possible result set with the sql statement and then using my programming language to filter down the query via a sorted and spliced array be better than waiting for each of these 0.3 second queries to finish?
Is there something else that can be done to speed this up?
It depends on the result set and the executing speed of your queries. There is no ultimate answer to this question.
You should benchmark and calculate the results if you really need to speed up things.
But keep in mind that premature optimization should be avoided. Besides, you would be reimplementing logic in your own code that the database already implements, and your version can contain bugs.
While it may make the query itself run quicker, you have to ask yourself about the potential impact on memory if you load the entire range of records and then aggregate it programmatically.
Chances are that MySQL's index-based optimizations will perform better than anything you could come up with anyway, so it sounds like a bad idea.
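If you do want to benchmark the approach from the question, the PHP side of it might look like the sketch below. Everything here is illustrative: splitByRanges() and the range names are hypothetical, and $rows stands in for the rows you would fetch with one wide query. Time this (including the larger fetch) against your four or five separate 0.3 second queries before committing to either.

```php
<?php
// Sketch: split one large, already-fetched result set into per-chart buckets.
// Each range is [from, to) on the integer timestamp column.
function splitByRanges(array $rows, array $ranges): array
{
    $buckets = array_fill_keys(array_keys($ranges), []);
    foreach ($rows as $row) {
        foreach ($ranges as $name => [$from, $to]) {
            // Half-open interval: $from <= date < $to.
            if ($row['date'] >= $from && $row['date'] < $to) {
                $buckets[$name][] = $row['number'];
            }
        }
    }
    return $buckets;
}

// Stand-in data; real rows would come from the database.
$rows = [
    ['number' => 1.5, 'date' => 100],
    ['number' => 2.5, 'date' => 150],
    ['number' => 3.5, 'date' => 250],
];
$ranges = [
    'yesterday' => [100, 200],
    'today'     => [200, 300],
];
print_r(splitByRanges($rows, $ranges));
```

Note that this loop touches every fetched row once per range, which is exactly the work the index lets MySQL skip.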

MySql speed of executing max(), min(), sum() on relatively large database

I have a relatively large database (130,000+ rows) of weather data, which is accumulating very fast (a new row is added every 5 minutes). On my website I publish min/max data for the day, and for the entire existence of my weather station (which is around 1 year).
Now I would like to know whether I would benefit from creating additional tables where these min/max data would be stored, rather than letting PHP run a MySQL query that searches for the day's min/max data and the min/max data for the entire existence of my weather station. Would a query for max(), min() or sum() (I need sum() to total rain accumulation for months) take that much longer than a simple query to a table that already holds those min, max and sum values?
That depends on whether your columns are indexed or not. In the case of MIN() and MAX() you can read the following in the MySQL manual:
MySQL uses indexes for these operations:
To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant.
In other words, if your columns are indexed you are unlikely to gain much performance from denormalization. If they are NOT, you will definitely gain performance.
As for SUM(), it is likely to be faster on an indexed column, but I'm not really confident about the performance gains here.
Please note that you should not be tempted to index all your columns after reading this post. Every index you add slows down your insert and update queries!
Yes, denormalization should help performance a lot in this case.
There is nothing wrong with storing calculations for historical data that will not change in order to gain performance benefits.
While I agree with RedFilter that there is nothing wrong with storing historical data, I don't agree with the performance boost you will get. Your database is not what I would consider a heavy use database.
One of the major advantages of databases is indexes. They use advanced data structures to make data access lightning fast. Just think, every primary key you have is an index. You shouldn't be afraid of them. Of course, it would probably be counterproductive to index all your fields, but that should never really be necessary. I would suggest researching indexes more to find the right balance.
As for the work done when a change happens, it is not that bad. An index is a tree like representation of your field data. This is done to reduce a search down to a small number of near binary decisions.
For example, think of finding a number between 1 and 100. Normally you would randomly stab at numbers, or you would just start at 1 and count up. This is slow. Instead, it would be much faster if you set it up so that you could ask whether you were over or under each time you chose a number. Then you would start at 50 and ask if you are over or under. If under, choose 75, and so on until you find the number. Instead of possibly going through 100 numbers, you would need at most 7 guesses to find the correct one.
The problem here is when you add 50 numbers and make it out of 1 to 150. If you start at 50 again, your search is less optimized as there are 100 numbers above you. Your binary search is out of balance. So, what you do is rebalance your search by starting at the mid-point again, namely 75.
So the work a database does is just an adjustment to rebalance the mid-point of its index. It isn't actually a lot of work. If you are working on a database that is large and requires many changes a second, you would definitely need to have a strong strategy for your indexes. In a small database that gets very few changes, like yours, it's not a problem.
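The guessing game above is ordinary binary search, and a few lines of PHP can count the guesses it needs. This is an illustration of the idea only; binarySearchSteps() is a name made up here, and real database indexes are B-trees rather than a literal binary search over a sorted list.

```php
<?php
// Count how many guesses a binary search needs to find $target in the
// inclusive range [$low, $high]. Returns -1 if the target is out of range.
function binarySearchSteps(int $target, int $low, int $high): int
{
    $steps = 0;
    while ($low <= $high) {
        $steps++;
        $mid = intdiv($low + $high, 2); // first guess for 1..100 is 50
        if ($mid === $target) {
            return $steps;
        }
        if ($mid < $target) {
            $low = $mid + 1;  // guess was under: search the upper half
        } else {
            $high = $mid - 1; // guess was over: search the lower half
        }
    }
    return -1;
}

// Find the worst case over every possible target in 1..100.
$worst = 0;
for ($n = 1; $n <= 100; $n++) {
    $worst = max($worst, binarySearchSteps($n, 1, 100));
}
echo "worst case for 1..100: $worst guesses\n"; // prints 7
```

Growing the range to 1..150 raises the worst case by only one guess, to 8, which is why adding rows to an indexed table costs so little lookup time.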

Difference in efficiency of retrieving all rows in one query, or each row individually?

I have a table in my database that has about 200 rows of data that I need to retrieve. How significant, if at all, is the difference in efficiency when retrieving all of them at once in one query, versus each row individually in separate queries?
The queries are usually made via a socket, so executing 200 queries instead of 1 represents a lot of overhead, plus the RDBMS is optimized to fetch a lot of rows for one query.
200 queries instead of 1 will make the RDBMS initialize datasets, parse the query, fetch one row, populate the datasets, and send the results 200 times instead of 1 time.
It's a lot better to execute only one query.
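A minimal sketch of the single-query approach with mysqli follows. The table name items, its id and name columns, and the helper name fetchAllById() are all hypothetical; substitute your own schema. The rows are indexed by id so the rest of the script can still look up each row individually without another round trip.

```php
<?php
// Sketch: fetch all rows in one round trip and index them by id.
// `items`, `id` and `name` are placeholder names, not from the question.
function fetchAllById(mysqli $mysqli): array
{
    $result = $mysqli->query('SELECT id, name FROM items');
    if ($result === false) {
        throw new RuntimeException('query failed: ' . $mysqli->error);
    }

    $rowsById = [];
    while ($row = $result->fetch_assoc()) {
        $rowsById[$row['id']] = $row; // later lookups are plain array access
    }
    $result->free();

    return $rowsById;
}
```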
I think the difference will be significant, because there will (I guess) be a lot of overhead in parsing and executing the query, packaging the data up to send back etc., which you are then doing for every row rather than once.
It is often useful to write a quick test which times various approaches, then you have meaningful statistics you can compare.
If you were talking about some constant number of queries k versus a greater number of constant queries k+k1 you may find that more queries is better. I don't know for sure but SQL has all sorts of unusual quirks so it wouldn't surprise me if someone could come up with a scenario like this.
However if you're talking about some constant number of queries k versus some non-constant number of queries n you should always pick the constant number of queries option.
In general, you want to minimize the number of calls to the database. You can already assume that MySQL is optimized to retrieve rows, however you cannot be certain that your calls are optimized, if at all.
Extremely significant. Usually getting all the rows at once will take about as much time as getting one row. So let's say that time is 1 second (very high, but good for illustration): getting all the rows will take 1 second, while getting each row individually will take 200 seconds (1 second for each row). A very dramatic difference. And this isn't counting where you are getting the list of 200 from to begin with.
All that said, you've only got 200 rows, so in practice it won't matter much.
But still, get them all at once.
Exactly as the others have said. Your RDBMS will not break a sweat throwing 200+++++ rows at you all at once. Getting all the rows in one associative array will also not make much difference to your script, since you no doubt already have a loop for grabbing each individual row.
All you need do is modify this loop to iterate through the array you are given [very minor tweak!]
The only time I have found it better to get fewer results from multiple queries instead of one big set is if there is lots of processing to be done on the results. I was able to cut out about 40,000 records from the result set (plus associated processing) by breaking the result set up. Anything you can build into the query that will allow the DB to do the processing and reduce result set size is a benefit, but if you truly need all the rows, just go get them.
