I made a database that collects information on a daily basis; like a calendar, it stores each member's daily amount. If a month (M-Y) doesn't match any existing month, the code creates a new HTML month table. I solved this problem as follows:
mysql_query("something goes here");
while(condition)
{
mysql_query("something goes here")
while(condition)
{
mysql_query("something goes here");
while()
{
....................
}
........................
}
}
This approach worked well when I first came up with it. However, after a few days it was placing a heavy load on my server. I tried to rework the algorithm in PHP, but I couldn't manage it. How can I make this run faster?
The code is as follows:
$q2 = mysql_query("SELECT a.member_id, a.dates, MONTH(dates) AS months,
                          YEAR(dates) AS years, SUM(amount) AS sums
                   FROM account AS a
                   LEFT JOIN member AS m
                   ON (a.member_id = m.member_id)
                   GROUP BY (SELECT EXTRACT(YEAR_MONTH FROM dates))
                   ORDER BY dates DESC
                  ");
$k = 0;
while ($data2 = mysql_fetch_assoc($q2)) {
    $months = $data2['months'];
    $years  = $data2['years'];
    $daten  = new DateTime($data2['dates']);
    print "<tr><th align='left'><b>" . $daten->format('F-Y') . "</b></th>";
    $q3 = mysql_query("SELECT * FROM member");
    while ($data3 = mysql_fetch_assoc($q3)) {
        $id3 = $data3['member_id'];
        $q4 = mysql_query("
            SELECT SUM(amount) AS total FROM account
            WHERE member_id = $id3
            AND MONTH(dates) = $months
            AND YEAR(dates) = $years
        ");
        while ($data4 = mysql_fetch_assoc($q4)) {
            $total = $data4['total'];
            print "<td class='total'>" . number_format($total) . "</td>";
        }
    }
    print "<td class='total'><b>" . $data2['sums'] . "</b></td></tr>";
    $k = $k + $data2['sums'];
}
Among other things:
You're running the query SELECT * FROM member for every row in the first query. This query is independent of the loop, so running it again every time is wasteful.
For each result from the SELECT * FROM member query, you're running another query (SELECT SUM(amount) AS total FROM account ...). There are several issues with this query:
First of all, this query could be combined into the previous query using a GROUP BY, to avoid having to run one query for every member. Something like:
SELECT member_id, SUM(amount) AS total FROM account WHERE ... GROUP BY member_id
Second of all, you're using MONTH(dates) = $months AND YEAR(dates) = $years. This is inefficient, as it forces the server to examine every row; converting it to a range on dates (e.g., dates >= '$years-$months-01' AND dates < '$years-$months-01' + INTERVAL 1 MONTH) would speed things up if there were an appropriate index on dates. (A half-open range also avoids hard-coding a day like 31, which breaks for shorter months.)
In general: Avoid queries in loops. The number of queries involved in generating a page should, to the degree possible, always be a small, nearly constant number. It should not grow with your data.
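To make that concrete, here's an untested sketch of the single-query approach for the calendar table above, reusing the question's table and column names (account, member_id, amount, dates). One grouped query replaces the nested loops, and PHP only arranges rows for display; members with no activity in a month, and the per-month sum column, are left out for brevity:
// One query: every member's total for every month, newest month first.
$q = mysql_query("
    SELECT member_id,
           EXTRACT(YEAR_MONTH FROM dates) AS ym,
           DATE_FORMAT(MIN(dates), '%M-%Y') AS label,
           SUM(amount) AS total
    FROM account
    GROUP BY member_id, ym
    ORDER BY ym DESC, member_id
");

// Group the flat result set by month in PHP.
$months = array();
while ($row = mysql_fetch_assoc($q)) {
    $months[$row['ym']]['label'] = $row['label'];
    $months[$row['ym']]['totals'][$row['member_id']] = $row['total'];
}

// Render: one table row per month, one cell per member total.
foreach ($months as $month) {
    print "<tr><th align='left'><b>" . $month['label'] . "</b></th>";
    foreach ($month['totals'] as $total) {
        print "<td class='total'>" . number_format($total) . "</td>";
    }
    print "</tr>";
}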
Have you set up appropriate indexes in your MySQL database? This can make a huge performance difference. http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
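For example, a composite index matching the per-member monthly sums above might look like this (the index name is arbitrary):
ALTER TABLE account ADD INDEX idx_member_dates (member_id, dates);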
Related
I have a table "table1" which has almost 400,000 records, and another table "table2" which has around 450,000 records.
I need to delete all the rows in table1 which are duplicated in table2. I've been trying to do it with PHP, and the script ran for hours without finishing. Does it really take that much time?
Field asin is varchar(20) in table1.
Field ASIN is indexed and char(10) in table2.
$duplicat = 0;
$sql = "SELECT asin FROM asins";
$result = $conn->query($sql);
if ($result->num_rows > 0) {
    while ($row = $result->fetch_assoc()) {
        $ASIN = $row['asin'];
        $sql2 = "SELECT id FROM asins_chukh WHERE ASIN = '$ASIN' LIMIT 1";
        $result2 = $conn->query($sql2);
        if ($result2->num_rows > 0) {
            $duplicat++;
            $sql3 = "UPDATE `asins` SET `duplicate` = '1' WHERE `asins`.`asin` = '$ASIN'";
            $result3 = $conn->query($sql3);
            if ($result3) {
                echo "duplicate = $ASIN <br/>";
            }
        }
    }
}
echo "totaal :$duplicat";
You can run one single SQL command instead of a loop, something like:
update table_2 t2
set t2.duplicate = 1
where exists (
select id
from table_1 t1
where t1.id = t2.id);
Warning! I didn't test the SQL above, so you may need to verify the syntax.
For this kind of database operation, using PHP to loop and join is never a good idea. Most of the time will be wasted on network data transfer between your PHP server and MySQL server.
If even the above SQL takes too long, you can consider limiting the query set to some range. Something like:
update table_2 t2
set t2.duplicate = 1
where exists (
select id
from table_1 t1
where t1.id = t2.id
and t2.id > [range_start] and t2.id < [range_end] );
This way, you can kick off several updates running in parallel.
Yes, processing RBAR (Row By Agonizing Row) is going to be slow. There is overhead associated with each of those individual SELECT and UPDATE statements that get executed... sending the SQL text to the database, parsing the tokens for valid syntax (keywords, commas, expressions), validating the semantics (table references and column references valid, user has required privileges, etc.), evaluating possible execution plans (index range scan, full index scan, full table scan), converting the selected execution plan into executable code, executing the query plan (obtaining locks, accessing rows, generating rollback, writing to the innodb and mysql binary logs, etc.), and returning the results.
All of that takes time. For a statement or two, the time isn't that noticeable, but put thousands of executions into a tight loop, and it's like watching individual grains of sand falling in an hour glass.
MySQL, like most relational databases, is designed to efficiently operate on sets of data. Give the database work to do, and let the database crank, rather than spend time round tripping back and forth to the database.
It's like you've got a thousand tiny items to deliver, all to the same address. You can individually handle each item. Get a box, put the item into the box with a packing slip, seal the package, address the package, weigh the package and determine postage, affix postage, and then put it into the car, drive to the post office, drop the package off. Then drive back, and handle the next item, put it into a box, ... over and over and over.
Or, we could handle a lot of tiny items together, as a larger package, and reduce the amount of overhead work (time) packaging and round trips to and from the post office.
For one thing, there's really no need to run a separate SELECT statement, to find out if we need to do an UPDATE. We could just run the UPDATE. If there are no rows to be updated, the query will return an "affected rows" count of 0.
(Running the separate SELECT is like making another round trip in the car to the post office to check the list of packages that need to be delivered, before each round trip to drop off a package. Instead of two round trips, we can just take the package with us on the first trip.)
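As a rough, untested sketch against the question's tables, reusing the question's $conn handle (it folds the existence check into the UPDATE with a join, since the original SELECT probed asins_chukh):
// Sketch: the join replaces the per-row SELECT; affected_rows tells us
// whether anything was actually flagged. Note that rows already set to
// '1' count as unchanged and will not be re-counted.
$sql = "UPDATE asins t
        JOIN asins_chukh s ON s.asin = t.asin
        SET t.duplicate = '1'
        WHERE t.asin = '$ASIN'";
if ($conn->query($sql) && $conn->affected_rows > 0) {
    $duplicat++;
    echo "duplicate = $ASIN <br/>";
}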
So, that could improve things a bit. But it doesn't really get to the root of the performance problem.
The real performance boost comes from getting more work done in fewer SQL statements.
How would we identify ALL of the rows that need to be updated?
SELECT t.asin
FROM asins t
JOIN asins_chukh s
ON s.asin = t.asin
WHERE NOT ( t.duplicate <=> '1' )
(If asin isn't unique, we need to tweak the query a bit, to avoid returning "duplicate" rows. The point is, we can write a single SELECT statement that identifies all of the rows that need to be updated.)
For non-trivial tables, performance requires suitable indexes. In this case, we'd want an index with a leading column of asin. If such an index doesn't exist, we could create one (the index name here is arbitrary), for example:
CREATE INDEX asins_chukh_ix ON asins_chukh (asin);
If that query doesn't return a huge number of rows, we can handle the UPDATE in one fell swoop:
UPDATE asins t
JOIN asins_chukh s
ON s.asin = t.asin
SET t.duplicate = '1'
WHERE NOT ( t.duplicate <=> '1' )
We need to be careful about the number of rows. We want to avoid holding blocking locks for a long time (impacting concurrent processes that may be accessing the asins table), and we want to avoid generating a huge amount of rollback.
We can break the work up into more manageable chunks.
(Referring back to the shipping-tiny-items analogy... if we have millions of tiny items, and putting all of those into a single shipment would create a package larger and heavier than a container-ship container... we can break the shipment into manageably sized boxes.)
For example, we could handle the UPDATE in "batches" of 10,000 id values. Assuming id is unique (or nearly unique), is the leading column in the cluster key, and the id values are grouped fairly well into mostly contiguous ranges, this localizes the update activity into one section of blocks at a time, so we don't have to revisit most of those same blocks again...
The WHERE clause could be something like this:
WHERE NOT ( t.duplicate <=> 1 )
AND t.id >= 0
AND t.id < 0 + 10000
For the next batch...
WHERE NOT ( t.duplicate <=> 1 )
AND t.id >= 10000
AND t.id < 10000 + 10000
Then
WHERE NOT ( t.duplicate <=> 1 )
AND t.id >= 20000
AND t.id < 20000 + 10000
And so on, repeating that until we're past the maximum id value. (We could run a SELECT MAX(id) FROM asins as the first step, before the loop.)
(We want to test these statements as SELECT statements first, before we convert to an UPDATE.)
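Here's what such a batching loop might look like in PHP; an untested sketch reusing the question's $conn handle, with the 10,000-row batch size from above:
// Sketch: find the highest id once, then sweep the table in ranges.
$row = $conn->query("SELECT MAX(id) FROM asins")->fetch_row();
$maxId = (int)$row[0];
$batch = 10000;

for ($start = 0; $start <= $maxId; $start += $batch) {
    $end = $start + $batch;
    $conn->query("
        UPDATE asins t
        JOIN asins_chukh s ON s.asin = t.asin
        SET t.duplicate = '1'
        WHERE NOT ( t.duplicate <=> '1' )
          AND t.id >= $start
          AND t.id < $end
    ");
}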
Using the id column might not be the most appropriate way to create our batches.
Our objective is to create manageable "chunks" we can put into a loop, where the chunks don't overlap the same database blocks, so we don't need to revisit the same block over and over with separate statements, changing rows in the same block multiple times.
I'm building a small internal phone number extension list.
In the database I have two tables: a people table and a numbers table, with a one-to-many relationship between them.
I want to display the results in a single HTML table with one person per row but if they have multiple numbers it shows those in a single column with a rowspan on the person row to compensate.
Now, to get the results from the database to work with, I can either do:
(pseudocode)
SELECT id, name
FROM people
foreach result as row {
SELECT number
FROM numbers
WHERE numbers.person_id = row['id']
}
This would mean that I'm doing one query to get all users, but then if I have 100 users, I'm performing 100 additional queries to get the numbers for each user.
Instead I could do it like this:
(pseudocode)
SELECT number, person_id
FROM numbers
SELECT id, name
FROM people
foreach people as person {
echo name
foreach numbers as number {
if (number.person_id = person.id) {
echo number
}
}
}
So, essentially it is doing the exact same thing except instead I do two queries to get all the results into arrays and then loop through the arrays to format my tables.
Which method should I be using or is there a better way to do this?
The common way is to do a regular JOIN:
SELECT id, name, number
FROM people, numbers
WHERE people.id = numbers.person_id;
You can either add an ORDER BY to get the numbers in order, or you could create an array with a single pass over the resultset, and then loop through that array.
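For example, an untested sketch of that single-pass approach, assuming the people/numbers schema from the question and a mysqli connection $link:
// One pass groups the numbers per person; the render pass then knows
// each person's row count up front, so it can emit the rowspan.
$result = mysqli_query($link,
    'SELECT p.id, p.name, n.number
     FROM people p JOIN numbers n ON n.person_id = p.id
     ORDER BY p.name');

$people = array();
while ($row = mysqli_fetch_assoc($result)) {
    $people[$row['id']]['name'] = $row['name'];
    $people[$row['id']]['numbers'][] = $row['number'];
}

foreach ($people as $person) {
    $span = count($person['numbers']);
    echo "<tr><td rowspan='$span'>" . $person['name'] . "</td>";
    echo "<td>" . array_shift($person['numbers']) . "</td></tr>";
    foreach ($person['numbers'] as $number) {
        echo "<tr><td>$number</td></tr>";
    }
}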
You can also consider a GROUP_CONCAT to concatenate all the numbers for the same person:
SELECT id, name, GROUP_CONCAT(number SEPARATOR ', ')
FROM people, numbers
WHERE people.id = numbers.person_id
GROUP BY people.id;
Since you are even asking this question, I cannot stress enough that you should pick up an introductory book on database design. It worked wonders for me to learn the theory behind relational databases.
You probably want to execute just one query. Something like
select people.id, people.name, group_concat(numbers.number)
from people
inner join numbers on numbers.person_id = people.id
group by people.id, people.name
order by people.name
Then you can loop over the result set with simple php code.
It depends, and you may have to time it to find out. Doing multiple queries is a lot of network turns if your database is on a different computer than your web server, so often this takes longer. However, if your database server is on the same computer as your web server, this might not be an issue.
Also consider the time it will take to look up the numbers in the array. With a plain array you are doing a linear O(N) search. If you can put it in a hash keyed by person id, the lookup will be much faster, and the two-query approach may win; but not if you spend a lot of time searching the array for each answer.
Using a join to get it all into one query may be fastest, as each number arrives already associated with its person, though it depends on the container structure you store the data in for your foreach loop.
Use a stored procedure or function to retrieve the data; don't write the SQL in your program.
You should do neither. You can do one query (join) over the tables:
select name, number
from people p, numbers n
where p.id = n.person_id
order by name, number;
and then just one loop:
$link = mysqli_connect(...);
$query = 'select name, number from people p, numbers n where p.id = n.person_id order by name, number';
if ($result = mysqli_query($link, $query)) {
    $person = '';
    while ($row = mysqli_fetch_array($result, MYSQLI_ASSOC)) {
        if ($row['name'] !== $person) {
            $person = $row['name'];
            echo $person;
        }
        echo $row['number'];
    }
}
I'm trying to figure out a smart way to solve this problem. On my MySQL server I have this simplified table:
As you can see, this is a statistics table. I'm interested in plotting how many visits (profile_visit under the type column) were received per day, regardless of IP, from the beginning of the data. That means I have to query something like WHERE type='profile_visit' AND user_url='xxx'. But this gives me a bunch of rows, one for each visit made.
The question is: how can I use this raw data retrieved from the query to obtain an array with the total visits per day (I don't care about the time of day)?
I'm using PHP. Is it a good idea to do the adaptation in a PHP script, or can it be done using just MySQL queries?
Once I have the array with the total visits per day, I can simply adapt it to the format I need with:
$result = mysql_query("SELECT * FROM ".table_stats." WHERE user_url='xxx' AND type='profile_visit'");
echo "data.addRows([";
$salida = "";
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    $salida .= "['" . $row['date'] . "', " . $row['total'] . "],";
}
$salida = rtrim($salida, ",");
echo $salida . "]);";
Thanks for any help and guidance on this.
You can easily do this directly in SQL, which will run faster than retrieving the information from the DB and then processing it with PHP. The query should look like:
SELECT MIN(datetime) AS datetime, COUNT(id_stat) AS numVisits FROM table_stats WHERE type='profile_visit' AND user_url='xxx' GROUP BY DATE(datetime)
This will return the number of visits (numVisits) grouped by day, and the lowest datetime recorded that day.
I do not know if you want to display just the day. If so, you will need to use PHP to trim the string provided by the DB.
Using your example the result of the query is:
datetime | numVisits
2011-11-10 12:05:44 | 9
2011-11-12 20:06:06 | 3
...
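Putting that together with the asker's output loop, an untested sketch like this (using the table_stats constant from the question) would emit the addRows data straight from the grouped query:
// One grouped query, one loop: no per-day counting in PHP.
$result = mysql_query("SELECT DATE(datetime) AS day, COUNT(id_stat) AS numVisits
                       FROM " . table_stats . "
                       WHERE user_url='xxx' AND type='profile_visit'
                       GROUP BY DATE(datetime)
                       ORDER BY day");
$salida = "";
while ($row = mysql_fetch_assoc($result)) {
    $salida .= "['" . $row['day'] . "', " . $row['numVisits'] . "],";
}
echo "data.addRows([" . rtrim($salida, ",") . "]);";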
Is this what you're after?
SELECT `tbl`.`ip`,COUNT(*) AS `visits` FROM `tbl` WHERE `tbl`.`type` = 'profile_visit' GROUP BY `tbl`.`ip` ORDER BY `visits`
This will return two columns, one with the IP and the other with the respective number of visits (all of type 'profile_visit').
UPDATE
Sorry, I read your question too quickly and missed the date parameter; you'll need something along the lines of:
SELECT
`tbl`.`ip`,
DATE(`tbl`.`date`) AS `date`,
COUNT(*) AS `visits`
FROM `tbl`
WHERE `tbl`.`type` = 'profile_visit'
AND DATE(`tbl`.`date`) = 'date'
GROUP BY `tbl`.`ip`
ORDER BY `visits`
This will give you a summary on the specific date. If you don't need the IP, remove it from the SELECT-list and GROUP BY(DATE(tbl.date)) instead. To optimize, consider using DATE instead of DATETIME to avoid casting between the two (or adding an additional column).
We currently have some php code that allows an end user to update records in our database by filling out a few fields on a form. The database has a field called sequentialNumber, and the user can enter a starting number and ending number, and an update query runs in the background on all records with a sequentialNumber between the starting and ending numbers. This is a pretty quick query.
We've been running into some problems lately where people are trying to update records that don't exist. Now, this isn't an issue on the database end, but they want to be notified if records don't exist, and if so, which ones don't exist.
The only way I can think of to do this is to run a select query in a loop:
for ($i = $startNum; $i <= $endNum; $i++) {
    // run select query: SELECT sequentialNumber FROM myTable WHERE sequentialNumber = $i;
}
The problem is, our shared host has a timeout limit on scripts, and if the sequentialNumber batch is large enough, the script will time out. Is there a better way of checking for the existence of the records before running the update query?
EDIT:
It just occurred to me that I could do another kind of test: get the number of records they're trying to update ($endNum - $startNum + 1), and then do a count query:
select count(sequentialNumber) from myTable where sequentialNumber between $startNum and $endNum
If the result of the query is not the same as the value of the subtraction, then we know that all the records aren't there. It wouldn't be able to tell us WHICH ones aren't there, but at least we'd know something wasn't as expected. Is this a valid way to do things?
You could try
select sequentialNumber from myTable where sequentialNumber between $startNum and $endNum
This will return all known numbers in that range. Then you can use the array_search function to find out if a certain number is known or not. This should be faster than doing a lot of queries to the db.
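For example, an untested sketch of that idea, using array_diff to get the whole missing list in one step rather than probing each number with array_search (table and column names taken from the question):
// Fetch the numbers that do exist in one query, then diff against the
// full expected range to find the ones that don't.
$result = mysql_query("SELECT sequentialNumber FROM myTable
                       WHERE sequentialNumber BETWEEN $startNum AND $endNum");
$found = array();
while ($row = mysql_fetch_row($result)) {
    $found[] = (int)$row[0];
}
$missing = array_diff(range($startNum, $endNum), $found);
if ($missing) {
    echo "Missing: " . implode(', ', $missing);
}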
$count = mysql_fetch_row(mysql_query("SELECT COUNT(*) FROM x WHERE id >= $startNum AND id <= $endNum"));
mysql_query("UPDATE x SET col = 'value' WHERE id >= $startNum AND id <= $endNum");
$modified = mysql_affected_rows();
if ($count[0] > $modified) {
    // panic now
}
OK - I'll get straight to the point - here's the PHP code in question:
<h2>Highest Rated:</h2>
<?php
// Our query base
$query = $this->db->query("SELECT * FROM code ORDER BY rating DESC");
foreach($query->result() as $row) {
?>
<h3><?php echo $row->title." ID: ";echo $row->id; ?></h3>
<p class="author"><?php $query2 = $this->db->query("SELECT email FROM users WHERE id = ".$row->author);
echo $query2->row('email');?></p>
<?php echo ($this->bbcode->Parse($row->code)); ?>
<?php } ?>
Sorry it's a bit messy; it's still a draft. Anyway, I researched ways to build a ratings system. Previously I had a single 'rating' field, as you can see from SELECT * FROM code ORDER BY rating DESC. However, I quickly realised calculating averages that way wasn't feasible, so I created five new columns: rating1, rating2, rating3, rating4, rating5. Each ratingx column counts the number of times that rating was given, so when 5 users rate something 4 stars, rating4 says 5. Does that make sense?
So anyway: I have this SQL statement:
SELECT id,
       (IFNULL(rating1,0) + IFNULL(rating2,0) + IFNULL(rating3,0) + IFNULL(rating4,0) + IFNULL(rating5,0)) /
       ((rating1 IS NOT NULL) + (rating2 IS NOT NULL) + (rating3 IS NOT NULL) + (rating4 IS NOT NULL) + (rating5 IS NOT NULL)) AS average
FROM code
Again messy, but hey. Now, what I need to know is: how can I incorporate that SQL statement into my script? Ideally you'd think the overall query would be 'SELECT * FROM code ORDER BY (that really long expression I just stated) DESC', but I can't quite see that working. How do I do it? Query, store the result in a variable, something like that?
If that makes no sense sorry! But I really appreciate the help :)
Jack
You should go back to the drawing board completely.
<?php
$query = $this->db->query("SELECT * FROM code ORDER BY rating DESC");
foreach($query->result() as $row) {
$this->db->query("SELECT email FROM users WHERE id = ".$row->author;
}
Anytime you see this in your code, stop what you're doing immediately. This is what JOINs are for. You almost never want to loop over the results of a query and issue multiple queries from within that loop.
SELECT code.*, users.email
FROM code
JOIN users ON users.id = code.author
ORDER BY rating DESC
This query will grab all that data in a single resultset, removing the N+1 query problem.
I'm not addressing the rest of your question until you clean up your question some and clarify what you're trying to do.
If you would like to change your tables again, here is my suggestion:
Why don't you store two columns, RatingTotal and RatingCount? Each user that rates the item increments RatingCount by one, and whatever they vote (5, 4, 4.2, etc.) is added to RatingTotal. You could then just ORDER BY RatingTotal/RatingCount.
Also, I hope you store which users rated each item, so they can't vote multiple times and swing the average their way!
First, I'd decide whether your application is write-heavy or read-heavy. If there are a lot more reads than writes, then you want to minimize the amount of work you do on reads (like this script, for example). On the assumption that it's read-heavy, since most webapps are, I'd suggest maintaining the combined average in a separate column and recalculating it whenever a user adds a new rating.
Other options are:
Try ordering by the calculated column alias 'average'. SQL Server supports this, and so does MySQL.
Use a view. You can create a view on your base table that does the average calculation for you and you can query against that.
Also, unrelated to your question, don't do a separate query for each user in your loop. Join the users table to the code table in the original query.
You should include it in the SELECT part:
SELECT *, (if ....) AS average FROM ... ORDER BY average
Edit: assuming that your ifnull statement actually works...
You might also want to look into joins to avoid querying the database again for every user; you can do everything in 1 select statement.
Apart from that I would also say that you only need one average and the number of total votes, that should give you all the information you need.
Some excellent ideas here, but I think the best way (as sidereal said, it's more read-heavy than write-heavy) would be to have columns rating and times_rated, and just do something like this:
new_rating = ((times_rated * rating) + current_rating) / (times_rated + 1)
current_rating being the rating applied when the person clicks the little stars. This simply weights the current user's rating into an average with the existing rating.
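As an untested sketch, that can be a single statement; it assumes rating and times_rated columns on the code table, and hypothetical $current_rating and $id variables holding the submitted rating and the item's id:
// MySQL evaluates SET assignments left to right, so rating is computed
// from the old times_rated before times_rated is incremented.
$this->db->query("
    UPDATE code
    SET rating = ((times_rated * rating) + $current_rating) / (times_rated + 1),
        times_rated = times_rated + 1
    WHERE id = $id
");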