How to speed up thousands of PHP loops using PHP/jQuery - php

Recently I developed an app for processing student results. The algorithm is below:
Algorithm:
get list of all students from MySQL
if found
    get list of subjects from MySQL
    for each student and subject:
        get sum of marks from MySQL
        calculate pass, fail, grade point, etc.
        insert total marks into another table
    // end subject list
else
    error msg if no students
// end
This takes a huge amount of time.
Now I would like to speed it up with jQuery.
For example:
load all student IDs and subject IDs into a jQuery array, then send a request to PHP to process the results using a jQuery loop, or any other way.

What you are saying does not make much sense. However, from your algorithm description above it looks like you are selecting students, subjects, and marks in loops, hitting your DB for each student's and subject's marks. Instead, use table joins (LEFT JOIN, INNER JOIN, etc.).
SELECT s.*, sb.*, SUM(m.value)
FROM students AS s
LEFT JOIN subjects as sb ON s.id = sb.student_id
INNER JOIN marks AS m ON sb.id = m.subject_id
GROUP BY s.id, sb.id
With this query you replace your 1000 queries with a single in-database calculation query. MySQL will run it in a more optimised manner and require fewer resources.
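On the PHP side, consuming that single result set in one pass could look like this; a minimal sketch, assuming PDO (the total_marks table, the pass mark of 40, and the connection details are illustrative assumptions, not from the question):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=school;charset=utf8mb4', 'user', 'pass');

$sql = "SELECT s.id AS student_id, sb.id AS subject_id, SUM(m.value) AS total
        FROM students AS s
        LEFT JOIN subjects AS sb ON s.id = sb.student_id
        INNER JOIN marks AS m ON sb.id = m.subject_id
        GROUP BY s.id, sb.id";

$insert = $pdo->prepare(
    "INSERT INTO total_marks (student_id, subject_id, total, passed)
     VALUES (?, ?, ?, ?)"
);

// One query in, one pass over the rows: pass/fail is computed in PHP.
foreach ($pdo->query($sql) as $row) {
    $passed = $row['total'] >= 40 ? 1 : 0; // hypothetical pass mark
    $insert->execute([$row['student_id'], $row['subject_id'], $row['total'], $passed]);
}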

It's pretty hard to say what your real problem is, but it sounds like the first thing you should do is avoid those loops. Maybe you could move part of this logic to the database using some stored procedures?
Making lots of separate database calls can affect performance significantly.
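To illustrate the stored-procedure idea, a hedged sketch (the table and column names reuse the query from the answer above; total_marks is an assumption):
-- Move the whole aggregation into the database and call it in one round trip.
DELIMITER //
CREATE PROCEDURE compute_totals()
BEGIN
    INSERT INTO total_marks (student_id, subject_id, total)
    SELECT s.id, sb.id, SUM(m.value)
    FROM students AS s
    JOIN subjects AS sb ON sb.student_id = s.id
    JOIN marks AS m ON m.subject_id = sb.id
    GROUP BY s.id, sb.id;
END //
DELIMITER ;

CALL compute_totals();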

Related

MySQL performance issue with large tables

I've been asked to develop web software able to store reading data from heat metering devices and to divide the heat expenses among all the flat owners. I chose to work in PHP with the MySQL MyISAM engine.
I was not used to working with large data sets, so I simply created a logical database where we have:
a table for buildings, with an indexed id as primary key (we now have ~1200 buildings in the DB)
a table with all the flats in all the buildings, with an indexed id as primary key and a building_id linking to the building (around 32k+ flats in total)
a table with all the heaters in all the flats, with an indexed id as primary key and a flat_id linking to the flat (around 280k+ heaters)
a table with all the reading values, with the timestamp of the reading, an id as primary key and a heater_id linking to the heater (around 2.7M+ readings now)
There is also a separate table, linked to the building, which stores the start date and end date between which the division of expenses has to be done.
When it is necessary to get all the data for a building, the approach I used is to get raw data from the DB with a single query, process it in PHP, then make the next query.
So here is roughly the operation sequence I used:
get the start and end dates from the specific table with a single query
store the dates in PHP variables
get all the flats of the building: SELECT * FROM flats WHERE building_id=my_building_id
parse all the data in PHP with a while loop
on each step of the while loop, make a query getting all the heaters of that specific flat: SELECT * FROM heaters WHERE flat_id=my_flat_id
parse all the data of the heaters with a PHP while loop
on each step of this inner while loop, get the last reading value of that specific heater: SELECT * FROM reading_values WHERE heater_id=my_heater_id AND data<my_data
Now the problem is that I have serious performance issues.
Before someone points it out: I cannot get only the reading values, skipping the first 6 steps above, since I need to print bills, and on each bill I have to write all the flat information and all the heater information, so I have to get all the flats and heaters data anyway.
So I'd like some suggestions on how to improve script performance:
all the tables are indexed, but do I have to add an index somewhere else?
would using a single query with subqueries, instead of several queries in the PHP code, improve performance?
any other suggestions?
I haven't included specific code as I think it would have made the question too heavy, but if asked I can add some.
Some suggestions:
Don't use SELECT * if you can avoid it -> just get the fields you really need
I didn't test it in your particular case, but usually a single query which joins all three tables achieves much better performance than looping through results with PHP.
If you need to loop for some reason, then at least use MySQL prepared statements, which again should increase performance given the number of queries (see the sketch below) :)
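To illustrate the prepared-statement point, a minimal sketch, assuming PDO and the flats/heaters tables from the question (the selected column is a placeholder):
<?php
// Prepare once, execute many times: MySQL parses and plans the statement a single time.
$stmt = $pdo->prepare("SELECT id FROM heaters WHERE flat_id = ?");

foreach ($flats as $flat) {            // $flats fetched earlier with one query
    $stmt->execute([$flat['id']]);
    $heaters = $stmt->fetchAll(PDO::FETCH_ASSOC);
    // ... process this flat's heaters ...
}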
Hope it helps!
Regards
EDIT:
Just to give an example of an alternative query; I'm not sure if this suits your specific needs, and I haven't tested it (which probably means I forgot something):
SELECT
a.field1,
b.field2,
c.field3,
d.field4
FROM heaters a
JOIN reading_values b ON (b.heater_id = a.heater_id)
JOIN flats c ON (c.flat_id = a.flat_id)
JOIN buildings d ON (d.building_id = c.building_id)
WHERE
a.heater_id = my_heater_id
AND b.date < my_date
GROUP BY a.heater_id
EDIT 2
Following your comments, I modified the query so that it retrieves the information as you want it: given a building id, it will list all the heaters and their newest reading value as of a given date:
SELECT
a.name,
b.name,
c.name,
d.reading_value,
d.created
FROM buildings a
JOIN flats b ON (b.building_id = a.building_id)
JOIN heaters c ON (c.flat_id = b.flat_id)
JOIN reading_values d ON (d.reading_value_id = (SELECT reading_value_id FROM reading_values WHERE created <= my_date AND heater_id = c.heater_id ORDER BY created DESC LIMIT 1))
WHERE
a.building_id = my_building_id
GROUP BY c.heater_id
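From PHP, the my_building_id and my_date literals would become bound parameters. A minimal sketch, assuming PDO (I left out the GROUP BY, since the correlated subquery already yields at most one reading per heater):
<?php
$sql = "SELECT a.name AS building, b.name AS flat, c.name AS heater,
               d.reading_value, d.created
        FROM buildings a
        JOIN flats b ON (b.building_id = a.building_id)
        JOIN heaters c ON (c.flat_id = b.flat_id)
        JOIN reading_values d ON (d.reading_value_id =
            (SELECT reading_value_id FROM reading_values
             WHERE created <= :billing_date AND heater_id = c.heater_id
             ORDER BY created DESC LIMIT 1))
        WHERE a.building_id = :building_id";

$stmt = $pdo->prepare($sql);
$stmt->execute([':billing_date' => $endDate, ':building_id' => $buildingId]);
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC); // one row per heater with its newest reading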
It would be interesting to know how it performs in your environment.
Regards

nested mysql queries with huge tables

I'm working on a management system for a small library. I proposed that they replace the Excel spreadsheet they are using now with something more robust and professional like PhpMyBibli - https://en.wikipedia.org/wiki/PhpMyBibli - but they are scared by the number of fields to fill in, and the interfaces are not fully translated into Italian.
So I made a very trivial DB, with basically a table for the authors and a table for the books. The authors table exists because I'm tired of having to explain that "Gabriele D'Annunzio" != "Gabriele d'Annunzio" != "Dannunzio G." and so on.
My test tables are now populated with ~100k books and ~3k authors, both with plausible random text, to check the scripts under pressure.
For the public consultation I want to make an interface like that of Gallica, the website of the Bibliothèque nationale de France, which I find pretty useful. A sample can be seen here: http://gallica.bnf.fr/Search?ArianeWireIndex=index&p=1&lang=EN&f_typedoc=livre&q=Computer&x=0&y=0
The concept is pretty easy: for each menu, e.g. the author one, I generate a fancy <select> field with all the names retrieved from the DB, and this works smoothly.
The issue arises when I try to add, beside every author name, the number of books, as Gallica does, in this way (conceptually, one COUNT query per author):
$authors = $pdo->query("SELECT id, surname, name FROM authors");
foreach ($authors as $a) {
    $num = $pdo->query("SELECT COUNT(*) AS num FROM books WHERE id_auth = {$a['id']}")->fetchColumn();
    echo "<option>{$a['surname']}, {$a['name']} ($num)</option>";
}
With the code above, one CPU core jumps to 100% and no results are shown in the browser. Not surprising, since that's 3k queries against a 100k-row table in a very short time.
Just to try, I added a LIMIT 100 to the first query (on the authors table). The page then required 3 seconds to be generated, and 15 seconds when I raised the LIMIT to 500 (apparently a linear increase). But of course I can't show library users a reduced list of authors.
I don't know which hardware/software is used by Gallica to achieve their results, but I bet their budget is far above that of a small village library using second-hand computers.
Do you think that adding a "number_of_books" field in the authors table, updated every time a new book is inserted, could be a practical solution, rather than browsing the whole list at every request?
BTW, a similar procedure must be done for the publication date, the language, the theme, and some other fields, so the query time will take a hit again, even if the other tables are a lot smaller than the authors one.
Your query style is very inefficient - try using a join and group structure:
SELECT
authors.id,
authors.surname,
authors.name,
COUNT(books.id) AS numbooks
FROM authors
INNER JOIN books ON books.id_auth=authors.id
GROUP BY authors.id
ORDER BY numbooks DESC
;
EDIT
Just to clear up some things I did not explicitly state:
Of course you no longer need a query in the PHP loop, just the displaying portion (see the sketch below)
Indices on books.id_auth and authors.id (the latter primary or unique) are assumed
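The displaying portion might then look like this; a minimal sketch, assuming PDO with $sql holding the JOIN query above:
<?php
// One query, then a plain display loop: no COUNT query per author.
foreach ($pdo->query($sql) as $row) {
    printf("<option value=\"%d\">%s, %s (%d)</option>\n",
        $row['id'],
        htmlspecialchars($row['surname']),
        htmlspecialchars($row['name']),
        $row['numbooks']);
}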
EDIT 2
As @GordonLinoff pointed out, the IFNULL() is redundant in an inner join, so I removed it.
To get all themes, even those without any books, just use a left join (this time including the IFNULL(), in case your provider's MySQL is old):
SELECT
themes.id,
themes.main,
themes.sub,
IFNULL(COUNT(books.theme),0) AS num
FROM themes
LEFT JOIN books ON books.theme=themes.id
GROUP BY themes.id
;
EDIT 3
Of course a stored value will give you the best performance - but this denormalization comes at a cost: your database now has the potential to become inconsistent in a user-visible way.
If you do go with this method, I strongly recommend you use triggers to auto-fill this field (and of course those triggers must sit on the books table); a sketch follows below.
Be prepared to see slowed-down inserts - this might of course be okay, as I guess you will see a much higher rate of SELECTs than INSERTs.
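A hedged sketch of such triggers, using the number_of_books counter column proposed in the question (the other names follow the tables above):
-- Keep the counter in sync on insert:
CREATE TRIGGER books_after_insert
AFTER INSERT ON books
FOR EACH ROW
    UPDATE authors SET number_of_books = number_of_books + 1
    WHERE id = NEW.id_auth;

-- And on delete (an AFTER UPDATE trigger is also needed if a book can change author):
CREATE TRIGGER books_after_delete
AFTER DELETE ON books
FOR EACH ROW
    UPDATE authors SET number_of_books = number_of_books - 1
    WHERE id = OLD.id_auth;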
After reading a lot about how the JOIN statement works, with the help of useful answer 1 and useful answer 2, I discovered I had used it some 15 or 20 years ago, then forgot about it since I never needed it again.
I made a test using the options I had:
reply with the JOIN query with IFNULL(): 0.5 seconds
reply with the JOIN query without IFNULL(): 0.5 seconds
reply using a stored value: 0.4 seconds
That DB will run on some single-core old iron, so I think a 20% difference can be significant, and I decided to use stored values, updating the count every time a new book is inserted (i.e. not often).
Anyway thanks a lot for having refreshed my memory: JOIN queries will be useful somewhere else in my DB.
update
I used the JOIN method above to query the book themes, which are stored in a far smaller table, in this way:
SELECT themes.id, themes.main, themes.sub, COUNT(books.theme) AS num FROM themes JOIN books ON books.theme = themes.id GROUP BY themes.id ORDER BY themes.main ASC, themes.sub ASC
It works fine, but for themes which are not in the books table I obviously don't get a 0 count, so I don't have lines like Contemporary Poetry - Etruscan (0) to show as disabled options for the sake of list completeness.
Is there a way to get my theme.main and theme.sub back?

Multiple SELECTs vs Single Query with JOIN

Our current setup looks a bit like this.
public_entry (5,000,000 rows) → telephone_number (5,000,000 rows) → user (400,000 rows)
3 tables, each arrow indicating a foreign key constraint: the table on the left contains an integer foreign key referencing the table on the right.
Now we have two "views" of the data we want to present in our web app.
displaying telephone numbers with public entries based on user attributes (e.g. only numbers from male users), a bit like a score.
displaying telephone numbers with public entries based on their entry date
Each result should get a score assigned indicating whether the number fits your needs (e.g. you look for a plumber; if the number is in your area and the related user is a plumber, the telephone number should score high).
We tried several approaches on solving this problem with two scenarios.
The first approach does a SELECT with INNER JOINs over the table, like the following
SELECT ..., (...) as score
FROM public_entry pe
INNER JOIN telephone_number tn ON tn.id = pe.numberid
INNER JOIN user u ON u.id = tn.userid WHERE ... ORDER BY score
Using this query on a smaller system (about 1/4 of the production system) performs very well, even under load.
However, when we put this query on the production system, it wreaked havoc, with execution times of over 30 seconds.
The second approach was getting all public_entries filtered with a single SELECT on public_entry without any JOINs, iterating over them, and calling a SELECT for each public_entry to fetch the telephone_number and user, computing the score and discarding the result if the telephone_number and user do not match our filter/interest.
Usually the second approach is never considered, because it creates over 300 queries for a single page load. Foreach'ing over results and calling SELECTs within a foreach is usually considered bad style.
However, approach number two performs on the production system: not well, but it does not take more than 1-3 seconds. It also performs badly on the test systems.
Do you have any suggestions on where the problem might be?
EDIT:
Query
SELECT COUNT(p.id)
FROM public_entry p, fon f, user u
WHERE p.isweb = 1
AND f.hidden = 0
AND f.deleted = 0
AND f.id = p.fonid
AND u.id = f.userid
AND u.gender = "female"
This query has 3 seconds execution time.
This is just an example query. I can take out the WHERE and it performs just a bit worse. In general, if we do a SELECT COUNT() with a single INNER JOIN over the data, the query blows up (30 seconds).
I don't have the magic answer you want, but here are some 'reasons' for poor performance, and some possible workarounds (with caveats).
Which of isweb, hidden, deleted, and gender are the most 'selective'? The optimizer sees them as useless and annoying; that is, if each has only two values, an INDEX on that field is probably useless. Hence, it picks one table, does a full scan, then reaches into the next table, etc. Notice in the EXPLAIN that it picked the smallest table (user) first. This is typically what the optimizer does when nothing in the WHERE clause looks useful.
Whether MySQL does all that work, or you do all that work, is about the same amount of effort. Perhaps you can do it faster, since you can use simple associative arrays in memory, while MySQL is coded to allow the tables to live on disk and be "cached" in RAM, block by block. But if you don't have enough RAM to load everything in, you are stuck with MySQL.
If you actually removed "hidden" and "deleted" rows, the task would be a little faster.
Your two SELECTs do not look much alike. Are you suggesting there is a wide range of SELECTs? And do you effectively need to look through most of all 3 tables to get the "score" or "count"?
Let's look at this from a Data Warehouse approach... Is some of the data "static"; that is, unchanging and could be summarized? If so, precomputing subtotals (COUNT(*)) into a summary table would let the ultimate queries be a lot faster. DW often involves subtotals by day. But it requires that these subtotals don't change.
COUNT(x) has the overhead of checking x for being NULL. Usually that is not necessary and COUNT(*) gives you what you want.
How often are you running the same SELECT? Or, at least, similar SELECTs? Do you need up-to-the-second scores? I'm fishing for running all the likely queries in the middle of the night, then using the results for 24 hours. Note that some queries can run faster by doing multiple things at once. For example, instead of two SELECTs for 'female' versus 'male', do one SELECT and GROUP BY gender.
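To make the summary-table idea concrete, here is a hedged sketch reusing the tables from the example query above (the summary table's name and layout are assumptions):
-- Precompute the counts once (e.g. nightly), then read them all day.
CREATE TABLE entry_counts_daily (
    summary_date DATE NOT NULL,
    gender       VARCHAR(10) NOT NULL,
    cnt          INT UNSIGNED NOT NULL,
    PRIMARY KEY (summary_date, gender)
);

-- One GROUP BY replaces separate 'female' and 'male' SELECTs:
INSERT INTO entry_counts_daily (summary_date, gender, cnt)
SELECT CURDATE(), u.gender, COUNT(*)
FROM public_entry p
JOIN fon f  ON f.id = p.fonid
JOIN user u ON u.id = f.userid
WHERE p.isweb = 1 AND f.hidden = 0 AND f.deleted = 0
GROUP BY u.gender;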

What's the best way to sync row data in two tables in MySQL?

I have two tables; to make it easy, consider the following as an example:
contacts (has name and email)
messages (messages, but also has name and email columns which need to be kept in sync with the contacts table)
Now please, for those who are itching to say "use relational methods" or foreign keys etc.: I know, but this situation is different. I need to have a "copy" of the name and email on the messages table itself, and it only needs to be synced from time to time.
As per the syncing requirement, I need to sync the names on the messages with the latest names on the contacts table.
I basically have the following UPDATE SQL in a loop over all rows in the contacts table:
UPDATE messages SET name=(
SELECT name FROM contacts WHERE email = '$cur_email')
WHERE email='$cur_email'
The above loops through all the contacts and is fired once for every contact I have.
I have several looping ideas to do this without the inner SELECT as well, but I just thought the above would be more efficient (is it?). I was wondering if there's an SQL way that's more efficient, like:
UPDATE messages SET name=(
SELECT name FROM contacts WHERE email = '$cur_email')
WHERE messages.email=contacts.email
something that looks like a join?
I think it should be more efficient:
UPDATE messages m JOIN contacts n on m.email=n.email SET m.name=n.name
OK, I figured it out now, using a JOIN in the UPDATE,
like:
UPDATE messages JOIN contacts ON messages.email = contacts.email
SET messages.name = contacts.name
WHERE messages.name != contacts.name
It's fairly simple!
BUT... I'm not sure if this is really the ANSWER TO MY POST, since my question is what the BEST WAY is in terms of efficiency.
Executing the above query on 2000 records gave my system a 4-second pause, whereas executing a few SELECTs, a PHP loop, and a few UPDATE statements felt faster.
hmmmmm
------ UPDATE --------
Well, I went ahead and created 2 scripts to test this.
On my quad-core i7 Ivy Bridge machine, surprisingly,
a single UPDATE query via SQL JOIN is MUCH SLOWER than a multi-query and loop approach.
On one side I have the above simple query running on 1000 records, where all records need updating:
script execution time was 4.92 seconds! It caused my machine to hiccup for a split second; I noticed a 100% spike on one of my cores.
Succeeding calls to the script (where no fields needed updating) took the same amount of time! Ridiculous.
The other side involves a SELECT JOIN query for all rows needing an update, and a simple UPDATE query looped in a foreach() in PHP (roughly as sketched below).
It took the script
3.45 seconds to do all the updates (at around a 50% single-core spike)
and
1.04 seconds on succeeding calls (where no fields needed updating).
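For reference, a hedged reconstruction of that second approach (assuming PDO and an id primary key on messages; the actual test script may have differed):
<?php
// Select only the rows whose name actually differs, then update them one by one.
$rows = $pdo->query(
    "SELECT m.id, c.name
     FROM messages m
     JOIN contacts c ON c.email = m.email
     WHERE m.name != c.name"
)->fetchAll(PDO::FETCH_ASSOC);

$upd = $pdo->prepare("UPDATE messages SET name = ? WHERE id = ?");
foreach ($rows as $row) {
    $upd->execute([$row['name'], $row['id']]);
}
// When nothing has changed, the SELECT returns zero rows and no UPDATE runs,
// which matches the fast succeeding calls reported above.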
Case closed...
hope this helps the community!
PS:
This is what I meant when debating logic with programmers who are too much into "coding standards", whose argument is "do it on the SQL side if you can", as it is supposedly faster and more standard than the crude method of evaluating and updating in loops, which they said was "dirty" code. Sheesh.

Are database queries for everyone in a user list too much?

I am currently using MySQL and MyISAM.
I have a function which returns an array of user IDs, either friends or users in general in my application, and when displaying them a foreach seemed best.
Now my issue is that I only have the IDs, so I would need to nest a database call to get each user's other info (i.e. name, avatar, other fields) based on the user ID in the loop.
I do not expect hundreds of thousands of users (this is mainly hobby learning), but how should I handle this? I like the flexibility of placing code in a foreach for display, but relying on arrays of IDs seems to leave me out of luck for using a single query.
Any general structures or tips on how I can display the list appropriately?
Is my number of queries (one per user in the list) inappropriate? (Although with pages 0..n of users, 10 at a time, it seems not as bad, I just realized.)
You could use MySQL's IN() clause, i.e.
SELECT username,email,etc FROM user_table WHERE userid IN (1,15,36,105)
That will return all rows where the userid matches those IDs. It gets less efficient the more IDs you add, but the 10 or so you mention should be just fine (see the sketch below).
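In PHP, the list of IDs can be bound with placeholders rather than interpolated; a minimal sketch, assuming PDO and the column names from the query above:
<?php
// Build one placeholder per ID, e.g. "?,?,?,?" for four IDs.
$ids = [1, 15, 36, 105]; // the user IDs you already have
$placeholders = implode(',', array_fill(0, count($ids), '?'));

$stmt = $pdo->prepare(
    "SELECT userid, username, email FROM user_table WHERE userid IN ($placeholders)"
);
$stmt->execute($ids);
$users = $stmt->fetchAll(PDO::FETCH_ASSOC); // one row per matching user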
Why couldn't you just use a LEFT JOIN to get all the data in one shot? It sounds like you are getting a list, but then you only need to get a single user's info. Is that right?
Remember, databases are about result SETS, and while you can generally return just a single row if you need it, you almost never have to get a single row and then go back for more info.
For instance, a list of friends might be held in a text column on a user's row.
Whether you expect to have a small database or a large one, I would consider using the InnoDB engine rather than MyISAM. It has a little higher processing overhead than MyISAM, but you get all the added benefits (as your hobby grows), such as transactions, row-level locking, and foreign keys, and a JOIN will let you pull specific data from multiple tables:
SELECT u.`id`, p.`name`, p.`avatar`
FROM `Users` AS u
LEFT JOIN `Profiles` AS p USING (`id`)
This would return id from Users, and name and avatar from Profiles (where the id of both tables match).
There are numerous resources online talking about database normalization, you might enjoy: http://www.devshed.com/c/a/MySQL/An-Introduction-to-Database-Normalization/
