Selecting data from a big MySQL table - PHP

I have a big customer table, t_customer, with 10,000,000 records.
I run a PHP script that selects data from this table and needs to perform an action on each customer.
But as I progress through the data, the SQL query runs more and more slowly, and now it terminates with "Query execution was interrupted".
My query is:
SELECT id, login FROM t_customer WHERE regdate<1370955715 LIMIT 2600000, 100000;
So the LIMIT no longer has any effect, and I don't know what to do about this.
P.S.
SELECT id, login FROM t_customer WHERE regdate<1370955715 LIMIT 2600000, 10;
The above query takes 30 seconds to execute.
P.P.S.
The result is the same even without the WHERE clause.

So you are selecting 100K records in PHP? That is a bad idea.
Lower your batch size to 1K, paginate through your target set, and then see how it goes (a rough sketch follows after this answer). Make sure you have an index on regdate, too. 100K-element arrays in PHP are... complicated.
PHP is a scripting language, it's not really C++ :) That's why I write background heavy-lifting workers in C++.
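Something along these lines, untested; PDO, a numeric indexed id column, and the process_customer() callback are my assumptions, not the asker's code. Seeking on id > :last_id instead of an ever-growing OFFSET keeps each batch cheap:

// Minimal sketch: 1K batches, seeking on the indexed id instead of a deep OFFSET.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$batchSize = 1000;
$lastId = 0;
do {
    $stmt = $pdo->prepare(
        'SELECT id, login FROM t_customer
         WHERE regdate < 1370955715 AND id > :last_id
         ORDER BY id
         LIMIT :batch'
    );
    $stmt->bindValue(':last_id', $lastId, PDO::PARAM_INT);
    $stmt->bindValue(':batch', $batchSize, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        process_customer($row);        // your per-customer action (hypothetical)
        $lastId = $row['id'];          // remember where this batch ended
    }
} while (count($rows) === $batchSize); // fewer rows than the batch size => done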

MySQL offers a really clever feature called partitions. http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
It allows you to automatically split huge data sets into smaller files, giving you a big improvement when doing operations on your data. Just like RAID, but for SQL :P There was an excellent post on SO about the best configuration for partitions, but I can't find it at the moment.
If you improve the performance of your query, PHP should instantly have more time to smash through your loops and arrays.
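Just as an illustration (the yearly boundaries below are arbitrary, and MySQL has restrictions such as requiring the partitioning column to be part of every unique key), RANGE partitioning t_customer on regdate might look roughly like this:

// Rough illustration of RANGE partitioning on regdate; the boundaries are made-up
// year starts, and $pdo is assumed to be an existing PDO connection.
$pdo->exec("
    ALTER TABLE t_customer
    PARTITION BY RANGE (regdate) (
        PARTITION p2011 VALUES LESS THAN (1293840000),
        PARTITION p2012 VALUES LESS THAN (1325376000),
        PARTITION p2013 VALUES LESS THAN (1356998400),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    )
");

With that in place, a filter like WHERE regdate < 1370955715 only has to touch the partitions that can contain matching rows.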

Related

Improve mysql Speed?

I have a MySQL/PHP script running on my Linux machine. It's basically migrating file contents into a MySQL table. There are about 4,400,000 account files, and each file's content is placed in one row of the table. It has been 14 hours and so far it has only done 300,000 accounts.
At first it was very fast, doing about 1,000 files a second; now it has slowed down to 50 files per second, and the mysql process is consuming 95% of the server's CPU.
The machine has multiple cores, and I was wondering whether it's possible to allocate more than one core to the mysql process that is consuming 95% of the CPU.
Or is there any other way to make the process faster?
Thank you.
Here is the script:
https://paste.ee/p/LZwlH#GHxpgqiUUPsVQFchdKVny2DEJQxaXH9V
Do not use the mysql_* API. Switch to mysqli_* or PDO.
Please provide these:
SHOW CREATE TABLE
SHOW VARIABLES LIKE '%buffer%';
select * from players where p_name=' -- there is no need to SELECT *; simply SELECT 1. Do you have an index on p_name? That is very important.
It smells like index updating, but can't be sure.
One way to speed up inserts is to 'batch' them -- 100 rows at a time will typically run about 10 times as fast as 1 at a time (see the sketch after this answer).
Even better might be to use LOAD DATA. You may need to load into a temp table, then massage things before doing INSERT .. SELECT .. to put the data into the real table.
Temporarily remove the INSERT from the procedure. See how fast it runs. (You have not 'proven' that INSERT is the villain.)
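A rough sketch of the batching idea with mysqli; the accounts table, its columns, and the $files list are hypothetical, since the real script is only linked above:

// Rough sketch of batched inserts: one multi-row INSERT per ~100 files instead of
// one round trip and index update per row. Table/column names are made up.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
$batch = array();
foreach ($files as $path) {                       // $files: list of account file paths
    $content = file_get_contents($path);
    $batch[] = "('" . $mysqli->real_escape_string(basename($path)) . "','"
             . $mysqli->real_escape_string($content) . "')";
    if (count($batch) >= 100) {
        $mysqli->query('INSERT INTO accounts (name, content) VALUES ' . implode(',', $batch));
        $batch = array();
    }
}
if ($batch) {                                     // flush the final partial batch
    $mysqli->query('INSERT INTO accounts (name, content) VALUES ' . implode(',', $batch));
}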

Scalable chat room using PHP/MySQL?

Forgive the lack of a question.
I'm trying to build a website with the same functions as a chat room. The idea of 5-50 viewers in each room (with thousands of rooms) is very realistic; only about 1% of each room would be chatting.
I've had some ideas, but everything I've come up with seems like it would require a crazy amount of processing power... What would be an efficient way to do this?
There are specific programs designed for this purpose (ircd, see http://www.atheme.org/project/charybdis and similar.) However, if you really wish to reinvent the wheel, you will likely want a hosting solution that has a decent amount of physical RAM, and shared memory extensions (ex: APC.)
Shared memory functionality (APC in this case) will be the fastest way to keep everyone's conversations in sync, without the hard drive spinning up too much or otherwise MySQL spiraling out of control. You should be able to accommodate hundreds of concurrent requests this way without the server breaking a sweat, since it doesn't tax MySQL. It reads almost directly off the RAM chips.
You can key-store individual channels for conversations (ex: "channel-#welcome") and poll them directly via AJAX. See apc_store, apc_add and apc_fetch for more information.
Even if you end up storing conversations in MySQL for whatever reason, it's still preferable to use some kind of memory caching for reading, since that takes tremendous load off of the database server.
If you do it this way, it's best to make your tables InnoDB, since they won't lock the whole table during writes. Using APC, your limiting factor will be the amount of RAM and the length of the conversations you intend to keep in the shared buffer.
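A rough sketch of the key-per-channel idea with APC; the channel key format, message shape, TTL and 200-line cap are all made up, and note that the fetch-then-store below is not atomic, so concurrent posters can race:

// Keep each channel's recent conversation as one array in shared memory.
function chat_post($channel, $user, $text) {
    $key = 'channel-' . $channel;
    $messages = apc_fetch($key);
    if ($messages === false) {
        $messages = array();
    }
    $messages[] = array('user' => $user, 'text' => $text, 'ts' => time());
    $messages = array_slice($messages, -200);   // keep only the last 200 lines
    apc_store($key, $messages, 3600);           // 1 hour TTL
}

function chat_poll($channel, $since) {
    $messages = apc_fetch('channel-' . $channel);
    if ($messages === false) {
        return array();
    }
    // return only messages newer than the client's last poll
    return array_values(array_filter($messages, function ($m) use ($since) {
        return $m['ts'] > $since;
    }));
}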
You've asked a really broad question, but:
Store each message as a row in your database, use AJAX to reload the chat window content with the last few messages e.g.
SELECT * FROM `chat_messages` WHERE `room_id` = 'ID' ORDER BY `id` DESC LIMIT 100
This will select the 100 most recent messages for the chat room. Loop over the results and display the messages however you want.
If your database user has permission to create tables, you could also dynamically create a table for each chat room (which would be a lot faster performance-wise).
You'd then simply have an input or textarea in a form that, when submitted, inserts a new row into the database (which will show up for everyone the next time the chat window is reloaded).
Another, more optimised way would be to return only new messages on each request: store the timestamp of each message in the database, keep the timestamp of the last request locally in JavaScript, and then use a query like:
SELECT * FROM `chat_messages` WHERE `room_id` = 'ID' AND `timestamp` > 'LAST_REQUEST' ORDER BY `id` DESC LIMIT 100
Then append the result to the chat window rather than replacing it. (A rough sketch of such a polling endpoint follows.)
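For illustration, the AJAX endpoint for that last query might look something like this; PDO is assumed, the table and column names follow the query above, and everything else is made up:

// poll.php -- sketch of the endpoint the JavaScript polls for new messages.
$pdo = new PDO('mysql:host=localhost;dbname=chat;charset=utf8', 'user', 'pass');
$roomId      = (int) $_GET['room_id'];
$lastRequest = (int) $_GET['last_request'];   // client-side JS remembers this

$stmt = $pdo->prepare(
    'SELECT * FROM chat_messages
     WHERE room_id = :room AND timestamp > :last
     ORDER BY id DESC LIMIT 100'
);
$stmt->execute(array(':room' => $roomId, ':last' => $lastRequest));

header('Content-Type: application/json');
echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
// The client appends these rows to the chat window and updates last_request.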

Using PHP to optimize MySQL query

Let's assume I have the following query:
SELECT address
FROM addresses a, names n
WHERE a.address_id = n.address_id
GROUP BY n.address_id
HAVING COUNT(*) >= 10
If the two tables were large enough (think of having the whole US population in these two tables), then running EXPLAIN on this SELECT would show Using temporary; Using filesort, which is usually not good.
If we have a DB with many concurrent INSERTs and SELECTs (like this), would delegating the GROUP BY a.address_id HAVING COUNT(*) >= 10 part to PHP be a good plan to minimise DB resource usage? What would be the most efficient way (in terms of computing power) to code this?
EDIT: It seems the consensus is that offloading to PHP is the wrong move. How, then, could I improve the query (let's assume indexes have been created properly)? More specifically, how do I stop the DB from creating a temporary table?
So your plan to minimize resources is to suck all the data out of the database and have PHP process it, causing extreme memory usage?
Don't do client-side processing if at all possible - databases are DESIGNED for this sort of heavy work.
Offloading this to PHP is probably the opposite of the direction you want to go. If you must do this on a single machine, then the database is likely the most efficient place to do it. If you have a bunch of PHP machines and only a single DB server, offloading might make sense, but more likely you'll just clobber the I/O capability of the DB. You'll probably get a bigger win by setting up a replica and running your read queries there. Depending on your ratio of SELECTs to INSERTs, you might want to consider keeping a tally table (if there are many more SELECTs than INSERTs); a rough sketch follows. The more latency you can tolerate in your results, the more options you have. If you can allow 5 minutes of latency, you might start considering a distributed batch-processing system like Hadoop rather than a database.
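A rough sketch of the tally-table idea, with the tally table name and surrounding code made up; the point is that the count is maintained at insert time, so the report no longer needs GROUP BY ... HAVING over the whole names table:

// Assumes an existing PDO connection in $pdo and a $newAddressId from your insert path.
$pdo->exec('CREATE TABLE IF NOT EXISTS address_name_tally (
    address_id INT PRIMARY KEY,
    name_count INT NOT NULL
)');

// Whenever a row goes into `names`, also bump the tally for its address:
$stmt = $pdo->prepare(
    'INSERT INTO address_name_tally (address_id, name_count) VALUES (:id, 1)
     ON DUPLICATE KEY UPDATE name_count = name_count + 1'
);
$stmt->execute(array(':id' => $newAddressId));

// The original report then becomes a simple join against the tally:
$busy = $pdo->query(
    'SELECT a.address
     FROM addresses a
     JOIN address_name_tally t ON t.address_id = a.address_id
     WHERE t.name_count >= 10'
)->fetchAll(PDO::FETCH_COLUMN);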

Is it possible to do count(*) while doing insert...select... query in mysql/php?

Is it possible to run a simple COUNT(*) query in one PHP script while another PHP script is running an INSERT...SELECT query?
The situation is that I need to create a table with ~1M or more rows from another table, and while inserting I do not want the user to feel that the page is frozen, so I am trying to keep updating the count. But when I run SELECT COUNT(*) FROM table while the insert is running in the background, I get only 0 until the insert has completed.
So is there any way to ask MySQL to return a partial result first? Or is there a fast way to do a series of inserts with data fetched from a previous SELECT query, while keeping roughly the same performance as an INSERT...SELECT query?
The environment is PHP 4.3 and MySQL 4.1.
Without reducing performance? Not likely. With a little performance loss, maybe...
But why are you regularly creating tables and inserting millions of rows? If you do this only very seldom, can't you just warn the admin (presumably the only one allowed to do such a thing) that this takes a long time? If you're doing this all the time, are you really sure you're not doing it wrong?
I agree with Stein's comment that this is a red flag if you're copying 1 million rows at a time during a PHP request.
I believe that in a majority of cases where people are trying to micro-optimize SQL, they could get much greater performance and throughput by approaching the problem in a different way. SQL shouldn't be your bottleneck.
If you're doing a single INSERT...SELECT, then no, you won't be able to get intermediate results. In fact this would be a Bad Thing, as users should never see a database in an intermediate state showing only a partial result of a statement or transaction. For more information, read up on ACID compliance.
That said, the MyISAM engine may play fast and loose with this. I'm pretty sure I've seen MyISAM commit some but not all of the rows from an INSERT...SELECT when I've aborted it part of the way through. You haven't said which engine your table is using, though.
The other users can't see the insertion until it's committed. That's normally a good thing, since it makes sure they can't see half-done data. However, if you want them to see intermediate data, you could throw in an occasional call to "commit" while you're inserting.
By the way - don't let anybody tell you to turn autocommit on. That's a HUGE time waster. I have a "delete and re-insert" job on my database that takes one third as long when I turn off autocommit.
Just to be clear, MySQL 4 isn't configured by default to use transactions. It uses the MyISAM table type which locks the entire table for each insert, if I remember correctly.
Your best bet would be to use one of the MySQL bulk insertion features, such as LOAD DATA INFILE, as these are dramatically faster at inserting large amounts of data. As for the counting, you could break the inserts into N groups of 1,000 (or however many) rows, then divide your progress meter into N sections and update it after each group completes (a rough sketch follows below).
Edit: Another thing to consider is, if this is static data for a template, then you could use a "select into" to create a new table with the same data. Not sure what your application is, or the intended functionality, but that could work as well.
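For what it's worth, a sketch of the chunked approach; PDO is used here for brevity even though the question's PHP 4.3 environment would need the old mysql_* calls, and the table and column names are made up. Because each chunk is its own statement, a second script can watch SELECT COUNT(*) FROM new_table grow while this runs:

// Copy old_table into new_table in 1,000-row chunks so progress is visible.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$chunk  = 1000;
$offset = 0;
$total  = (int) $pdo->query('SELECT COUNT(*) FROM old_table')->fetchColumn();
while ($offset < $total) {
    $pdo->exec("INSERT INTO new_table (id, data)
                SELECT id, data FROM old_table
                ORDER BY id
                LIMIT $offset, $chunk");
    $offset += $chunk;
    // e.g. write $offset and $total to a progress row here for the page to poll.
    // For very large tables, stepping through id ranges would scale better than OFFSET.
}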
If you can get to the console, you can ask various status questions that will give you the information you are looking for. There's a command that goes something like "SHOW processlist".

What is the best way to get an approximate number of search results from a query?

To describe it some more: I have an image map of a region that, when clicked, runs a query to get more information about that region. My client wants me to display an approximate number of search results while hovering over that region of the image map. My problem is: how do I cache, or otherwise get, that number without heavily exhausting my server's memory resources?
By the way, I'm using PHP and MySQL, if that's necessary info.
You could periodically execute the query and then store the results (e.g., in a different table).
The results would not be absolutely up-to-date, but would be a good approximation and would reduce the load on the server.
MySQL can give you the approximate number of rows that would be returned by your query, without actually running the query. This is what EXPLAIN syntax is for.
You run the query with EXPLAIN in front of the SELECT, and then multiply all the values in the rows column (a short sketch follows below).
The accuracy is highly variable. It may work with some types of queries, but be useless on others. It makes use of statistics about your data that MySQL's optimizer keeps.
Note: running ANALYZE TABLE on the table periodically (i.e. once a month) may help improve the accuracy of these estimates.
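A short sketch of that EXPLAIN approach from PHP; PDO is assumed, and the places table, region_id column and $regionId variable are placeholders:

// Ask MySQL's optimizer for a row estimate instead of running the real query.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('EXPLAIN SELECT * FROM places WHERE region_id = :region');
$stmt->execute(array(':region' => $regionId));

$estimate = 1;
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $estimate *= (int) $row['rows'];   // multiply the `rows` column across the plan
}
echo "About $estimate results";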
You could create another table with the IDs of the regions, and then create a script that runs over all the regions at a quiet time (at night, for example) and populates this extra table with the data.
Then, when you hover, you read from this new table by region ID, and the data is at most a day old.
The issue with that could be that you do not have a quiet time, or that the run-through done at night takes a long time and is very process-heavy.
EDIT to your comment.
Take a large region or country if you can, run the query for that region in your SQL browser of choice, and check how long it takes.
If it is too much, you could distribute the work so that certain countries execute at certain hours: small countries together, and large countries on their own at some point.
Is your concern that you don't want to go to the database for this information at all when the roll-over event occurs, or is it that you think the query will be too slow and you want the information from the database, but faster?
If you have a slow query, you can tune it, or use some of the other suggestions already given.
Alternatively, and to avoid a database hit altogether, it seems that you have a finite number of regions on this map, and you can run a query periodically for all regions and keep the numbers in memory.
