I was wondering if anyone could point me in the right direction. I'm creating a game with over 630,000 combinations, and I want to know which is faster: having my script query the database for one result, or using a large switch statement? This game is (I hope) supposed to be a lightweight hit, and I don't want any problems.
<?php
// Is this quicker?
// (Remember, this table would have anywhere from 600,000 to 630,000 rows.)
mysql_query(....);
// Or is this quicker?
// (This would be one page with anywhere from 600,000 to 630,000 different cases.)
switch (....) {
    case ....:
        ....
    case ....:
        ....
}
?>
PHP will take ages to parse the page, which will probably make it slower regardless of execution time, which may or may not be slower (though I'd expect the query to be faster here as well). You could also consider an associative array instead of a switch, if your data fits that, but it won't make parsing much faster. And think of memory consumption too.
And you can just try.
I would almost certainly go with the MySQL query, especially if the table(s) are indexed correctly. The switch method will be very hard to maintain - for example, what if someone other than you needed to add a new combination to the switch?
PHP doesn't use switch tables to optimize case selections. It would be the equivalent of a gigantic if-elseif-else... statement.
It's better to use queries to properly indexed tables for this.
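To make the indexed lookup concrete, here is a minimal sketch in Python, with an in-memory SQLite database standing in for PHP and MySQL. The table name `combinations`, its columns, and the generated keys are all assumptions made up for illustration. The point is only that a lookup on the primary key goes through the index instead of walking every row.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE combinations (combo TEXT PRIMARY KEY, outcome TEXT NOT NULL)"
)

# Made-up combinations (fewer than 630,000, just to keep the demo quick).
rows = [(f"combo-{i:06d}", f"outcome-{i}") for i in range(50_000)]
conn.executemany("INSERT INTO combinations (combo, outcome) VALUES (?, ?)", rows)


def lookup(combo):
    """Return the outcome for one combination, or None if it is unknown."""
    row = conn.execute(
        "SELECT outcome FROM combinations WHERE combo = ?", (combo,)
    ).fetchone()
    return row[0] if row else None


result = lookup("combo-031337")
missing = lookup("no-such-combo")
```

Adding a new combination is then a single INSERT rather than an edit to a 600,000-case source file.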
Ok, just read that, take it or leave it (don't know what your game looks like).
If this is a high-traffic (many users) game/website, I would split the data into at least two tables: one holding just the ID, some GROUP information, and what you call the COMBINATION. All other additional data would then go in the second table, accessible via a JOIN on that ID.
table 1
ID | GROUP | COMBINATION
1 | island | ABCDE
2 | house | FGHIJ
table2
ID | MORE INFO
1 | ...
2 | ...
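A rough sketch of that two-table layout, again in Python with SQLite standing in for MySQL. The table names `combos` and `combo_details`, the column name `grp` (GROUP is a reserved word), and the `more_info` values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combos (
    id INTEGER PRIMARY KEY,
    grp TEXT NOT NULL,                 -- GROUP is a reserved word, hence grp
    combination TEXT NOT NULL UNIQUE   -- UNIQUE also creates the lookup index
);
CREATE TABLE combo_details (
    id INTEGER PRIMARY KEY REFERENCES combos(id),
    more_info TEXT
);
INSERT INTO combos VALUES (1, 'island', 'ABCDE'), (2, 'house', 'FGHIJ');
-- Hypothetical detail rows; the original only shows "...".
INSERT INTO combo_details VALUES (1, 'details for island'), (2, 'details for house');
""")


def find(combination):
    """Look up the narrow table first, then pull the wide data via the JOIN."""
    return conn.execute(
        """
        SELECT c.id, c.grp, d.more_info
        FROM combos AS c
        JOIN combo_details AS d ON d.id = c.id
        WHERE c.combination = ?
        """,
        (combination,),
    ).fetchone()


row = find("FGHIJ")
```

The narrow table stays small enough to scan or cache cheaply, and the wide table is only touched for the one ID that matched.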
Also I would (if possible) split those GROUPs into table chunks.
// ok, this is an example for ID and range of IDs, but I think you can get it
partition by range (id)
(
PARTITION P1 VALUES LESS THAN (10),
PARTITION P2 VALUES LESS THAN (20)
)
Logical Splitting:
- No need to create separate tables
- no need to move chunks of data across files
5 main reasons to use partitions:
- to make single inserts and selects faster
- to make range selects faster
- to help split the data across different paths
- to store historical data efficiently
- if you need to delete large chunks of data instantly
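To illustrate why range partitions help range selects, here is a small Python sketch of how a row id maps to one of the partitions from the example above. This is only a model of the idea (the database does this routing internally); the function names are made up.

```python
from bisect import bisect_right

# Exclusive upper bounds, copied from the PARTITION ... VALUES LESS THAN example.
PARTITIONS = [("P1", 10), ("P2", 20)]
BOUNDS = [upper for _, upper in PARTITIONS]


def partition_for(row_id):
    """Return the name of the partition whose range contains row_id."""
    index = bisect_right(BOUNDS, row_id)
    if index == len(PARTITIONS):
        raise ValueError(f"id {row_id} is not covered by any partition")
    return PARTITIONS[index][0]


def partitions_for_range(low, high):
    """Return the partitions a query for low <= id <= high would have to read."""
    first = bisect_right(BOUNDS, low)
    last = bisect_right(BOUNDS, high)
    return [name for name, _ in PARTITIONS[first:last + 1]]
```

A query for ids 3 to 7 only needs P1, so the other partition is never read. That skipping is where the speedup for range selects comes from.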
Related
What's the best way to get good performance when fetching data from multiple tables?
I have the following tables:
Applicants ( 50,101 rows )
-id
-first_name
-email
Phones ( 50,151 rows )
-id
-number
-model_id
-model_type
Address (100,263 rows)
-id
-state
-model_id
-model_type
Business (26 rows)
-id
-company
-model_id
-model_type
My desired result
id | first_name | email | number | company | state
----+------------+-------+--------+---------+------
1 | test | - | - | - | -
I'm using SQLyog to run the query below, and it's very slow. I have thousands of rows in these tables.
SELECT `app`.`id`,`app`.`first_name`, `app`.`email`, `p`.`number`, `b`.`company`, `add`.`state`
FROM `applicants` AS `app`
LEFT JOIN phones AS `p` ON `app`.`id` = `p`.`model_id`
AND `p`.`model_type` = 'App\\Models\\Applicant'
LEFT JOIN `businesses` AS `b` ON `app`.`id` = `b`.`model_id`
AND `b`.`model_type` = 'App\\Models\\Applicant'
LEFT JOIN `addresses` AS `add` ON `app`.`id` = `add`.`model_id`
AND `b`.`model_type` = 'App\\Models\\Applicant'
LIMIT 10
In summary, it takes 25.794 sec to finish:
Execution Time : 25.792 sec
Transfer Time : 0.001 sec
Total Time : 25.794 sec
What would be the best way to achieve my goal? Should I run a separate query for each of
phone, business, and address? I'm not sure how to get my desired result with multiple queries, though.
It really depends on the specific situation what will be faster. Also you can probably optimize your situation by creating the proper indexes. For example if you query a lot by model_id and model_type, you could create an index on either, or both of the fields.
I would suggest running the query with joins with an EXPLAIN in front, i.e. EXPLAIN SELECT {your query}. That will give you some insights on how MySQL executes your query. Then you can try the same with separate queries. Then add indexes and see if they are used. Then choose the best performing solution.
About indexes:
Introduction: https://www.mysqltutorial.org/mysql-index/mysql-create-index/
More in-depth: https://use-the-index-luke.com/
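As a sketch of that workflow (explain, add an index, explain again), here is a Python example with SQLite standing in for MySQL. SQLite's `EXPLAIN QUERY PLAN` plays the role of MySQL's `EXPLAIN`, and its output format is different. The index name `idx_phones_model` and the trimmed-down schema are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicants (id INTEGER PRIMARY KEY, first_name TEXT, email TEXT);
CREATE TABLE phones (
    id INTEGER PRIMARY KEY, number TEXT, model_id INTEGER, model_type TEXT
);
""")

MODEL_TYPE = r"App\Models\Applicant"

QUERY = """
SELECT app.id, app.first_name, p.number
FROM applicants AS app
LEFT JOIN phones AS p
       ON p.model_id = app.id AND p.model_type = ?
LIMIT 10
"""


def plan(sql, params=()):
    """Return the planner's description of each step, one step per line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "\n".join(row[-1] for row in rows)  # last column holds the detail text


plan_before = plan(QUERY, (MODEL_TYPE,))

# Hypothetical composite index covering both join conditions.
conn.execute("CREATE INDEX idx_phones_model ON phones (model_type, model_id)")

plan_after = plan(QUERY, (MODEL_TYPE,))
```

Printing `plan_before` and `plan_after` shows whether the join on `phones` switched to the new index. The same before/after comparison applies to the `addresses` and `businesses` joins.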
As I understand it, there is no general answer to this question.
One possible solution is denormalisation:
periodically creating an extra table with all the data you need.
It helps a lot in some cases but, unfortunately, is just not possible in others.
Here are the things to think about in this case. Arguments for a single query:
Databases are designed to handle complicated queries, so the JOINs are probably faster in the database.
Each query incurs overhead for moving the query into the database and data out of the database.
In favor of multiple queries:
Query optimizers are not perfect, so they might come up with the wrong plan.
Returning data as a single result set often requires a "wide" format for the data with many repeated columns. Returning more data is slower.
In general, the balance is on the single query, but that is not always true. For instance, if the database is fast but the bandwidth to the application is slow, the last bullet may consistently be the dominating factor.
I am surprised that your query takes so much time. You don't have an ORDER BY or GROUP BY so the time to the first result should be pretty fast. You might be able to have it run faster by simply doing a subselect on app:
FROM (SELECT app.*
FROM `applicants` `app`
LIMIT 10
) app . . .
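Here is a runnable sketch of that subselect, in Python with SQLite standing in for MySQL. The sample data and the `ORDER BY id` inside the subselect are my additions (the ordering only makes the result deterministic), and the `model_type` condition is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicants (id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE phones (id INTEGER PRIMARY KEY, number TEXT, model_id INTEGER);
""")
conn.executemany(
    "INSERT INTO applicants VALUES (?, ?)",
    [(i, f"name{i}") for i in range(1, 101)],
)
# One phone per applicant, plus a second phone for applicant 1.
conn.executemany(
    "INSERT INTO phones (number, model_id) VALUES (?, ?)",
    [(f"555-{i:04d}", i) for i in range(1, 101)] + [("555-9999", 1)],
)

rows = conn.execute("""
SELECT app.id, app.first_name, p.number
FROM (SELECT * FROM applicants ORDER BY id LIMIT 10) AS app
LEFT JOIN phones AS p ON p.model_id = app.id
""").fetchall()

applicant_ids = {row[0] for row in rows}
```

Note the change in meaning: the limit now applies to applicants, not to joined rows, so an applicant with two phones yields two rows and the result can have more than 10 rows.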
Joining tables takes data from each table and combines it into one result, so some extra loading time is to be expected. If you are only gathering specific data from each table, then running separate queries might be the better idea. It depends on how your application is laid out and how you are using this information.
I have 2 tables
1. First table contains prospects, their treatment status and the mail code they received (see it as a foreign key)
2. Second table contains mails, indexed with email code
I need to display some charts about hundreds of thousands of prospects, so I was thinking about an aggregate query (get prospect data grouped by month, count positive statuses, count negative statuses, between a start and end date, etc.).
Result is pretty short and simple, and I can use it directly in charts :
[ "2019-01" => [ "WON" => 55000, "LOST" => 85000, ...],
...
]
Then I was asked to add a filter on mails (code and human-readable label) so the user can choose them from a multi-select field. I can handle writing the query (or queries), but I am wondering which approach I should use.
I got a choice between:
- keeping my first query and do a second one (distinct values of mail, same conditions)
- query everything and treat all my rows with PHP
I know coding but I have little knowledge about performance.
In theory I should not run 2 queries over the same data, but processing all those rows in PHP when MySQL can do it better looks like ... "overkill".
Is there a best practice ?
I have a lot of PHP pages that have dozens of queries supporting them, and they run plenty fast. When a page does not run fast, I focus on the slowest query; I do not focus on playing games in PHP. But I avoid running a query that hits hundreds of thousands of rows; it will be "too" slow. Some things...
Maybe I will find a way to aggregate the data to avoid a big scan.
Maybe I will move the big query to a second page -- this avoids penalizing the user who does not need it.
Maybe I will break up the big scan so that the user must ask for pieces, not build a page with 100K lines. Pagination is not good for that many rows. So...
Maybe I will dynamically build an index into a second level of pages.
To discuss this further, please provide SHOW CREATE TABLE, some SELECTs (not worrying about how bad they are; we'll tell you), and mockups of page(s).
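For reference, the two-query approach from the question (one aggregate for the chart, one DISTINCT for the filter) might be sketched like this, in Python with SQLite standing in for PHP and MySQL. The `prospects` schema, column names, and sample rows are assumptions. In MySQL the month would come from `DATE_FORMAT(treated_on, '%Y-%m')` rather than `strftime`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prospects ("
    "id INTEGER PRIMARY KEY, status TEXT, mail_code TEXT, treated_on TEXT)"
)
conn.executemany(
    "INSERT INTO prospects (status, mail_code, treated_on) VALUES (?, ?, ?)",
    [
        ("WON", "M1", "2019-01-05"),
        ("LOST", "M1", "2019-01-20"),
        ("LOST", "M2", "2019-01-21"),
        ("WON", "M2", "2019-02-03"),
        ("WON", "M1", "2019-03-10"),  # outside the date range used below
    ],
)


def monthly_counts(start, end, mail_codes=None):
    """Aggregate in the database: {month: {status: count}}."""
    sql = (
        "SELECT strftime('%Y-%m', treated_on) AS month, status, COUNT(*) "
        "FROM prospects WHERE treated_on BETWEEN ? AND ?"
    )
    params = [start, end]
    if mail_codes:
        placeholders = ", ".join("?" for _ in mail_codes)
        sql += f" AND mail_code IN ({placeholders})"
        params.extend(mail_codes)
    sql += " GROUP BY month, status ORDER BY month, status"
    result = {}
    for month, status, count in conn.execute(sql, params):
        result.setdefault(month, {})[status] = count
    return result


def mail_codes_in_range(start, end):
    """Second, cheap query that feeds the multi-select filter."""
    sql = (
        "SELECT DISTINCT mail_code FROM prospects "
        "WHERE treated_on BETWEEN ? AND ? ORDER BY mail_code"
    )
    return [row[0] for row in conn.execute(sql, (start, end))]


chart = monthly_counts("2019-01-01", "2019-02-28")
filtered = monthly_counts("2019-01-01", "2019-02-28", ["M1"])
codes = mail_codes_in_range("2019-01-01", "2019-02-28")
```

Both queries return a handful of rows, so the application never has to pull hundreds of thousands of rows across the wire just to count them.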
When storing relationship data for a user (potentially a thousand friends per user), would it be faster to create a new row for each relationship, or to concatenate all of their friends into a string and then parse that later?
I.e.
Primary id | Friend1ID | Friend2ID|
1| 234| 5789|
2| 5789| 234|
Where the IDs are references to primary IDs in a 'Users' table.
Or for the 'Users' table to just have a column called friends which may look like this:
Primary id | Friend1ID |
234| 5789.123.8474|
5789| 234|
My understanding is that string concatenation and parsing is generally quite slow, so I'd be tempted to lean towards the first method. However, as the number of users grows, this becomes a case of selecting one row and parsing it vs. searching millions of rows for the ones that match the WHERE criteria.
Is one method distinctly faster than the other? Particularly as the number of users grows.
You should use a second table to store the friends.
Users Table
----------
userid | username
1 | Bob
2 | Mike
3 | John
Users Friends Table
--------------------
userid | friend_id
1 | 2
3 | 2
Here you can see that Mike is friends with both Bob and John. This is of course a very simple demonstration.
Your second option will not scale. Some people may have hundreds of thousands of friends, and storing every ID in a single field is going to cause headaches further down the line: adding friends, removing friends, working out complex relationships between people. Lots of overhead.
Querying millions of records with a WHERE clause on a properly indexed table should take no more than a second, the first option is the better one.
The "correct" way would probably be keeping multiple rows. This allows for much easier statistical analysis and more complex queries (like friends of friends) without any hacky stuff. Integer storage size is also often smaller than string storage, even though you're repeating one ID - especially if you use an appropriately sized integer store (like mediumint).
It's also more maintainable, more scalable (if they start getting a damn lot of friends), and easier to export and import. The speed gain from concatenation, if any, wouldn't be worth giving up the rest of the benefits.
If you wanted for instance to search if Bob was a friend of Jane, this would be a single row lookup in the multiple row implementation, or in the single row implementation: get Bob's row, decode field, loop through field looking for Jane - found Jane. DBMS optimisation and indexing would make the multiple row implementation much faster in this case - if you had the primary key as (id, friendid) then it'd be pretty much instantaneous as the table would probably be hashed on that key.
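A minimal sketch of the multiple-row design with a composite primary key, in Python with SQLite standing in for MySQL. Table names, column names, and the sample users are assumptions, and each friendship is stored in both directions as in the question's first layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE friends (
    user_id   INTEGER NOT NULL REFERENCES users(id),
    friend_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_id, friend_id)   -- the key doubles as the lookup index
);
INSERT INTO users VALUES (1, 'Bob'), (2, 'Jane'), (3, 'John');
-- Each friendship is stored once per direction.
INSERT INTO friends VALUES (1, 2), (2, 1), (3, 2), (2, 3);
""")


def are_friends(user_id, other_id):
    """Single-row lookup on the composite primary key."""
    row = conn.execute(
        "SELECT 1 FROM friends WHERE user_id = ? AND friend_id = ?",
        (user_id, other_id),
    ).fetchone()
    return row is not None


def friends_of(user_id):
    """All friends of one user, served from the same key's leading column."""
    sql = "SELECT friend_id FROM friends WHERE user_id = ? ORDER BY friend_id"
    return [row[0] for row in conn.execute(sql, (user_id,))]
```

Checking whether Bob (1) is a friend of Jane (2) is `are_friends(1, 2)`: no row decoding and no loop in application code.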
I believe the proper way to do it, which might also be faster, is to use a two-column table:
user | friend
1 | 2
1 | 3
It is simple, it makes querying and updating much easier, and you can have as many relationships as you want.
Don't over complicate the problem...
... Asking for the more "correct" way is itself the wrong question.
It depends on the case.
If your web application has a low access rate, having more rows won't change anything. On the other side of the coin (I'm not English), for medium and large applications it may be better to access the DB as little as possible.
To achieve this, as you've already thought, you can concatenate the values, split them when the user logs in, and put everything into the $_SESSION superglobal.
At least this is what I think.
I have a table that stores specific updates for all customers.
Some sample table:
record_id | customer_id | unit_id | time_stamp | data1 | data2 | data3 | data4 | more
When I created the application, I did not realize how much this table would grow. I currently have over 10 million records after 1 month. I am running into problems where PHP stops executing because of how long the queries take. Some queries produce top-1 results based on time_stamp + customer_id + unit_id.
How would you suggest handling this type of issue? For example, I could create a new table for each customer, although I don't think that is a good solution.
I am stuck with no good solution in mind.
If you're on the cloud (where you're charged for moving data between server and db), ignore.
Move all logic to the server
The fastest query is a SELECT WHEREing the PRIMARY. It won't matter how large your database is; it will come back just as fast as it would from a table of 1 row (as long as your hardware isn't unbalanced).
I can't tell exactly what you're doing with your query, but first download all of the sorting and limiting data into PHP. Once you've got what you need, SELECT the data directly WHEREing on record_id (I assume that's your PRIMARY).
It looks like your on-demand data is pretty computationally intensive and huge, so I recommend using a faster language. http://blog.famzah.net/2010/07/01/cpp-vs-python-vs-perl-vs-php-performance-benchmark/
Also, when you start sorting and limiting on the server rather than the db, you can start identifying shortcuts to speed it up even further.
This is what the server's for.
I suggest you use partitioning of your data following some criteria.
You can make horizontal or vertical partition of your data.
For example, group your customer_id values into 10 partitions, using the ID modulo 10.
So a customer_id ending in 0 goes to partition 0, one ending in 1 goes to partition 1, and so on.
MySQL can do this for you easily.
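The modulo rule is simple enough to sketch in a few lines of Python. The customer ids are made up, and in MySQL the equivalent would be declared on the table (for example with `PARTITION BY HASH(customer_id) PARTITIONS 10`) rather than written in application code.

```python
from collections import defaultdict

PARTITION_COUNT = 10


def partition_for(customer_id):
    """customer_id modulo 10: ids ending in 0 go to partition 0, and so on."""
    return customer_id % PARTITION_COUNT


# Hypothetical customer ids, bucketed the way the partitions would hold them.
partitions = defaultdict(list)
for customer_id in [120, 431, 7, 90, 1011]:
    partitions[partition_for(customer_id)].append(customer_id)
```

A query that filters on a single customer_id then only has to look inside one of the ten partitions.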
What is the count of records within the tables? Often, with relational databases, it's not how much data you have (millions are nothing to relational databases), it's how you're retrieving it.
From the look of your select, in fact, you probably just need to optimize the statement itself and avoid the multiple subselects, which is probably the main cause of the slowdown. Try running an explain on that statement, or just get the ids and run the interior select individually on the ids of the records that you've actually found & retrieved in the first run.
Just the fact that you have those subselects within your overall statement means that you haven't optimized that far into the process anyway. For example, you could be running a nightly or hourly cron job that aggregates into a new table the sets like the one created by SELECT gps_unit.idgps_unit, and then you can run your selects against a previously generated table instead of creating blocks of data that are equivalent of a table on the fly.
If you find yourself unable to effectively optimize that select statement, you have "final" options like:
Categorize via some criteria and split into different tables.
Keep a deep archive, such that anything past the first year or so is migrated to a less used table and requires special retrieval.
Finally, if you have so much small data, you may be able to completely archive certain tables and keep them around in file form only and then truncate past a certain date. Often with web tracking data that isn't that important and is kinda spammy, I end up doing this after a few years, when the data is really not going to do anyone any good any more.
I would like to build a website that has some elements of a social network.
So I have been trying to think of an efficient way to store a friend list (somewhat like Facebook).
And after searching a bit the only suggestion I have come across is making a "table" with two "ids" indicating a friendship.
That might work in small websites but it doesn't seem efficient one bit.
I have a background in Java but I am not proficient enough with PHP.
An idea has crossed my mind which I think could work pretty well, problem is I am not sure how to implement it.
The idea is to have all the "id"s of your friends saved in a tree data structure, where each node in the tree represents one digit of the friend's id.
You start with one node and add more nodes as the user adds friends
(a bit like Lempel–Ziv).
Every node can point to 11 other nodes: 0 to 9 and X.
"X" marks the end of the id.
For example, see this tree:
[Figure: An Example]
In this tree the user has 4 friends with the following "id"s:
0
143
1436
15
Update: as it might have been unclear before, the idea is that every user will have a tree in a form of multidimensional array in which the existence of the pointers themselves indicate the friend's "id".
If every user had such a multidimensional array, checking whether id "y" is a friend of mine, deleting id "y" from my friend list, or adding id "y" to my friend list would all take constant time O(1), independent of the number of users the website might have. The only drawback is that taking such a huge array, serializing it, and pushing it into each row of the table just doesn't seem right.
-Is this even possible to implement?
-Would using serializing to insert that tree into a table be practical?
-Is there any better way of doing this?
The reason I chose this is that even with a really large number of ids (millions or billions), the search, add, and delete time is linear in the number of digits of the id.
I'd greatly appreciate any help with implementing this or any suggestions for alternative ways to improve or change this method.
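For readers trying to picture the structure, the tree described above can be sketched as nested dictionaries. This is written in Python rather than PHP, and the function names are made up. It is only an illustration of the question's idea, not a recommendation.

```python
END = "X"  # marks the end of an id, as in the question


def add_friend(tree, friend_id):
    """Walk or create one node per digit, then mark the end of the id."""
    node = tree
    for digit in str(friend_id):
        node = node.setdefault(digit, {})
    node[END] = True


def is_friend(tree, friend_id):
    """Follow the digits; the id is present only if the last node has the end mark."""
    node = tree
    for digit in str(friend_id):
        node = node.get(digit)
        if node is None:
            return False
    return END in node


def remove_friend(tree, friend_id):
    """Remove the id and prune branches left empty; return True if it was present."""
    digits = str(friend_id)
    path = [tree]
    for digit in digits:
        child = path[-1].get(digit)
        if child is None:
            return False
        path.append(child)
    if END not in path[-1]:
        return False
    del path[-1][END]
    # Walk back up, deleting nodes that no longer lead anywhere.
    for depth in range(len(digits), 0, -1):
        if path[depth]:
            break
        del path[depth - 1][digits[depth - 1]]
    return True


# The four friends from the example: 0, 143, 1436 and 15.
friends = {}
for friend_id in (0, 143, 1436, 15):
    add_friend(friends, friend_id)
```

Each operation takes one step per digit of the id. If the tree is stored serialized in a table row, though, it still has to be loaded and unserialized before any of these functions can run.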
I would strongly advise against this.
Storage savings are not significant, and may (probably?) be worse. In a real dataset, the actual space savings this approach gives you are minimal. Computing the average savings is a very difficult problem, but use some real numbers and try a few samples with random IDs. If you have a million users, consider a user with 15 friends. How much data do you save with this approach? You may actually use more space, since tree adjacency models can require significant data.
"Rendering" a list of users requires CPU investment.
Inserts are non-deterministic and non-trivial. When you add a new user to an existing tree, you will have a variety of methods of inserting them. Assuming you don't choose arbitrarily, it is difficult to compute which approach is the best (and would only be based on heuristics).
These are the big ones that came to mind. But generally, I think you are over-thinking this.
You should check out OQGRAPH, the Open Query graph storage engine. It is designed to handle efficient tree and graph storage for MySQL.
You can also check out my presentation Models for Hierarchical Data with SQL and PHP, or my answer to What is the most efficient/elegant way to parse a flat table into a tree? here on Stack Overflow.
I describe a design I call Closure Table, which records all paths between ancestors and descendants in a hierarchy.
You say 'using PHP' in the title, but this seems to be just a database question at its heart. And believe it or not the linking table is by far the best way to go. Especially if you have millions or billions of users. It would be faster to process, easier to handle in the PHP code and smaller to store.
Update
Users table:
id | name | moreInfo
1 | Joe | stuff
2 | Bob | stuff
3 | Katie | stuff
4 | Harold | stuff
Friendship table:
left | right
1 | 4
1 | 2
3 | 1
3 | 4
In this example Joe knows everyone and Katie knows Harold.
This is of course a simplified example.
I'd love to hear if someone has a better logic to the left and right and an explanation as to why.
Update
I gave some php code in a comment below but it was marked up wrong so here it is again.
$sqlcmd = sprintf( 'SELECT IF( `left` = %1$d, `right`, `left`) AS "friend" FROM `friendship` WHERE `left` = %1$d OR `right` = %1$d', $userid);
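The same query can be tried against the example Friendship table. Here is a sketch in Python with SQLite standing in for MySQL: `CASE WHEN` replaces MySQL's `IF()`, the column names are double-quoted because LEFT and RIGHT are keywords, and a bound parameter takes the place of `sprintf`. The `ORDER BY` is my addition, so the output is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE friendship (
    "left"  INTEGER NOT NULL,
    "right" INTEGER NOT NULL,
    PRIMARY KEY ("left", "right")
);
-- Rows from the example: Joe=1, Bob=2, Katie=3, Harold=4.
INSERT INTO friendship VALUES (1, 4), (1, 2), (3, 1), (3, 4);
''')


def friends_of(user_id):
    """Return the other side of every friendship row that mentions user_id."""
    sql = '''
        SELECT CASE WHEN "left" = :uid THEN "right" ELSE "left" END AS friend
        FROM friendship
        WHERE "left" = :uid OR "right" = :uid
        ORDER BY friend
    '''
    return [row[0] for row in conn.execute(sql, {"uid": user_id})]


joe = friends_of(1)
harold = friends_of(4)
```

Storing each friendship once halves the row count compared with storing both directions, at the cost of the `OR` in the WHERE clause. An extra index on the `right` column would likely help that half of the condition.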
Few ideas:
ordered lists - searching through ordered list is fast, though ordering itself might be heavier;
horizontal partitioning data;
getting rid of premature optimizations.