I have a table that contains a user_id field and an items field. The user_id is just an int with the user's ID, and items is an XML-structured object stored in a 'text' field. I want to be able to see statistics about the players' items, e.g. who has the most of some item, the average wealth of everyone, etc.
I currently have to loop through each row, create a SimpleXMLElement from the items field, and loop through that to filter on a specific criterion.
The structure is like this:
inventory
If I want to run a query to count all of the items with item ID 332, for example, it takes 3-4 seconds. We expect there to be 50k+ rows (currently 28k), so if there is any other way I can speed this process up, it would be great.
What about using MySQL's LIKE?
For example:
SELECT * FROM table WHERE inventory like '%<itemid>332</itemid>%';
Depending on how much you need to query this data, storing it as XML might not be the best approach; assuming that you've already decided that it is, many databases support some form of XPath queries which can be used to extract data out of XML fields. MySQL provides some support in the form of the ExtractValue function, which can be used to extract the criteria that you need in a more reliable way than simply using LIKE (e.g. in deefactorial's answer; what if there was more than one itemid in your XML?).
An example can be seen here on SO, in How to use XPATH in MySQL select?.
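As a rough sketch of the ExtractValue approach, a per-user count of item 332 might look like this; the table name player_items and the inventory/item/itemid layout are assumptions, not taken from the question:
SELECT user_id,
       ExtractValue(items, 'count(//item[itemid=332])') AS num_of_item_332
FROM player_items
WHERE items LIKE '%<itemid>332</itemid>%';
The LIKE clause only pre-filters rows that cannot possibly match; ExtractValue then does the accurate, XPath-based counting, so multiple itemid occurrences are handled correctly.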
The best way to find a match between a few columns in the database
I'd like to do something like this:
If a match for $a is found, display the ID of the row.
I am debating between two approaches:
Select the entire table once, look for matches in PHP, keep the results in an array, and then present them from the array.
Or query the table for a match each time.
The problem is that each time I select the whole (very large) table, I hit the memory limit.
So I'm looking for the approach that uses the least memory.
If all the data is in a single table, be sure that the columns you are searching on are indexed. This will ensure an 'optimal' search of your table.
In terms of memory, if you have an extremely large result set and slam the entire dataset into an array, you may run out of memory. To deal with this, you should page the data: load a limited number of results into the array for display, then fetch more data as the user asks for it.
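A rough sketch of that paging pattern, assuming a PDO connection $pdo and the table/column names used below:
$perPage = 50;
$offset  = (int) $page * $perPage;
// Fetch only one page of matching IDs at a time
$stmt = $pdo->prepare("SELECT record_id FROM your_table WHERE your_column = ? LIMIT $perPage OFFSET $offset");
$stmt->execute(array($a));
$pageOfIds = $stmt->fetchAll(PDO::FETCH_COLUMN);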
Generally, selecting limited results from the database is faster and less memory intensive than populating large arrays. For a large table, be sure you only select the data you require. You might be looking for something like
SELECT record_id FROM your_table WHERE your_table.your_column = '$a' LIMIT 1;
This will only return one record in your result set.
I am using WordPress with some custom post types (just to give a description of my DB structure - it's WP's).
Each post has custom meta, which is stored in a separate table (postmeta table). In my case, I am storing city and state.
I've added some actions to WP's save_post/trash_post hooks so that the city and state are also stored in a separate table (cities) like so:
ID postID city state
auto int varchar varchar
I did this because I assumed that this table would be faster than querying the rather large postmeta table for a list of available cities and states.
My logic also forces me to add/update cities and states for every post, even though this causes duplicates (in the city/state fields). This must be so because I must keep track of which states/cities actually have a post associated with them. When a post is added or deleted, its record is added to or removed from the cities table along with it.
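For reference, the save_post side of that logic might look roughly like this; the meta keys and the cities table name are assumptions:
add_action('save_post', function ($postID) {
    global $wpdb;
    $table = 'cities'; // assumed table name
    // Drop any existing record for this post, then store its current city/state
    $wpdb->delete($table, array('postID' => $postID));
    $wpdb->insert($table, array(
        'postID' => $postID,
        'city'   => get_post_meta($postID, 'city', true),
        'state'  => get_post_meta($postID, 'state', true),
    ));
});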
This brings me to my question(s).
Does this logic make sense or do I suck at DB design?
If it does make sense, my real question is this: **would it be faster to use MySQL's "SELECT DISTINCT" or just "SELECT *" and then use PHP's array_unique on the results?**
Edits for comments/answers thus far:
The structure of the table is exactly how I typed it out above. There is an index on ID, but the point of this table isn't to retrieve an indexed list, but to retrieve ALL results (that are unique) for a list of ALL available city/state combos.
I think I may go with (I don't know why I didn't think of this before) just adding a serialized list of city/state combos in ONE record in the wp_options table. Then I can just get that record, and filter out the unique records I need.
Can I get some feedback on this? I would imagine that retrieving and filtering a serialized array would be faster than storing the data in a separate table for retrieval.
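For concreteness, the wp_options idea might look like this; the option name is an assumption:
// Append the current post's combo and keep only unique city/state pairs
$combos   = get_option('city_state_combos', array()); // assumed option name
$combos[] = array('city' => $city, 'state' => $state);
// SORT_REGULAR makes array_unique compare the sub-arrays themselves
update_option('city_state_combos', array_unique($combos, SORT_REGULAR));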
To answer your question about SELECT DISTINCT vs. array_unique: I would almost always prefer to limit the result set in the database, assuming of course that you have an appropriate index on the fields for which you are trying to get distinct values. This saves you the time spent transmitting extra data from the DB to the application, plus the time the application spends reading that data into memory before you can work with it.
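As a sketch of the database-side approach against the cities table described above (the index name is mine):
-- A composite index lets the DISTINCT scan be satisfied from the index alone
ALTER TABLE cities ADD INDEX idx_city_state (city, state);
SELECT DISTINCT city, state FROM cities ORDER BY state, city;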
As far as your separate table design goes, it is hard to speculate whether this is a good approach or not; it largely depends on how you are actually performing your query (i.e. are you doing two separate queries - one for post info and one for city/state info - or querying across a join?).
There is really only one definitive way to determine the fastest approach: test both ways in your environment.
1) A fully normalized table (one that holds only integer values, with the text moved out into lookup tables of one int + one varchar each) has an advantage when you don't often do full table joins and do a lot of searching on the normalized fields. The downside is that it requires large join/sort buffers and results in more complex queries, so there is much less chance a query will be auto-optimized by MySQL; you have to optimize your queries yourself.
2) SELECT DISTINCT will be faster in almost all cases. The only case where it will be slower is when you have a small sort buffer configured in /etc/my.cnf and a much larger memory limit for PHP.
A DISTINCT select can use indexes, while your PHP code can't.
Also, sending a large amount of data to your app costs a lot of MySQL CPU time and wall-clock time.
There seems to be no shortage of hierarchical data questions in MySQL on SO, however it seems they are mostly talking about managing such data in the database or actually retrieving recursively hierarchical data. My situation is neither. I have a grid of items I need to display. Each item can also have 0 or more comments associated with it. Right now, each item, along with its data, is displayed in the grid together with any comments belonging to that item. Usually there is some sort of drill-down, dialog, or other user action required to see child data for a grid item, but in this case we display both parent and child data in the same grid. Might not fit the de facto standards but it is what it is.
Right now the comments are retrieved by a separate MySQL query for every single parent item in the grid. I immediately cringe at this being aware of all the completely separate database queries that have to be run for a single page load. I haven't profiled but I wouldn't be too surprised if this is part of the slow page loads we sometimes see. I'd like to ideally bring this down to a single query or perhaps 2. However, I'm having difficulty coming up with a solution that sounds any better than what is currently being done.
My first thought was to flatten the comment children for each row with some sort of separator like '|' and then explode them back apart in PHP when rendering the page. The issue with this is it gets increasingly complicated with having to separate each field in a comment, and then each comment, and then account for the possibility of separator characters in the data. Just feels like a mess to maintain and debug.
My next thought was to left outer join the comments to the items and just account for the item duplicates in PHP. I'm working with CodeIgniter's database library, which returns a PHP array for database data. This sounds like potentially a lot of duplicated data in the resulting array, which could tax the system for larger result sets. I'm thinking in most cases it wouldn't be too bad though, so this option is currently at the top of my possibilities list.
Ideally, if I understand MVC correctly, I should keep my database, business logic, and view/display as separate as possible. So again, ideally, there should not be any database "quirks" (for lack of a better word) apparent in the data returned by the model. That is, whatever calls for data from this model method shouldn't be concerned with duplicate data like this. So I'd have to add an additional loop to somehow eliminate the duplicate item array entries, but only after I have retrieved all the child comments and placed them into their own array.
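For illustration, that join-then-deduplicate pass might look like this in CodeIgniter; all table and column names here are assumptions:
// Collapse LEFT JOIN rows: one entry per item, with its comments nested
$rows = $this->db->query(
    'SELECT i.id AS item_id, i.title, c.id AS comment_id, c.body
     FROM items i LEFT JOIN comments c ON c.item_id = i.id'
)->result_array();
$items = array();
foreach ($rows as $row) {
    $id = $row['item_id'];
    if (!isset($items[$id])) {
        $items[$id] = array('title' => $row['title'], 'comments' => array());
    }
    if ($row['comment_id'] !== null) { // NULL when an item has no comments
        $items[$id]['comments'][] = $row['body'];
    }
}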
Two queries is another idea but then I have to pass numerous item IDs in the SQL statement for the comments and then go through and zip all the data together manually in PHP.
My goal isn't to get out of doing work here but I am hoping there is some more optimal (less resource intensive and less confusing to the coder) method I haven't thought of yet.
As you state in your question, using a join will bring back a lot of duplicate information. It should be simple enough to remove in PHP, but why bring it back in the first place?
Compiling a SQL statement with a list of IDs retrieved from the query for your list of items shouldn't be a problem (see cwallenpoole's answer). Alternatively, you could create a sub-query so that MySQL recreates the list of IDs for you - it depends on how intensive the sub-query is.
Select your items:
SELECT * FROM item WHERE description = 'Item 1';
Then select the comments for those items:
SELECT * FROM comment WHERE item_id IN (
SELECT id FROM item WHERE description = 'Item 1'
);
For the most part, I solve this type of problem using some sort of ORM Lazy-Loading system but it does not look like you've that as an option.
Have you considered:
1) Select all top-level items.
2) Select all second-level items by the IDs in the top-level set.
3) Associate the objects retrieved in 2 with the items found in 1 in PHP.
Basically (sketched with PDO):
$stmt = $pdo->query("SELECT id /*columns*/ FROM entries");
$entries = array();
// Index parent rows by id so children can be attached in constant time
foreach( $stmt->fetchAll(PDO::FETCH_ASSOC) as $row )
{
    $row['child-entities'] = array();
    $entries[$row['id']] = $row;
}
$ids = implode(',', array_keys($entries));
$stmt = $pdo->query("SELECT parent_id /*columns*/ FROM children WHERE parent_id IN ($ids)");
// Attach each child row to its parent entry
foreach( $stmt->fetchAll(PDO::FETCH_ASSOC) as $row )
{
    $entries[$row['parent_id']]['child-entities'][] = $row;
}
$entries will now be an associative array with parent items directly associated with child items. Unless recursion is needed, that should be everything in two queries.
I'm running a SQL query to get basic details from a number of tables, sorted by the last-update date field. It's terribly tricky and I'm wondering if there is an alternative to using the UNION clause... I'm working in PHP/MySQL.
Actually I have a few tables containing news, articles, photos, events, etc. and need to collect all of them in one query to show a simple 'what's newly added on the website' kind of thing.
Maybe do it in PHP rather than MySQL - if you want the latest n items, fetch the latest n of each of your news items, articles, photos, and events, then sort the combined set in PHP and trim it to n. This is probably easier than combining those with UNION, given they're likely to have lots of data items which are different.
I'm not aware of an alternative to UNION that does what you want, and hopefully those fetches won't be too expensive. It would definitely be wise to profile this though.
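A sketch of that fetch-and-merge approach, assuming a PDO connection $pdo and that each table has title and updated_at columns (all names here are assumptions):
$latest = array();
foreach (array('news', 'articles', 'photos', 'events') as $table) {
    // Grab the n newest rows from each table
    $stmt = $pdo->query("SELECT title, updated_at, '$table' AS source FROM $table ORDER BY updated_at DESC LIMIT 10");
    $latest = array_merge($latest, $stmt->fetchAll(PDO::FETCH_ASSOC));
}
usort($latest, function ($a, $b) {
    return strcmp($b['updated_at'], $a['updated_at']); // newest first
});
$latest = array_slice($latest, 0, 10); // keep only the n newest overall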
If you use a JOIN in your query, you can select data from different tables that are related by foreign keys.
You can look at this from another angle: do you absolutely need up-to-the-moment information (i.e. the moment someone enters new information, it should appear)?
If not, you can have a table holding the results of the query in the format you need (serving as a cache), and update this table every 5 minutes or so. Then your query problem becomes trivial, as the heavy lifting can run as several updates in the background.
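A rebuild job for such a cache table might look like this (all table and column names are assumptions); run it from cron every few minutes:
TRUNCATE TABLE latest_items;
INSERT INTO latest_items (title, updated_at, source)
SELECT * FROM (
    (SELECT title, updated_at, 'news' AS source FROM news)
    UNION ALL
    (SELECT title, updated_at, 'articles' FROM articles)
    -- ...repeat for photos, events, etc. ...
) AS combined
ORDER BY updated_at DESC
LIMIT 20;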
I have an array of user ids in a query from Database A, Table A (AA).
I have the main user database in Database B, Table A (BA).
For each user id returned in my result array from AA, I want to retrieve the first and last name of that user id from BA.
Different user accounts control each database. Unfortunately each login cannot have permissions to each database.
Question: How can I retrieve the first and last names with the least number of queries and/or the least processing time? With 20 users in the array? With 20,000 users in the array? Any order of magnitude higher, if applicable?
Using PHP 5 / MySQL 5.
As long as the databases are on the same server, just use a cross-database join. The DB login being used to access the data will also need permissions on both databases. In MySQL the database name qualifies the table directly (there is no separate schema level), so something like:
SELECT AA.userID, BA.first, BA.last
FROM databaseA.tableA AA
INNER JOIN databaseB.tableA BA ON AA.userID = BA.userID
In response to comments:
I don't believe I read the part about multiple logins correctly, sorry. You cannot use two different mySQL logins on one connection. If you need to do multiple queries you really only have three options. A) Loop through the first result set and run multiple queries. B) Run a query which uses a WHERE clause with userID IN (#firstResultSet) and pass in the first result set. C) Select everything out of the second DB and join them in code.
All three of those options are not very good, so I would ask, why can't you change user permissions on one of the two DBs? I would also ask, why would you need to select the names and IDs of 20,000 users? Unless this is some type of data dump, I would be looking for a different way to display the data which would be both easier to use and less query intensive.
All that said, whichever option you choose will be based on a variety of different circumstances. With a low number of records, under 1,000, I would use option B. With a higher number of records, I would probably use option C and try to place the two result sets into something that can be joined (such as using array_combine).
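A sketch of option B with PDO; $dbA and $dbB are assumed to be separate connections (one per login), and the ID list is assumed to be non-empty:
// Pull the IDs from database A
$ids = $dbA->query('SELECT userID FROM tableA')->fetchAll(PDO::FETCH_COLUMN);
// Build one placeholder per ID and fetch the matching names from database B
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $dbB->prepare("SELECT userID, first, last FROM tableA WHERE userID IN ($placeholders)");
$stmt->execute($ids);
$names = array();
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $names[$row['userID']] = $row; // index by ID for easy lookup
}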
I think the key here is that it should be possible in two database calls.
Your first one to get the id's from database A and the second one to pass them to database B.
I don't know MySQL, but in SQL Server I'd use the xml datatype and pass all of the IDs into a statement using that. Before the xml datatype, I'd have built up some dynamic SQL with the IDs in an IN statement.
SELECT UserId FROM DatabaseA.TableA
Loop through the IDs and build up a comma-separated string.
"SELECT FirstName, Surname FROM DataBaseB.TableA WHERE UserId IN(" + stringId + ")"
The problem with this is that with 20,000 IDs you may have some performance issues with the amount of data you are sending. This is where I'd use the xml datatype in SQL Server, so maybe look at what alternatives MySQL has for passing lists of IDs.