handling a lot of data with mysql and php in search

I'm making a car part system, to store all the parts inside mysql and then search for them.
Part adding goes like this:
you select up to 280 parts and add all the car info, then all the parts are serialized and put into mysql along with all the car info in a single row.
(for this example I'll say that my current database has 1000 cars and all of those cars have 280 parts selected)
The problem is that with 1000 cars, each having 280 parts, PHP and MySQL start getting slow and take a long time to load the data, because the number of parts is 1000*280=280 000.
I use foreach on all of the cars and then put each part into another array.
The final array has 280 000 items, and then I filter it by the parts selected in the search, so out of 280 000 parts it may only have to print about 12 500 parts (if someone is searching for 50 different parts at the same time and 250 cars have those parts).
Example database: http://pastebin.com/aXrpgeBP
$q=mysql_query("SELECT `id`,`brand`,`model`,`specification`,`year`,`fueltype`,`capacity`,`parts`,`parts_num` FROM `warehouse`");
while($r=mysql_fetch_assoc($q)){
    $partai=unserialize($r['parts']);
    unset($r['parts']); //unsetting unserialized parts so the whole car parts won't be passed into the final parts-only array
    foreach($partai as $part){
        $r['part']=$parttree[$part]; //$parttree is an array with all the part names and $part is the part id - so this returns the part name by its id.
        $r['part_id']=$part; // saves the part id for later filtering selected by the search
        $final[]=$r;
    }
}
$selectedparts=explode('|', substr($_GET['selected'], 0,strlen($_GET['selected'])-1)); //exploding selected part ids from data sent by jquery into an array
foreach($final as $f){
    if(in_array($f['part_id'], $selectedparts)){
        $show[]=$f; //filtering only the parts that need to be shown
    }
}
echo json_encode($show);
This is the code I use to put all the cars' parts into arrays and then send them as JSON to the browser.
I'm not working on the pagination at the moment, but I'll be adding it later to show only 10 parts.
Could a solution be to index all the parts into a different table once every 24h (because new parts will be added daily) and then put more of the load on MySQL than on PHP? PHP is doing all the hard work now.
Or to use something like memcached to store the final unfiltered array once every 24h and then just filter the parts that need to be shown with PHP?
These are the options I considered, but I know there must be a better way to solve this.

Yes, you should definitely put more emphasis on MySQL. Don't serialize the parts for each car into a single row of a single column. That's terribly inefficient.
Instead, make yourself a parts table, with columns for the various data items that describe each part.
part_id: an autoincrement item
car_id: which car this is a part of
partnumber: the part's external part number (barcode number?)
etc.
Then, use JOIN operations.
Also, why don't you use a WHERE clause in your SELECT statement, to retrieve just the car you want?
Edit
If you're looking for a part, you definitely want a separate parts table. Then you can do a SQL search something like this.
SELECT w.id, w.model, w.specification, w.year, w.fueltype,
p.partnumber
FROM warehouse w
JOIN parts p ON (w.id = p.car_id)
WHERE p.partnumber = 'whatever-part-number-you-want'
This will take milliseconds, even if you have 100K cars in your system, if you index it right.
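As a rough sketch of how the PHP side might shrink once the parts live in their own table: parse the `selected` parameter, then let MySQL do the filtering with an `IN` list. The helper names `parseSelectedParts` and `buildPartSearchSql` are made up for illustration, and the `parts` table with `car_id` and `partnumber` columns is the layout suggested above, not the asker's current schema.

```php
<?php
// Hypothetical helper: turn "12|57|103|" (the format sent by jQuery) into a clean list.
function parseSelectedParts(string $selected): array
{
    // Drop empty pieces (such as the one after the trailing "|") and duplicates.
    $ids = array_filter(explode('|', $selected), 'strlen');
    return array_values(array_unique($ids));
}

// Hypothetical helper: build the JOIN query with one placeholder per selected part.
function buildPartSearchSql(int $count): string
{
    if ($count < 1) {
        throw new InvalidArgumentException('At least one part is required');
    }
    $placeholders = implode(',', array_fill(0, $count, '?'));
    return "SELECT w.id, w.brand, w.model, w.year, p.partnumber
            FROM warehouse w
            JOIN parts p ON w.id = p.car_id
            WHERE p.partnumber IN ($placeholders)";
}

$parts = parseSelectedParts('12|57|103|');
$sql   = buildPartSearchSql(count($parts));

// With a PDO connection ($pdo is assumed to exist), the whole search becomes:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($parts);
// echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));
```

Only the matching rows leave the database this way, so PHP never has to build the 280 000-item array. "Index it right" presumably means at least an index on `parts.partnumber` and one on `parts.car_id`.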

Your query should be something like:
<?php
$selectedparts=explode('|', substr($_GET['selected'], 0,strlen($_GET['selected'])-1)); //exploding selected part ids from data sent by jquery into an array
$where = ' id < 0 '; //always false, so every part can simply be appended with OR
foreach ($selectedparts AS $a){
    //escape the user input; note that LIKE '%5%' also matches 15 or 50, so the rows still need filtering in PHP
    $where .= " OR `parts` like '%".mysql_real_escape_string($a)."%'";
}
$query = "SELECT * FROM `warehouse` WHERE ".$where." ORDER BY `id` ASC";//this is your query
//.... rest of your code
?>

Yes, look into has-many relationships: a car has many parts.
http://net.tutsplus.com/tutorials/databases/sql-for-beginners-part-3-database-relationships/
Then you can use an inner join to get the specified parts. You can do a where clause to match the specific partIds to filter out unwanted parts or cars.

Related

An alternative way to loop through large dataset

I have a table that contains a user_id, and an items field. The user_id is just an int with the user's id, and the items is an xml structured object in a 'text' field. I want to be able to see statistics about the player items. i.e. who has the most of some item, the average wealth of everyone, etc.
I currently have to loop through each row, create a SimpleXMLElement for it, and then loop through that and filter by a specific criterion.
The structure is like this:
inventory
If I want to run a query to count all of the items with item id 332, for example, it takes about 3-4 seconds. We expect there to be 50k+ rows (currently 28k), so if there is any other way I can speed this process up, that would be great.

What about using MySQL's LIKE?
For example:
SELECT * FROM table WHERE inventory like '%<itemid>332</itemid>%';

Depending on how much you need to query this data, storing it as XML might not be the best approach; assuming that you've already decided that it is, many databases support some form of XPath queries which can be used to extract data out of XML fields. MySQL provides some support in the form of the ExtractValue function, which can be used to extract the criteria that you need in a more reliable way than simply using LIKE (e.g. in deefactorial's answer; what if there was more than one itemid in your XML?).
An example can be seen here on SO, in How to use XPATH in MySQL select?.

unused number mysql

How can I get all of the records in a table that are out of
sequence, so I know which account numbers I can reuse? I have a range
of account numbers from 50100 to 70100. I need to know which account
numbers are not stored in the table (not currently used) so I can use them.
For instance say I have the following data in table:
Account Name
------ --------
50100 Test1
50105 Test2
50106 Test4
..
..
..
I should see the results:
50101
50102
50103
50104
because 50101-50104 are available account numbers since not currently in
table.
copied from http://bytes.com/topic/sql-server/answers/78426-get-all-unused-numbers-range
With respect to MYSQL and PHP.
EDITED
My range is 10000000-99999999.
My present way is using MySql query:
'SELECT FLOOR(10000000 + RAND() * 89999999) AS random_number FROM contacts WHERE "random_number" NOT IN (SELECT uid FROM contacts) LIMIT 1';
Thanks.

Solution 1:
Generate a table with all possible account numbers in it. Then run a query similar to this:
SELECT id FROM allIDs WHERE id NOT IN (SELECT id FROM accounts)
Solution 2:
Get the whole id column into an array in PHP, Java, or similar. Then run a for loop to check whether each number is in the array.
$ids = array(/* all ids from the table */);
$used = array_flip($ids); // ids become keys, so each lookup is fast
$availableids = array();
for($i=50100;$i<=70100;$i++){
    if(!isset($used[$i])){ // not in the table, so the number is free
        $availableids[] = $i;
    }
}

One way would be to create another table, fill it with all allowable numbers, then write a simple query to find the ones in the new table that are not in the original table.

Sort the accounts on the server, and find jumps in PHP while reading in the results. Because the results are ordered, any jump in the sorted sequence is a run of numbers that is "free for use". You can sort with something like SELECT AccountNumber FROM Accounts ORDER BY AccountNumber ASC;.
To improve efficiency, store the free account numbers in another table, and use numbers from this second table until no more remain. This avoids making too many full reads (as in the first paragraph), which may be expensive. While you are at it, you may want to add a hook in the part of the code which deletes accounts, so they are immediately included in this second table, making the first step unnecessary.
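The "find jumps" step could be sketched roughly as follows. `findFreeNumbers` is a hypothetical helper, and it assumes the used numbers arrive already sorted ascending, as an `ORDER BY` query would return them.

```php
<?php
// Hypothetical helper: $used must be sorted ascending.
// Returns every number in [$min, $max] that does not appear in $used.
function findFreeNumbers(array $used, int $min, int $max): array
{
    $free = [];
    $expected = $min; // next number we hope to see in the sorted list
    foreach ($used as $number) {
        if ($number < $min || $number > $max) {
            continue; // ignore anything outside the allowed range
        }
        // Every value skipped between $expected and $number is unused.
        for (; $expected < $number; $expected++) {
            $free[] = $expected;
        }
        $expected = $number + 1;
    }
    // Anything after the last used number is free as well.
    for (; $expected <= $max; $expected++) {
        $free[] = $expected;
    }
    return $free;
}

print_r(findFreeNumbers([50100, 50105, 50106], 50100, 50108));
// free numbers: 50101, 50102, 50103, 50104, 50107, 50108
```

For a range as large as 10000000-99999999, collecting every free number is probably not what you want; stopping at the first gap, or storing the gaps in the second table described above, would keep memory use small.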

PHP/PDO strange request

I want to do a PDO request that gets information from 2 different tables, but things quickly get hard. Let me explain:
I have a first database table (flowers) that is organized like this (but it's REALLY bigger than what I show here):
ID; name; price of format 1; price of format 2; name of other format 3; price of other format 3;
FLOW0001; big yellow flower; ; 15,99; more big format; 34,99;
FLOW0002; little red flower; 5,99; 8,99; ; ;
... it goes on like this...
The second table (trees) is like this :
ID; name; name of format; price of format;
TREE0001; OMG BIG TREE !; F*** big format; 599,99;
TREE0001; OMG BIG TREE !; F*** even bigger format; 899,99;
TREE0002; litte ugly tree; little format; 20,99;
... it goes on like this...
The thing is that I want to "merge" these 2 tables and show them together, blended in a page like this:
while ($datas = $response->fetch()) // fetching the tables; they will be together and ordered by (their biggest) price. So, trees and flowers will blend.
{
    // then echo the fetched data. I will need to separate the prices and show them... some if/else based on preg_match of the ID to know if it's a TREE or FLOW.
}
How do I blend this? I've got no idea. The worst is that I will need to build a paging system (50 per page). In the paging system and in the pages, all the duplicated trees will need to appear as unique trees... HELP!

You can use a union to get the data from both tables. just be sure that you are returning matching columns in both queries. So if in your first query you have the following data types int, text, varchar, decimal, decimal you will need to make sure that the second query gets those same column types otherwise the DB will have a hissyfit about it. You can use an order by clause on the end to sort the data from both queries nicely.
Something like this should do the trick:
select
    ID,
    name,
    price
from
    flowers
where
    -- your clauses
union
select
    ID,
    name,
    price
from
    trees
where
    -- your clauses
order by 3 desc
Edit: You can group_concat() fields in them if you want, and yes, you will be able to do an explode() on them in your PHP. The default group_concat delimiter is a , (comma) so if you have commas in your fields, you will need to change it to something so that your explode works nicely.

Optimal method for retrieving two levels of hierarchical data from MySQL

There seems to be no shortage of hierarchical data questions in MySQL on SO, however it seems they are mostly talking about managing such data in the database or actually retrieving recursively hierarchical data. My situation is neither. I have a grid of items I need to display. Each item can also have 0 or more comments associated with it. Right now, both the item, along with its data, are displayed in the grid as well as any comments belonging to that item. Usually there is some sort of drill down, dialog, or other user action required to see child data for a grid item but in this case we display both parent and child data in the same grid. Might not fit the de facto standards but it is what it is.
Right now the comments are retrieved by a separate MySQL query for every single parent item in the grid. I immediately cringe at this being aware of all the completely separate database queries that have to be run for a single page load. I haven't profiled but I wouldn't be too surprised if this is part of the slow page loads we sometimes see. I'd like to ideally bring this down to a single query or perhaps 2. However, I'm having difficulty coming up with a solution that sounds any better than what is currently being done.
My first thought was to flatten the comment children for each row with some sort of separator like '|' and then explode them back apart in PHP when rendering the page. The issue with this is it gets increasingly complicated with having to separate each field in a comment, and then each comment, and then account for the possibility of separator characters in the data. Just feels like a mess to maintain and debug.
My next thought was to left outer join the comments to the items and just account for the item duplicates in PHP. I'm working with Codeigniter's database library that returns a PHP array for database data. This sounds like potentially a lot of duplicated data in the resulting array which could possibly be system taxing for larger result sets. I'm thinking in most cases it wouldn't be too bad though so this option is currently at the top of my possibilities list. Ideally, if I understand MVC correctly, I should keep my database, business logic, and view/display as separate as possible. So again, ideally, there should not be any database "quirks" (for lack of a better word) apparent in the data returned by the model. That is, whatever calls for data from this model method, shouldn't be concerned with duplicate data like this. So I'd have to add on an additional loop to somehow eliminate the duplicate item array entries but only after I have retrieved all the child comments and placed them into their own array.
Two queries is another idea but then I have to pass numerous item IDs in the SQL statement for the comments and then go through and zip all the data together manually in PHP.
My goal isn't to get out of doing work here but I am hoping there is some more optimal (less resource intensive and less confusing to the coder) method I haven't thought of yet.

As you state in your question, using a join will bring back a lot of duplicate information. It should be simple enough to remove in PHP, but why bring it back in the first place?
Compiling a SQL statement with a list of IDs retrieved from the query for your list of items shouldn't be a problem (see cwallenpoole's answer). Alternatively, you could create a sub-query so that MySQL recreates the list of IDs for you - it depends on how intensive the sub-query is.
Select your items:
SELECT * FROM item WHERE description = 'Item 1';
Then select the comments for those items:
SELECT * FROM comment WHERE item_id IN (
SELECT id FROM item WHERE description = 'Item 1'
);
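For comparison, if you do go with the LEFT JOIN from the question, removing the duplicates in PHP could look roughly like the sketch below. The function name `nestComments` and the column names `item_id`, `description`, `comment_id` and `comment_text` are assumptions for illustration; the real schema was not shown.

```php
<?php
// Hypothetical helper: collapse LEFT JOIN rows (one row per item/comment pair)
// into one entry per item, each with its own list of comments.
function nestComments(array $rows): array
{
    $items = [];
    foreach ($rows as $row) {
        $id = $row['item_id'];
        if (!isset($items[$id])) {
            // First time this item appears: keep its data once.
            $items[$id] = [
                'id'          => $id,
                'description' => $row['description'],
                'comments'    => [],
            ];
        }
        // A LEFT JOIN yields NULL comment columns for items without comments.
        if ($row['comment_id'] !== null) {
            $items[$id]['comments'][] = [
                'id'   => $row['comment_id'],
                'text' => $row['comment_text'],
            ];
        }
    }
    return array_values($items);
}

$rows = [
    ['item_id' => 1, 'description' => 'Item 1', 'comment_id' => 10,   'comment_text' => 'first'],
    ['item_id' => 1, 'description' => 'Item 1', 'comment_id' => 11,   'comment_text' => 'second'],
    ['item_id' => 2, 'description' => 'Item 2', 'comment_id' => null, 'comment_text' => null],
];
$items = nestComments($rows);
```

Keeping this loop inside the model method means the caller only ever sees the nested structure, not the duplicated rows.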

For the most part, I solve this type of problem using some sort of ORM Lazy-Loading system but it does not look like you've that as an option.
Have you considered:
1. Select all top-level items.
2. Select all second-level items by the IDs in the top-level set.
3. Associate the objects retrieved in 2 with the items found in 1 in PHP.
Basically (in pseudo-code)
$stmt = $pdo->query("SELECT ID /*columns*/ FROM ENTRIES");
$entries = array();
foreach( $stmt->fetchAll(PDO::FETCH_ASSOC) as $row )
{
    $row['child-entities'] = array();
    $entries[$row['ID']] = $row; // index the parents by their ID
}
// cast to int so the list is safe to put into the SQL; skip the second query if $entries is empty
$ids = implode(',', array_map('intval', array_keys($entries)));
$stmt = $pdo->query("SELECT PARENT_ID /*columns*/ FROM children WHERE PARENT_ID IN ($ids)");
foreach( $stmt->fetchAll(PDO::FETCH_ASSOC) as $row )
{
    $entries[$row['PARENT_ID']]['child-entities'][] = $row; // attach each child to its parent
}
$entries will now be an associative array with parent items directly associated with child items. Unless recursion is needed, that should be everything in two queries.

PHP/DB pattern question

I have an app that works with an idea of "redemption codes" (schema: ID, NAME, USES, CODE). An example would be "32, Stack Overflow, 75, 75%67-15hyh"
So this code is given to the SO community, let's say, and it has 75 redemptions. You redeem it by entering some shipping info and the code. When entered, this check is performed:
if (code exists){
    if (count_entries_where_code=$code < $uses_set_at_creation){
        //enter information into DB for processing
    }
} else {
    //echo "sorry, not a real code"
}
So the number of total uses is hardcoded, but the current # of redemptions is generated with a SQL query (count_results from entry_data WHERE code=$code). This part works fine, but here is the question:
At the view page where I manage the codes, I have the basic setup (in pseudo PHP, with the real code separated into an MVC setup):
$results = "SELECT * FROM codes";
foreach ($results as $code){
    echo $code->code;
    echo $code->name;
    //etc. It's actually all in a nice HTML table.
}
So I want to have a column listing "# of uses remaining on code". Should something like this be stored in the DB, and drawn out that way? It would be easier to generate with the foreach loop, but I don't usually prefer to store "generated" statistics like that. Is there a clever way to get those results onto the correct rows of the table created with the foreach loop?
(I'm fine with code so I don't need a working/great syntax example, just an explanation of a pattern that might fit this problem, and maybe a discussion of a common design for something like this. Am I right to avoid storing generate-able data like # of uses left? etc.)

Am I right to avoid storing generate-able data like # of uses left?
Yes, you are correct to not store computed values.
Computation logic can change, and working with a stored computed value to reverse engineer it can be a nightmare - if it is possible at all in some cases.
It sounds like you want to combine the two queries:
SELECT c.id,
       c.name,
       c.uses,
       c.code,
       COALESCE(x.num_used, 0) AS num_used
FROM CODES c
-- LEFT JOIN keeps codes that have no redemptions yet
LEFT JOIN (SELECT ed.code,
                  COUNT(*) AS num_used
           FROM ENTRY_DATA ed
           GROUP BY ed.code) x ON x.code = c.code

When you run your query to get the codes for the page, add a subquery to get the number of used codes from the entry_data table.
select codes.id, codes.name, codes.uses, codes.code, (select count(code) from entry_data where entry_data.code=codes.code) as used_codes from codes
I'd use code_id as a foreign key and not code.
This is all assuming I'm reading your problem correctly.
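With the count coming back on each row, the "uses remaining" column can be derived while rendering instead of being stored. A minimal sketch, assuming each row is an array with the `uses` and `used_codes` columns from the subquery version; `addRemainingUses` is a made-up helper name.

```php
<?php
// Hypothetical helper: $codes are rows from the query above,
// each with 'uses' (the limit) and 'used_codes' (redemptions so far).
function addRemainingUses(array $codes): array
{
    foreach ($codes as $i => $code) {
        // Never show a negative number, even if a code was over-redeemed.
        $codes[$i]['remaining'] = max(0, (int)$code['uses'] - (int)$code['used_codes']);
    }
    return $codes;
}

$rows = addRemainingUses([
    ['name' => 'Stack Overflow', 'uses' => 75, 'used_codes' => 12],
]);
echo $rows[0]['name'], ': ', $rows[0]['remaining'], " uses remaining\n";
```

The same subtraction could also be done directly in the SELECT; doing it in PHP just keeps the query unchanged.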
