array comparison vs SQL join - php

I have two, big, 2 dimensional arrays (pulled from some xml data) one (A list) is ~1000 items containing 5 fields the other (B list) is dinamically between 10.000-12.000 items containing 5 fields.
My idea was to compare EACH id key of list A against EACH id key of list B and on "true" compose a new array of combined fields, or just fields from array A if no match.
I used nested foreach loops and ended up with millions of iterations taking long time to process. needless to say...not a solution.
The form of this two structures and my needed result reminded me straight away of a sql join.
The questions are:
1.) Should i try sql or nested foreach might not be the best php way?
2.) Will a relational query be much faster than the iterations?
EDIT:
I pull data only periodically from an xml file (in a separate process) which contains 10+ fields for each node. Than i store the 5 fields i need in a CSV file to later compare with table A that i pull out from a mysql database. basically much like catalog update of attributes with fresh feed.
I'm affraid the original idea of storing into CSV was an error and i should just save the feed updates into a database too.
EDIT 2
The array list B look like this
Array
(
[0] => Array
(
[code] => HTS541010A9E680
[name] => HDD Mobile HGST Travelstar 5K100 (2.5", 1TB, 8MB, SATA III-600)
[price] => 385.21
[avail] => 0
[retail] => asbis
)
...
...
while the A list is similar in all but the 'code' field which is the only one useful for comparison
Array
(
[0] => Array
(
[code] => ASD-HTS541010A
[name] => HDD Mobile HGST Travelstar 5K100 (2.5", 1TB, 8MB, SATA III-600)
[price] => 385.21
[avail] => 0
[retail] => asbis
)
As you can see each feed will have universal code BUT some different random data as prefix or suffix so in each loop i have to do a couple of operations on the string to stripos or compare it to feeds id for a match or close match.
Pseudo code:
$mylist = loadfromDB();
$whslist = loadfromCSV();
foreach ($mylist as $myl) {
foreach ($whslist as $whl){
if ((stripos(code_a,code_b) OR (code_b,code_a) !== false)){
...
}
elseif (stripos(substr(strstr(code_a,'-'),1),code_b) !== false) {
...
}
elseif (stripos( substr(code_a,0,-5);) == !false ){
...
}
}
}

Using SQL will be faster because most SQL engines are optimized for joins, and your method is a brute-force method. However, inserting all that data to MySQL tables is quite a heavy task, so it's still not the best solution.
I suggest you do the join in PHP - but use a smarter algorithm. Start by sorting the two arrays by the field you want to match. Iterate both sorted arrays together - use two iterators(or pointers or indices or whatever) - lets say a iterates over A and b over B. On each iteration of the loop, compare the comparison field of the elements pointed by the a and b. If a's is smaller - advance a. If b's is smaller - advance b. If a's is equal to b's - you have a match, which you should store in a new list, and then advance both a and b(assuming the relation is one-to-one - if it's one-to-many you only advance the many iterator, and if it's many-to-many you need a bit more complex solution).

Related

save formula in database and substitue value at runtime php

i want to store the following formula in database
Z-Score = 1.2(Working_Capital/Total_Assets) + 1.4(Market_Value/Total_Assets)
Working_Capital,Total_Assets,Retained_Earnings are stored in database against different tables
eg:- Total_Assets = BalanceSheet.totalAssets, Market_Value = expert.market_value
this formula can be changed by the user through a form, he can add more quantities to the formula
Z-Score = 1.2(Working_Capital/Total_Assets) + 1.4(Market_Value/Total_Assets)+1.0(Sales/Total Assets)
PROBLEM:-
now keeping in mind that the formula can change anytime i need to store this in the database accross the user id so that i can give it an edit option.
also the formula is used to calculate data over period of years.
many users will have many formula.
APPROACH:-
from what i read so far many suggest to store it as a string, it is not possible since the values i need to display the user for editing the formula is different than what it is in database.
some suggest to convert the expression to tree data structure but i don't know which tree structure will suit my application???
lastly i need to store it in mysql database so will tree storing and retrieving be faster???
i am asking for suggestions as i feel its a complex problem but if anyone have already solved it PHP then please feel free to share
thanks in advance.
Solution to the above approach that i came up with.
1) take the formula from the user and when he is selecting the formula operands create a new array of operands which maps the names of operands in frontend with they corresponding values of database table and colunm name
eg:-
`$input['operands'] = array(
array('Total_Assets' => 'balance_sheet.TotalAssets'),
array('Working_Capital' => 'balance_sheet.WorkingCapital'),
array('Revenue' => 'balance_sheet.TotalRevenue'),
);`
2) create an infix form of the formula from which you can create a binary tree of the following form
infix form:-
(
[0] => revenue
[1] => total_assets
[2] => working_capital
[3] => +
[4] => /
)
Binay tree:-
{"__exp1":{"operator":"+","operand":["working_capital","total_assets"]},"root":{"operator":"\/","operand":["__exp1","revenue"]}}
3) now send all this to the server and there i save it as each column of table, also reverse mapping of the tree to infix form of the formula is possible on server side
4) this helps me to substitute the database values since i have the database column array from which i get the values and execute the formula and return the values.

select sql many-to-many relationship into nested php object

problem
I have two data tables SEQUENCES and ORGANISMS whose many-to-many-relationship is mappend in the table SOURCES. There is also a 1-m relationshipt between SOURCES and ENTRIES. I will append a detailed structure.
What i want to achieve, is the display of all sequences with all associated organisms and entries, where a condition within the sequences table is met. I have some ideas on how to achieve this, but i need the solution with the best performance, as each of these contains 50k+ entries.
idea one
Select all organisms that belong to the same sequence as a concatenated string in sql, and split it in PHP. I have no idea though, how to do the concatenation in SQL.
idea two
select same sequences with different organisms as distinct records, order by organism, and join them later in php. though this somehow feels just wrong.
idea three
use views. ANY idea on this one appreciated
structure
SEQUENCES
SEQUENCE_ID
DESCRIPTION
ORGANISMS
ORGANISM_ID
NAME
SOURCES
SOURCE_ID
SEQUENCE_ID FK to SEQUENCES.SEQUENCE_ID
ORGANISM_ID FK to ORGANISMS.ORGANISM_ID
ENTRIES
SOURCE_ID FK to SOURCES.SOURCE_ID
ENTRY_VALUE
desired outcome
array(
array(
"SEQUENCE_ID" => 4,
"DESCRIPTION" => "Some sequence",
"SOURCES" => array(
array(
"ORGANISM_ID" => 562,
"ORGANISM_NAME" => "Escherichia coli",
"ENTRIES" => array(
"some entry",
"some other entry"
),
array(
"ORGANISM_ID" => 402764,
"ORGANISM_NAME" => "Aranicola sp. EP18",
"ENTRIES" => array()
)
)
),
array(
"SEQUENCE_ID" => 5,
.....
)
)
PHP5 and FIREBIRD2.5.1
You can't fetch a nested array like that directly from a flat table structure. But if I get you right, what you want to do is not that hard to achieve.
I don't understand why you would concatenate things and then split them again, that's hard to maintain and probably slow.
I see two approaches here:
Fetch everything at once as flat table using JOIN and loop through it in PHP. This approach creates a lot of duplication but it's fast because you can fetch all data in one query and then process it with PHP.
Fetch every entity separately, loop and fetch the next hierarchy level as you go. This approach will be slower. It takes complexity away from the SQL query and doesn't fetch redunant data. It also gives you more freedom as to how you loop through your data and what you do with it.
Alternatively you might want to actually store hierarchical data in a no-sql way, where you could already store the array structure you mentioned.

mongodb: updating elements?

I am (as most ) coming from a mySQL background trying to switch over to noSQL and mongoDB. Since denormalization is a part of noSQL since joins is impossible, here's how I would design a simple blog:
array (
blog_title => 'my blogpost',
'date' => '2010-09-05',
comments => array (
'1' => 'Such a good post!!! You deserve a nobel prize'
)
);
If I want to update the comments, adding a new element in that array, how can I make sure that this is done and not the whole comments array being overwritten if multiple users are trying to write a comment at the same time?
Is it the push function I am looking after in mongoDB?
Correct, the $push operator allows you to update an existing array. You can use the $pushAll operator to add multiple values in a single query.
To add a comment to your example document, the query would be:
db.posts.update({blog_title: "my blogpost"}, {$push: {comments: "New comment"}})
These operators are atomic, so you won't run into any problems if multiple users add comments simultaneously.

fastest way to get parent array key in multidimensional arrays with php

what is the best way to get parent array key with multidimensional arrays?
for example I have this array:
array(
[0] => array(0=> sample, 1=>picture, 2=>frame, 3=>google)
[1] => array(0=> iphone, 1=>orange, 2=>love, 3=>msn)
[2] => array(0=> joe, 1=>geee, 2=>panda, 3=>yahoo)
)
now I need to search for example google and get the parent array key..
which it should be 0...any ideas? I used for loop for this but I think it will be slow if I have arrays with 700000 rows..
If you have an array with 700,000 rows you are almost certainly doing something wrong... I would first recomend thinking about utilizing a different data store: flat file or some type of DB.
foreach($array as $key => $value) {
if(in_array('google', $value)) return $key
}
Arrays with 700,000 rows? How many arrays? 9/10 times problem is that you've got your data set up wrongly.
I'm going to go ahead and assume you're doing a search of some sort. As you can't index an array (in the search meaning of index) then you're probably best putting the data into a database and making the most of column indexing to search fast.
Depending on context, you may alternatively want to think about storing your data in files, one per array, and using file searches to find which file contains your value.

PHP/MySQL OOP: Loading complex objects from SQL

So I'm working on a project for a realtor. I have the following objects/MySQL tables in my design:
Complexes
Units
Amenities
Pictures
Links
Documents
Events
Agents
These are the relationships between the above objects.
Complexes have a single Agent.
Complexes have multiple Units, Amenities, Pictures, Links, Documents, and Events.
Units have multiple Pictures, Links, and Documents.
Amenities, Pictures, Links, Documents, and Events all have the necessary foreign keys in the database to specify which unit/complex they belong to.
I need to load the necessary objects from the database into PHP so I can use them in my project.
If I try to select all the data out of the table in 1 query, using LEFT JOINS, I'll get AT LEAST (# of links) * (# of pictures) * (# of documents) rows for each unique unit. Add amenities, and events to that and I'll get all that * # of amenities * # of events for each complex...Not sure I want to try to deal with loading that into an object in PHP.
The other possibility is for each complex/unit, execute 1 separate SQL statement each for links, pictures, documents, events and amenities
My questions are as follows:
If I properly index all my tables, is it REALLY a bad idea to execute 3-5 extra queries for each complex/unit?
If not, how else can I get the data I need to load into a PHP object. Ideally, I would have an object as follows for units:
Unit Object
(
[id]
[mls_number]
[type]
[retail_price]
[investor_price]
[quantity]
[beds]
[baths]
[square_feet]
[description]
[featured]
[year_built]
[has_garage]
[stories]
[other_features]
[investor_notes]
[tour_link]
[complex] => Complex Object
(
[id]
[name]
[description]
etc.
)
[agent] => Agent Object
(
[id]
[first_name]
[last_name]
[email]
[phone]
[phone2]
etc.
)
[pictures] => Array
(
[1] => Picture Object
(
)
)
[links] => Array
(
[1] => Link Object
(
)
)
[documents] => Array
(
[1] => Document Object
(
)
)
)
I don't ALWAYS need ALL of this information, sometimes I only need the primary key of the complex, sometimes I only need the primary key of the agent, etc. But I figured the correct way to do this would be to load the entire object every time I instantiate it.
I've been doing a lot of research on OO PHP, but most (read all) online examples use only 1 table. That obviously doesn't help as the project I'm working on has many complex relationships. Any ideas? Am I totally off the mark here?
Thanks
[UPDATE]
On the other hand, usually on the front-end, which everyone will see, I WILL need ALL the information. For instance, when someone wants information on a specific complex, I need to display all units belonging to that complex, all pictures, document, links, events for the complex as well as all pictures, documents and links for the unit.
What I was hoping to avoid was, during one page load, executing one query to get the complex I need. Then another query to get the 20 units associated with the complex. Then for each of the 20 units, executing a query for picture, another for documents, another for links, etc. I wanted to get them all at once, with one trip through the database.
[EDIT 2]
Also, note that the queries to select the pictures, documents, links, events, and agent from the database are pretty simple. Just basic SELECT [list of columns] FROM [table] WHERE [primary_key] = [value] with the occasional INNER JOIN. I'm not doing any complex computations or subqueries, just basic stuff.
[BENCHMARK]
So after reading all the answers to my question, I decided to run a benchmark on what I decided to do. What I do is load all the units that I need. Then as I need to display pictures, document, blah blah, I load them at that time. I created 30,000 test units, each with 100 pictures, 100 documents, and 100 links. Then I loaded a certain number of units (I started with 1000, then 100, then the more realistic 10), looped through them, then loaded all pictures, documents and links associated to the unit. With 1000 units, it took approximately 30 seconds. With 100 units, it took about 3 seconds. With 10 units, it took about .5 seconds. There was a lot of variance with the results. Sometimes, with 10 units, it would take .12 seconds. Then it would take .8. Then maybe .5. Then .78. It was really all over the place. However, it seemed to average around half a second. In reality, though, I might only need 6 units at a time, and they each might only have 10 pictures, 5 links and 5 documents associated with them...so I think the "grab the data when you need it" approach is the best bet in a situation like this. If you needed to get all this data at once though, it would be worthwhile to come up with a single SQL statement to load all the data you need so you are only looping through the data one time (6700 units at a time took 217 seconds while the full 30,000 made PHP run out of memory).
If I properly index all my tables, is it REALLY a bad idea to execute 3-5 extra queries for each complex/unit?
In short, no. For each of the related tables, you should probably run a separate query. That's what most ORM (Object-Relational Mapping/Modelling) systems would do.
If performance is really a problem (and, based on what you've said, it won't be) then you might consider caching the results using something like APC, memcache or Xcache.
the point of ORM is not to load entire objects every time. the point is to make it easy and transparent for your app to access object.
that being said, if you need the unit object, then load the unit object, and only the unit object. if you need the agent object, then load that when you need it, not when you load the unit object.
Maybe you should think of breaking this up.
When you initiate your object, get only what details you need for that object to function. If and when you need more details, then go and get them. You distribute your load and processing this way: the object only gets the load and processing it needs to function, and when more is needed, it gets it then.
So, in your example - create the complex first. When you need to access a unit, then create that unit, when you need the agent, then get that agent, etc.
$complexDetails = array('id' => $id, etc);
$complexUnits = array();
.........
$complexUnits[] = new unit();
.........
$complexDetails['agent'] = new Agent();
I had to address this issue a while back when I concocted my own MVC framework as an experiment. To limit the layers of data loaded from the DB, I passed an integer to the constructor. Each constructor would decrement this integer before passing it to the constructors of the objects it instantiated. When it got to 0, no more sub-objects would be instantiated. This meant, basically, the int passed was the number of layers loaded.
So if I only wanted an attribute of the unit object, I'd do this:
$myUnit = new Unit($unitId,1);
If you want to "store" the objects, meaning cache them, just load them into a PHP array and serialize it. Then you can store it back to the database, in memcache or anywhere else. Attaching a label to it would allow you to retrieve it, and include a time stamp so you know how old it is (i.e. needs to be refreshed).
If the data doesn't change, or changes infrequently, there really is no reason to run multiple complex queries every time. Simple ones, like getting a primary, you might as well just hit the database directly.

Categories