I use PHP to solve this task, but that's not the point.
Every morning I receive four files, each containing roughly 6000-8000 records of the following form:
Product name
Package
Producer
Price with tax
Expiry date
Rest (remaining stock)
Series
By parsing these records I build the products table. Later, clients place orders, so I need to keep each item's id in the order table (clients want to see their purchase history).
So far so good. The problem is that one day any supplier may send a completely different price list: some products will be removed and others added. So it would be completely wrong to rely on their order within the price list.
What I've come up with is to parse the catalog blindly, adding all items, the first time. On every subsequent import I need to add only the new items and remove the old ones from the DB (not actually deleting them, just marking them as deleted so that no new purchases are possible).
To decide whether an item is new, I retrieve the records one by one from the Excel file and check the "Product name", "Package", "Producer", and "Series" fields in conjunction against the products table. If no such item is found, I assume it is new and add it to the DB.
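As a sketch, that per-record check could be a query like the following (the table and column names are my assumptions, not a fixed schema):

-- Look up one parsed record; a composite index on these four
-- columns keeps the lookup fast.
SELECT id
FROM products
WHERE product_name = ?
  AND package = ?
  AND producer = ?
  AND series = ?
LIMIT 1;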
What to do with deleted items? I'm not notified when they are removed, so I can't directly tell which items are missing from the new Excel file. My solution is to scan the DB item by item and check whether each one is still present in the Excel file. If an item is missing, I mark it as deleted.
Items marked as deleted can come back on sale, so I will also need to select all deleted items and check them one by one against the Excel file. If an item reappears in the file, I add it back.
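A set-based alternative to that row-by-row scan, assuming the parsed file is first loaded into a staging table (all names here are placeholders):

-- Mark products that are absent from the latest feed as deleted.
UPDATE products p
LEFT JOIN feed_staging f
    ON f.product_name = p.product_name
    AND f.package = p.package
    AND f.producer = p.producer
    AND f.series = p.series
SET p.deleted = 1
WHERE f.product_name IS NULL;

-- Restore previously deleted products that reappear in the feed.
UPDATE products p
JOIN feed_staging f
    ON f.product_name = p.product_name
    AND f.package = p.package
    AND f.producer = p.producer
    AND f.series = p.series
SET p.deleted = 0
WHERE p.deleted = 1;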
It's worth noting that, for now, some suppliers provide their catalogs as Excel files and others as DBF files; who knows what formats will come in the future. The number of suppliers is also expected to grow (next month two more come into play).
My question: is there a better, more efficient way to do this? I'm afraid my method is too straightforward.
With 8000 records and 3 passes of checks, each record searched against the whole MySQL DB, I get roughly O(n^2) work per price list. Perhaps that will work for 8000 records, but I'm sure it will fail the day I get a price list with, say, 10^5 records.
Is there a better way to organise it?
Thanks.
Hello, how are you? I wanted to ask those of you with more experience than me whether what I want to do is good or bad practice in Laravel.
I am building an ordering app, and in my database a product has a certain price (a field on the product table). But then I realized that when the price of a product changes one day, the price will also change in old orders; that is, an old order will adopt the product's current price rather than the price on the date the order was placed, which creates an information problem.
To fix that, I decided to create a new table holding price history: when a price changes, a new row is created with the new price and its effective date, while the price the product had before remains recorded with its own date.
Now, my question is: how can I bring the current price of every product into this new table without the user intervening? That is, how do I migrate all existing product prices into the new table?
My idea was to write a function that loops over every product and creates a new row in the prices table with the product id, the price, and the current date. Once written, I would run it through Tinker so that users notice absolutely nothing while the system adopts the new price structure.
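That loop can also be expressed as a single set-based statement; a minimal sketch, assuming the new table is called product_prices:

-- Seed the history table with each product's current price.
INSERT INTO product_prices (product_id, price, valid_from)
SELECT id, price, NOW()
FROM products;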
Is this good? Is it good practice, or is there a better way to do it?
Thank you.
Welcome to SO! One possible way to do it is to have a table with the price history, including start and end dates of validity. To avoid overcomplicating the queries that show a user their order history, I would also recommend saving the product's price on the order itself, for example in the many-to-many mapping table between order and product, where you can also store the quantity of purchased products.
You could then make a CLI command (e.g. php artisan make:command InsertOrderPrices) to fill these fields for the old orders after you have created them.
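A sketch of the backfill such a command could run, assuming an order_product pivot table with a price column (names are placeholders):

-- Copy the current product price into order lines that predate
-- the new price column; only rows still missing a price are touched.
UPDATE order_product op
JOIN products p ON p.id = op.product_id
SET op.price = p.price
WHERE op.price IS NULL;

Note that this backfill uses today's price for old orders, which is the best available approximation since the historical prices were never recorded.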
I am designing a web store as a project (using PHP, Laravel, and MySQL, FYI), and I am at the part where I have to create the logic behind the production system, which goes like this:
On my database:
- I have an ORDER table with all the information regarding the shipping, customer, etc.
- I have another table called ITEM that lists all the items in an order (so if an order has 3 items, there will be 3 rows in the ITEM table, each with a foreign key pointing to the ORDER).
Now I'm creating the PRODUCTION DASHBOARD. Right now I'm able to scan the item ID and get the shipment information on the Dashboard.
After that, for orders with multiple items, I want the system to tell the user to deposit the item in a numbered box to wait for the rest of the order's items. That way the user can keep scanning items from other orders, and once another item from the order stored in box X is produced and scanned, the system tells him that the other items from the same order are in box X. He repeats this until the order is complete.
My question is: what would be the best logic, database-wise (and also Laravel-wise if you want to further expand your answer hehe), to implement this box system?
I hope my question is clear enough and thank you very much :)
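One possible sketch of the box system, with placeholder table and column names:

-- Physical boxes on the production floor.
CREATE TABLE boxes (
    id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    number INT NOT NULL,          -- the label printed on the box
    order_id INT UNSIGNED NULL,   -- order currently collecting in this box
    UNIQUE KEY uq_box_number (number)
);

-- When an item is scanned, find the box (if any) already holding its order;
-- if no row comes back, assign the order to a free box (order_id IS NULL).
SELECT b.number
FROM boxes b
JOIN item i ON i.order_id = b.order_id
WHERE i.id = ?;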
I had a similar system for a project that I was working on. What I did was create a database table called temp_orders with a column called items, in which each item was separated by a line break. Until the order was finalized (100% processed), the order would remain in temp_orders.
Once finalized, it would get deleted from temp_orders and moved over to the orders table. If I needed to check the items, I would explode() the data from the items column of the temp_orders table on the line break, putting the items into an array, and then use the data however I needed.
You need to determine when you want to finalize the order. It could be upon credit card payment, or upon user order confirmation, for example.
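A sketch of that finalization step, assuming the two tables share compatible columns (the column list is a placeholder):

-- Move the finalized order out of temp_orders into orders.
INSERT INTO orders (customer_id, items, total)
SELECT customer_id, items, total
FROM temp_orders
WHERE id = ?;

DELETE FROM temp_orders
WHERE id = ?;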
I'm in the process of creating an application where we are fed several external product feeds daily, and we populate our products database with the feeds.
However the trick is we need to keep the product db in sync with the latest feed(s).
Previously I had toyed with the idea of loading the current product list from the DB into an array and doing an array comparison with the latest feed; that got gunned down once the product count was in the thousands (we ran out of memory trying to fetch 5000 records).
So after a bit of research, it seems the solution probably lies on the SQL side, perhaps using TRIGGERS, though I'm not quite sure how to go about it; hence my question.
So the 2 objectives I need to accomplish with the syncing process:
1) Insert new products that do not already exist in our DB. We can accomplish this with the INSERT IGNORE method (see the sketch after this list).
2) Find products on our DB that do NOT exist on the latest feed, and do something to them. (flag as deleted, or move to a deleted products table, etc.)
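A minimal INSERT IGNORE sketch for objective 1, assuming sku_number is a unique key and the feed has been loaded into products_temp (the other column names are placeholders):

-- Rows whose sku_number already exists in products are skipped silently.
INSERT IGNORE INTO products (sku_number, name, price)
SELECT sku_number, name, price
FROM products_temp;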
Step 2 is where I'm having trouble. I'm now thinking that for every sync operation we insert the products from the latest feed into a 'Temp-Products' table and somehow compare the 'Products' table with 'Temp-Products' to find the records that need to be flagged as deleted.
Any advice please?
Thanks
Obviously I over-thought this one. The solution, as suspected and further reinforced by Anigel, is to create a temporary table, products_temp, to store the new feed. We then run a simple join to find out which products are in the products table but not in products_temp, suggesting that those products have sold out or been deleted by the retailer.
We can then flag the results of the query as deleted/sold out/whatever else we need.
The query I used is this:
SELECT products.sku_number, products_temp.sku_number
FROM products
LEFT OUTER JOIN products_temp
    ON products.sku_number = products_temp.sku_number
WHERE products_temp.sku_number IS NULL
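The flagging itself can reuse the same anti-join; a sketch, assuming a deleted flag column on products:

-- Flag every product that is missing from the latest feed.
UPDATE products
LEFT OUTER JOIN products_temp
    ON products.sku_number = products_temp.sku_number
SET products.deleted = 1
WHERE products_temp.sku_number IS NULL;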
I had set up the catalog to sort on these values, which I saved through the admin last night. For most of a full day they were there. Now, like ghosts, they have disappeared.
I suspect the problem could be tied to values saved through the admin rather than imported.
Can anyone point me to the source of this problem and the solution in the code?
Between the last time I saw the values working on the frontend (a few hours ago) and now, I did several things:
1.) Added 2 new attributes and some test values (unrelated to problem attribute), reindexed.
2.) Tried to import unrelated values for 60,000 products. The import hung, so
3.) Imported values for 20,000 products at a time, no errors.
4.) reindexed.
Now all values saved manually are gone. Again, how can this happen? If they were saved to the DB, shouldn't it take a delete call from the code to the DB to remove them? How/why would such a call be made when I did not execute any such command in code or through the admin? How can I fix this and avoid it in the future?
TO AXEL (clarifying):
Thanks for your reply, @Axel.
1.) I created a text attribute called "sort_order" and entered some integer values through the admin.
2.) Then I did a full db backup with mysqldump.
3.) Then, I created two new attributes, "random_order" (price type) and "random_order_1" (text type). The purpose was to experiment with two solutions for shuffling the products in the catalog pages.
4.) Through phpMyAdmin I did a simple query to give me all products in random order:
SELECT `sku`
FROM `catalog_product_entity`
WHERE 1
ORDER BY RAND()
and exported the result to CSV. I simply used Excel to number the items from 1-60,000, creating an import CSV file with the columns sku, random_order (price type), and random_order_1 (text type), with both attributes having the same integer values.
5.) I used the standard import method (replace existing complex data) in the admin, 20k products at a time. After the import, the values for the previously set and seemingly totally unrelated "sort_order" attribute had been deleted.
Before reindexing, every item's sort_order is reset to the default (=1), but each item still appears in the proper order on the frontend (so the value still exists in the product flat table), while the random_order and random_order_1 attributes have their imported values.
After reindexing, all trace of sort_order is wiped out. That would make sense if I were actually importing that attribute, but I'm not. No other attribute appears to have been affected.
I restored db from mysqldump, tried whole process again. Same result.
I want to take a snapshot of a row in a MySQL table.
The reason being: if someone buys a product, I want to take a snapshot of that product to store for the order.
It needs to be a snapshot to maintain data integrity. If I just assign the product to the order and the product changes in the future, the order will show those changes. For example, if the price changes, the order will load the new data and say it sold the product at the new price rather than the price at the time the order was placed. So a snapshot needs to be assigned to the order instead.
The way I did this in the past was with 2 tables: one for products and one for snapshots of products. The snapshot table had every column of the regular table plus extra columns like order_id.
I had a script to take a snapshot that automatically looked at the fields in the regular table and tried to insert into the same fields in the snapshot table.
The biggest problem with that approach was that if I added a column to the regular table and forgot to add the same column to the snapshot table, the script would try to insert data into a nonexistent field and fail.
I also disliked the idea of having 2 tables that were nearly identical. I think figuring out a way to use one table for both purposes might be better.
So I am wondering if there is a known method I am unaware of to solve this issue?
My previous project used no framework but my next one will be using CakePHP if that matters.
I think the best way to handle this would be to roll the "snapshot" information into an orders_products table. So if you have an order, store the total price, tax, etc. in a single row in the orders table and reference that order_id in your orders_products table. In orders_products you can have order_id, product_id, price, quantity, discount, and whatever else you need.
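As a sketch, capturing the price at purchase time with that layout could look like this (the column names are assumptions):

-- Copy the product's current price into the order line at purchase time,
-- so later price changes never affect this order.
INSERT INTO orders_products (order_id, product_id, price, quantity)
SELECT ?, p.id, p.price, ?
FROM products p
WHERE p.id = ?;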
It seems like your previous approach is fine, but you just need more testing to ensure you don't forget to add new fields to the snapshot table; that seems like a basic test that would be easy to write. The other alternative is to use a big text field and store the snapshot as XML. This lets you store the snapshot regardless of schema changes. Depending on how much you want to query this data, it may or may not work for you.
Also, you may not want to store every field, as some may just take up extra space. For instance, if you store the location of the item's image file, you may not need it at a later date. You could query information_schema to find out which fields exist in the snapshot table, and only copy the available fields.
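That information_schema check could look like the following sketch (the snapshot table name product_snapshots is hypothetical):

-- List the columns present in both the live table and the snapshot table,
-- so the copy script only touches fields that exist on both sides.
SELECT column_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name IN ('products', 'product_snapshots')
GROUP BY column_name
HAVING COUNT(*) = 2;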