I'm trying to wrap my head around a problem and would appreciate some advice.
I've built a PHP script that pulls a large amount of JSON data from a URL. It then runs through that data and inserts it into a MySQL database.
I want to use a Cron job that pulls the JSON data every 2 or 3 hours, and if anything has changed compared to the data in the MySQL table, it updates it. If there are new records, it adds those.
The old system a friend was using would basically pull all of the data every 2/3 hours and overwrite the old data. This is fine for small amounts of data, but it seems super impractical to be writing 10,000-20,000 rows to a table every 2/3 hours.
Each JSON object has a unique identifier - so I was thinking of doing something like:
Pull the MySQL table data into an array;
Pull the JSON data into an array;
Use the unique identifier for each entry in the JSON data to look it up in the MySQL data. If the entries differ, update the MySQL row; if the entry doesn't exist, insert a new row.
I'm looking for some tips on the best way / most efficient and fastest way to do this. I've been told I'm super bad at explaining things so let me know if I need to add any more detail.
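One common way to handle this kind of compare-and-update without pulling the whole table into PHP is to put a UNIQUE key on the identifier and let MySQL do the comparison with INSERT ... ON DUPLICATE KEY UPDATE. Here is a minimal sketch, assuming a hypothetical `items` table with a UNIQUE key on `external_id` and `name`/`price` columns (adjust all names to your schema):

```php
<?php
// Sketch: sync JSON records into MySQL with an upsert.
// Assumes a table `items` with a UNIQUE key on `external_id` and
// columns `name` and `price` -- hypothetical names, adjust to your schema.

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$json    = file_get_contents('https://example.com/feed.json');
$records = json_decode($json, true);

// Insert new rows; update existing rows in place when the unique key already exists.
$stmt = $pdo->prepare(
    'INSERT INTO items (external_id, name, price)
     VALUES (:external_id, :name, :price)
     ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)'
);

$pdo->beginTransaction();
foreach ($records as $record) {
    $stmt->execute([
        ':external_id' => $record['id'],
        ':name'        => $record['name'],
        ':price'       => $record['price'],
    ]);
}
$pdo->commit();
```

Wrapping the loop in a transaction keeps the 10,000-20,000 statements fast, since MySQL only has to flush once at the commit.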
Related
I'm looking to develop an application which will at the end of the week synchronise the data between the web application's database and the data retrieved from JSON.
I understand the process of pulling data from JSON and I understand how to put that data into the SQL database; what I'm curious about is: what is the most efficient or effective way of performing this process?
For example, table one stores 3 customer records in the database, but at the end of the week the JSON request shows that 1 customer has been deleted, so one customer will need to be deleted from the database as well.
Is it suitable to clear all the database entries and then re-insert them into the database with the updated fields? Or is there another way to achieve this effectively using PHP/MySQL? (i.e. placing two tables side by side somehow and comparing the two?)
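One alternative to wiping and re-inserting everything is to upsert the records that are in the JSON and then delete the rows whose identifiers no longer appear in it. A minimal sketch, assuming a hypothetical `customers` table with a UNIQUE key on `external_id` and a `name` column:

```php
<?php
// Sketch: weekly sync that also handles deletions.
// Assumes a `customers` table with a UNIQUE key on `external_id`
// and a `name` column -- hypothetical names, adjust to your schema.

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$customers = json_decode(file_get_contents('https://example.com/customers.json'), true);

$upsert = $pdo->prepare(
    'INSERT INTO customers (external_id, name) VALUES (:id, :name)
     ON DUPLICATE KEY UPDATE name = VALUES(name)'
);

$pdo->beginTransaction();

$seenIds = [];
foreach ($customers as $customer) {
    $upsert->execute([':id' => $customer['id'], ':name' => $customer['name']]);
    $seenIds[] = $customer['id'];
}

// Remove rows that no longer appear in the JSON feed.
if ($seenIds !== []) {
    $placeholders = implode(',', array_fill(0, count($seenIds), '?'));
    $delete = $pdo->prepare("DELETE FROM customers WHERE external_id NOT IN ($placeholders)");
    $delete->execute($seenIds);
}

$pdo->commit();
```

The guard around the DELETE also protects you from emptying the table if the feed ever comes back empty because of an API failure.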
On the highest level, I am wondering if such a process is possible - or does it even make much of a difference.
Say I query 1,500 rows and process the data from them. The user is browsing through the data, and after about 10 minutes there are ~10 new rows.
Is it possible to, let's say, 'cache' the initially queried and processed data instead of re-querying and re-processing all the old data, in order to show the new data?
At the moment, I query and process the data on each page view. I don't like it, it doesn't feel right.
Not really sure how to go about it. Thanks for any help. Have searched a bit now for caching data, but can't find a similar question/answer for this.
Thanks!
Some additional information
Each record queried has a timestamp associated with it, in milliseconds. My idea is that I can store the last record's timestamp and query the database to see if there are any new records since that timestamp.
If there are, process just the new records and add them to the already processed data. Surely this is possible, and surely it's more efficient than processing all of the data again and again.
Thanks!
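The timestamp idea works, and the processed results can be kept in any cache (APCu, Redis, or just a file). Here is a minimal file-based sketch, assuming a hypothetical `events` table with a millisecond `created_at_ms` column; everything else is illustrative:

```php
<?php
// Sketch: cache processed rows and only fetch records newer than the
// last seen timestamp. Assumes a table `events` with a millisecond
// `created_at_ms` column -- hypothetical names.

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$cacheFile = __DIR__ . '/events_cache.ser';

// Load previously processed rows (empty cache on first run).
$cache = is_file($cacheFile)
    ? unserialize(file_get_contents($cacheFile))
    : ['last_ts' => 0, 'rows' => []];

// Fetch only rows newer than the last cached timestamp.
$stmt = $pdo->prepare(
    'SELECT id, payload, created_at_ms FROM events
     WHERE created_at_ms > :last ORDER BY created_at_ms'
);
$stmt->execute([':last' => $cache['last_ts']]);
$newRows = $stmt->fetchAll(PDO::FETCH_ASSOC);

foreach ($newRows as $row) {
    // ... whatever per-row processing currently runs on every page view ...
    $cache['rows'][]  = $row;
    $cache['last_ts'] = max($cache['last_ts'], (int) $row['created_at_ms']);
}

// Persist the cache so the next request only processes newer rows.
file_put_contents($cacheFile, serialize($cache), LOCK_EX);

// $cache['rows'] now holds the full processed data set to render.
```

An index on `created_at_ms` keeps the `WHERE created_at_ms > ?` lookup cheap even as the table grows.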
I'm trying to minimize the load time with some scripts that I've created. I'm connecting to multiple JSON or XML APIs either through a developer's API resource or through RSS feeds and I'm trying to find a way to only insert new rows into the mysql databases.
I have used INSERT IGNORE, and it works great, but the problem is that the script still processes every row and only skips the insert when the UNIQUE key is a duplicate. What I'd like to do is create a faster-running script that only attempts to insert a row if it is actually new.
I have thought about using an in_array() check against the unique keys in the DB, but eventually the array will get too big to handle (e.g. 1,000,000 records) just to see whether a key is already there before inserting the unique ones.
Is there a better way to do this? I've done it easily in MySQL by using a WHERE clause to skip records already in the database, but I'm finding it's more difficult when the data comes from an XML API, especially considering how enormously large the databases will grow.
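Instead of loading every existing key into a PHP array, you can look up only the keys that appear in the current batch with a single SELECT ... IN (...) and then insert just the missing ones. A minimal sketch, assuming a hypothetical `feed_items` table with a UNIQUE key on `guid`:

```php
<?php
// Sketch: check only the current batch of keys against the database,
// then insert the rows that are genuinely new.
// Assumes a table `feed_items` with a UNIQUE key on `guid` -- hypothetical names.

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// $batch would come from the JSON/XML API,
// e.g. [['guid' => 'a1', 'title' => '...'], ...]
$batch = json_decode(file_get_contents('https://example.com/feed.json'), true);

$guids = array_column($batch, 'guid');
if ($guids === []) {
    exit; // nothing to do
}

// One query to find which of *these* GUIDs already exist.
$placeholders = implode(',', array_fill(0, count($guids), '?'));
$stmt = $pdo->prepare("SELECT guid FROM feed_items WHERE guid IN ($placeholders)");
$stmt->execute($guids);
$existing = array_flip($stmt->fetchAll(PDO::FETCH_COLUMN));

// Insert only the rows whose GUID we have not seen before.
$insert = $pdo->prepare('INSERT INTO feed_items (guid, title) VALUES (:guid, :title)');
foreach ($batch as $item) {
    if (!isset($existing[$item['guid']])) {
        $insert->execute([':guid' => $item['guid'], ':title' => $item['title']]);
    }
}
```

The memory cost is bounded by the size of one API batch rather than the size of the table, so it stays manageable even at a million rows.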
I am storing some history information on my website for future retrieval of the users. So when they visit certain pages it will record the page that they visited, the time, and then store it under their user id for future additions/retrievals.
So my original plan was to store all of the data in an array, and then serialize/unserialize it on each retrieval and then store it back in a TEXT field in the database. The problem is: I don't know how efficient or inefficient this will get with large arrays of data if the user builds up a history of (e.g.) 10k pages.
EDIT: So I want to know: what is the most efficient way to do this? I was also considering just inserting a new row in the database for each history entry, but then this would make for a very large table to select from.
The question is: which is faster/more efficient, a massive number of rows in the database or a massive serialized array? Any other, better solutions are obviously welcome. I will eventually be switching to Python, but for now this has to be done in PHP.
There is no benefit to storing the data as serialized arrays. Retrieving a big blob of data, de-serializing, modifying it and re-serializing to update is slow - and worse, will get slower the larger the piece of data (exactly what you're worried about).
Databases are specifically designed to handle large numbers of rows, so use them. You have no extra cost per insert as the data grows, unlike your proposed method, and you're still storing the same amount of data, so let the database do what it does best, and keep your code simple.
Storing the data as an array also makes any sort of querying and aggregation near impossible. If the purpose of the system is to (for example) see how many visits a particular page got, you would have to de-serialize every record, find all the matching pages, etc. If you have the data as a series of rows with user and page, it's a trivial SQL count query.
If, one day, you find that you have so many rows (10,000 is not a lot of rows) that you're starting to see performance issues, find ways to optimize it, perhaps through aggregation and de-normalization.
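To make the comparison concrete, here is a minimal sketch of the row-per-visit approach described above, assuming a hypothetical `history` table with `user_id`, `page`, and `visited_at` columns:

```php
<?php
// Sketch of the row-per-visit design: one cheap insert per page view,
// and aggregation becomes a plain SQL query.
// Assumes a `history` table (user_id, page, visited_at) -- hypothetical names.

$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Recording a visit costs the same whether the user has 10 or 10,000 history rows.
$insert = $pdo->prepare(
    'INSERT INTO history (user_id, page, visited_at) VALUES (:user_id, :page, NOW())'
);
$insert->execute([':user_id' => 42, ':page' => '/products/7']);

// "How many visits did this page get?" is a trivial aggregate query here,
// whereas the serialized-array design would need to de-serialize every record.
$count = $pdo->prepare('SELECT COUNT(*) FROM history WHERE page = :page');
$count->execute([':page' => '/products/7']);
echo $count->fetchColumn();
```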
You can collect all the data for one session in a session variable and dump it into the database together at the end of the session.
You can add indexes at the database level to save time.
The last and most effective thing you can do is to perform your operations/manipulation on the data up front and store the results in a separate table, and always select from that pre-processed table. You can keep it up to date using a cron job or some other scheduler.
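As a rough illustration of the session-buffering idea, here is a sketch that collects page views in $_SESSION and flushes them in one multi-row insert; the table and column names (`history`, `user_id`, `page`, `visited_at`) are hypothetical:

```php
<?php
// Sketch: buffer page views in the session and flush them to the
// database in a single multi-row INSERT instead of one write per view.
// Table/column names are hypothetical.

session_start();

// On each page view, just append to the session buffer -- no DB write yet.
$_SESSION['history_buffer'][] = [
    'user_id'    => $_SESSION['user_id'] ?? 0,
    'page'       => $_SERVER['REQUEST_URI'] ?? '/',
    'visited_at' => date('Y-m-d H:i:s'),
];

// Flush once the buffer is large enough (or on logout).
if (count($_SESSION['history_buffer']) >= 20) {
    $pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    $rows = $_SESSION['history_buffer'];
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?, ?)'));
    $stmt = $pdo->prepare(
        "INSERT INTO history (user_id, page, visited_at) VALUES $placeholders"
    );

    $params = [];
    foreach ($rows as $row) {
        array_push($params, $row['user_id'], $row['page'], $row['visited_at']);
    }
    $stmt->execute($params);

    $_SESSION['history_buffer'] = [];
}
```

The trade-off is that a buffer is lost if the session expires before it is flushed, so this only suits history data you can afford to drop occasionally.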
I need to have a button that fires an action to copy all records for a given client from one database to another with PHP.
The template database has 12 tables (each with different columns), but all of them have a client_id column, so the WHERE clause can select the right records.
The question is, how do I do this?
Thanks,
Pluda
Since PHP is a server-side programming language, you can't copy something from the client. You can, however, upload data (like XML), parse it, and then insert it into your MySQL database.
If you want to copy records from one database to another, you might want to read them from the first database and save them in a format like SQL. Then you could send those queries to the second database.
A piece of advice at this point: if you need to run the same query (with different values) over and over again, you should use a prepared statement. It is compiled once by the database and then just filled in with new values, which is much faster than building a new INSERT every time.
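A minimal sketch of that approach with two PDO connections is below. The database names, the table list, and the column lists are all hypothetical; in practice you would enumerate your 12 tables and their columns:

```php
<?php
// Sketch: copy all rows for one client from a source database to a target
// database using one prepared INSERT per table. All names are placeholders.

$source = new PDO('mysql:host=localhost;dbname=source_db;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);
$target = new PDO('mysql:host=localhost;dbname=target_db;charset=utf8mb4', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

$clientId = (int) ($_POST['client_id'] ?? 0);

// Map each table to the columns you want to copy (hard-coded, so the
// interpolated identifiers below never come from user input).
$tables = [
    'invoices' => ['client_id', 'number', 'amount'],
    'contacts' => ['client_id', 'name', 'email'],
    // ... the remaining tables ...
];

$target->beginTransaction();

foreach ($tables as $table => $columns) {
    $columnList   = implode(', ', $columns);
    $placeholders = implode(', ', array_fill(0, count($columns), '?'));

    // One prepared INSERT per table, reused for every row of that table.
    $insert = $target->prepare("INSERT INTO $table ($columnList) VALUES ($placeholders)");

    $select = $source->prepare("SELECT $columnList FROM $table WHERE client_id = ?");
    $select->execute([$clientId]);

    while ($row = $select->fetch(PDO::FETCH_NUM)) {
        $insert->execute($row);
    }
}

$target->commit();
```

Wrapping the whole copy in a transaction on the target side means a failure part-way through leaves the target database untouched instead of half-copied.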