Long-running script - PHP

I am developing a plugin for WordPress that imports a boatload of data via multiple API calls and saves it as products in WooCommerce.
The problem is that when a store has hundreds (if not thousands) of products, the import job takes so long that it hits a variety of timeouts. Extending the limit with a call like "set_time_limit(xx)" works, but some servers still appear to have their own fail-safes that I don't think I can bypass with a line of code:
mod_zfpm(63616:7f14fca1b730:0)-F030E35B: stderr: Timeout (900s) exceeded while reading from socket (application) (fastcgi.c:941)'
I am trying to figure out which method is the most correct.
So far the options I have thought of are:
1) Use register_shutdown_function() on error to relaunch the import (probably a very bad idea).
2) Divide the job into a chain of small cron jobs (a safer alternative, but probably time-consuming and convoluted - sketched below).
Should I go with option 2, or are there better ways of handling a very long-running task?
NOTE: Since it's a plugin for WP, I cannot employ the solutions suggested in many of the duplicate threads, as the plugin will be used on many different servers.
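For reference, a minimal sketch of option 2 using WP-Cron; the hook name, option key and import_products_batch() helper are invented for illustration, only the scheduling and option functions are real WordPress APIs:
<?php
// Sketch only: 'myplugin_import_batch', 'myplugin_import_offset' and
// import_products_batch() are made-up names.
add_action( 'myplugin_import_batch', 'myplugin_run_import_batch' );

function myplugin_run_import_batch() {
    $offset = (int) get_option( 'myplugin_import_offset', 0 );
    $batch  = 50; // small enough that one run finishes well under any server timeout

    $imported = import_products_batch( $offset, $batch ); // hypothetical: API call + product creation

    if ( $imported === $batch ) {
        // More work left: remember progress and schedule the next chunk.
        update_option( 'myplugin_import_offset', $offset + $batch );
        wp_schedule_single_event( time() + 10, 'myplugin_import_batch' );
    } else {
        delete_option( 'myplugin_import_offset' ); // import finished
    }
}
Each run stays short, so no single request comes anywhere near the PHP or FastCGI limits.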

I am still new to PHP too, but I once came across something similar, though the situation was not exactly the same. The solution used was:
1) Put the data in JSON format.
2) It was an SQL database, and the requests were made with Ajax.
3) The requests were made in sets, depending on the amount of data that needed to be displayed.
4) An extra parameter in the request made it return data with an offset; if data came back, the offset was increased, another request was made, and the second set of returned data was appended to the first.
5) While the array of data was updated asynchronously, a JavaScript event listener was attached to reload the display whenever the data changed.
While this answer won't solve your problem, I hope it gives you an idea.
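To make the offset idea concrete, here is a rough PHP-side sketch of such an endpoint; the DSN, table and column names are invented:
<?php
// Chunked endpoint: the client calls it repeatedly with an increasing
// ?offset= value until an empty array comes back.
$pdo = new PDO( 'mysql:host=localhost;dbname=example', 'user', 'pass' );
$pdo->setAttribute( PDO::ATTR_EMULATE_PREPARES, false ); // so LIMIT/OFFSET bind as integers

$offset = isset( $_GET['offset'] ) ? (int) $_GET['offset'] : 0;
$limit  = 100;

$stmt = $pdo->prepare( 'SELECT id, title FROM items ORDER BY id LIMIT :limit OFFSET :offset' );
$stmt->bindValue( ':limit', $limit, PDO::PARAM_INT );
$stmt->bindValue( ':offset', $offset, PDO::PARAM_INT );
$stmt->execute();

header( 'Content-Type: application/json' );
echo json_encode( $stmt->fetchAll( PDO::FETCH_ASSOC ) );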

Related

How to optimize writing lots of data to database in WP plugin

I'm developing a WordPress plugin in PHP 7.4 that pulls data from an API and writes it into the WP database. This data is later used by the Custom Post Types and Advanced Custom Fields plugins (mentioning this just in case it's relevant), which present it to the user via different pages on a website.
The problem I'm facing is that with big amounts of data, the script that loads that data (called via Ajax) will just crash at some point and return a generic error. After lots of researching, testing and talking to the hosting provider, the only conclusion is that the script is running out of memory, even though we're giving it as much as we possibly can. When the script loads smaller amounts of data, it works perfectly fine, supporting the lack-of-memory theory.
Since I'd rather optimize the code than have someone pay more for hosting, I've been trying different strategies, but I can't seem to find one that makes a significant impact, so I was hoping to get some ideas.
Something to know about the API and the current loading process: the data that's pulled every time is a refresh of pre-existing data (think a bunch of records), so most of it already exists in the database. The problem is that, because of how the API is implemented, there is NO WAY to know which data has changed and which hasn't (that's been thoroughly discussed with the API provider).
Things that I'm already doing to optimize the script:
Compare records coming from the API to records already existing in the database, so that I only save/update the new ones (I do this via an in_array() comparison of the record's ID vs the IDs of the existing records - sketched below)
Fetch from the API only the strictly necessary fields from every record
Skip native WP functions that store data to the database whenever possible, using custom functions that write directly to the database, to avoid WP performance overhead
This doesn't seem to be enough for big amounts of data. Fetching the data itself doesn't seem to be the problem regardless of the size; it's the processing of the data and entering it into the database. So I guess I'm looking for strategies that would help optimize the processing of big chunks of data in this situation.
EDIT:
Clarifying this being an AJAX script: the PHP script (the API data importer) that gets called via AJAX is also called via CRON by a different process, and it fails as well when dealing with big amounts of data.
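For context, here is a stripped-down version of the in_array() comparison described above; the table and field names are invented, not the plugin's real ones:
<?php
// Illustrative only: wp_myplugin_records / api_id are made-up names.
global $wpdb;
$existing_ids = $wpdb->get_col( "SELECT api_id FROM {$wpdb->prefix}myplugin_records" );

foreach ( $api_records as $record ) {
    if ( in_array( (string) $record['id'], $existing_ids ) ) {
        continue; // already in the database, nothing to write
    }
    $wpdb->insert( $wpdb->prefix . 'myplugin_records', array(
        'api_id' => $record['id'],
        'title'  => $record['title'],
    ) );
}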

Strange timeout like error on production server (Network Error(tcp_error))

I have an application running for my company that uses very slow scripts. One of the pages runs a SQL query for about 5 minutes and then PHP code for about 20 minutes.
After this delay the server sends me an error that says:
Network Error (tcp_error)
- 503 error - A communication error occurred: ""
I have already tried to solve the problem by increasing the PHP max execution time without success.
I can give you access to the code if you want, but it isn't really easy to understand.
Do you know how I could fix this error?
I guess the problem is that the script is computation-heavy or processes lots of database data, so try some of these approaches:
Try to paginate the data fetching, i.e. fetch the data in smaller chunks (see the sketch after this list).
If that's not possible, try to hide or move the computation somewhere else:
Load the HTML first and then load the data via Ajax.
Try to process the data in a stored procedure (guessing MySQL, see https://dev.mysql.com/doc/refman/5.7/en/create-procedure.html).
Make a service with asynchronous communication, for example a script that communicates with RabbitMQ. Process the data in the service and send it back to the main application. In the case of PHP you may need another application that supports WebSockets (e.g. Node.js) for this part.
Try to cache data that is not recomputed often. E.g. after processing some fetched data, cache the result in MongoDB; if it hasn't been affected, fetch it from Mongo first instead of recomputing it.
Consider precomputing as much data as possible and storing it either in a relational database (MySQL, PostgreSQL) or a non-relational one (MongoDB, Couchbase).
Also, in some parts of the application, try technologies beyond relational databases to make them more efficient (e.g. Elasticsearch for searching or Neo4j for mapping relations).
Try to split the computation into smaller chunks and execute them via DB triggers on write and update. An alternative is to do the computation in smaller parts at write time, either through services or in the write logic itself.
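As a rough illustration of the first point (paginating/chunking the fetch), here is a keyset-style sketch; the DSN, table name and process_row() helper are invented:
<?php
// Work through the table in slices so no single query or loop has to hold
// everything in memory at once.
$pdo = new PDO( 'mysql:host=localhost;dbname=example', 'user', 'pass' );
$pdo->setAttribute( PDO::ATTR_EMULATE_PREPARES, false );

$limit  = 500;
$lastId = 0;

do {
    $stmt = $pdo->prepare( 'SELECT id, payload FROM big_table WHERE id > :last ORDER BY id LIMIT :limit' );
    $stmt->bindValue( ':last', $lastId, PDO::PARAM_INT );
    $stmt->bindValue( ':limit', $limit, PDO::PARAM_INT );
    $stmt->execute();
    $rows = $stmt->fetchAll( PDO::FETCH_ASSOC );

    foreach ( $rows as $row ) {
        process_row( $row );        // hypothetical per-row work
        $lastId = (int) $row['id'];
    }
} while ( count( $rows ) === $limit );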
Also, BEFORE doing any of the above, analyze where the code slows down, how long it takes to process the data, and how much data is being fetched.
Furthermore, if you go for background process execution, you need some extra bookkeeping and a way to keep track of the process. You can also run the following:
<?php
// Fire off the heavy script in the background; redirecting its output is
// what lets system() return immediately instead of waiting for the child.
system("php someheavyprocessingscript.php > /dev/null 2>&1 &");
// Load HTML / do some stuff here
?>
Via system() and the trailing & on the command you run the process in the background (with its output redirected so the call returns immediately). Usually I would prefer a service approach, pushing the result to the frontend over WebSockets via RabbitMQ.
Also consider googling/duckduckgoing the following keywords: xdebug profiler, service-oriented architecture. They will give you some pointers on how to solve this, plus some extra knowledge.

MySQL & webpage: Notification on insert in table

I have an application on a server that monitors a log file, and I've also added a view on the client side (in the form of a website). Now I would like to implement the following: whenever a new entry has been added, the view should update as fast as possible.
First I have thought of two practical solutions:
1) Call an AJAX function that requests a PHP page every second, which checks for updates and, if there are any, shows them. (Disadvantages: lots of HTTP overhead, much of the time there may be no message, lots of SQL calls)
2) Call an AJAX function that requests a different PHP page every minute, which checks for updates for up to 1 minute but only returns early if it has found one (sketched below). (Disadvantages: HTTP overhead, though less than option 1; there may still be periods without messages; still a lot of SQL calls)
Which of these would be better, or what alternative would you advise?
I have also thought of yet another solution, but I'm unsure of how to implement it. That would be that on every INSERT on a specific table in the MySQL database, the webpage would directly be notified, perhaps via a push connection, but I'm also unsure of how those work.
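For what it's worth, a bare-bones sketch of option 2 (long polling) on the PHP side; the DSN, table and column names are invented:
<?php
// Hold the request open for up to ~50 seconds and return as soon as a row
// newer than the client's last-seen id appears.
header( 'Content-Type: application/json' );

$pdo      = new PDO( 'mysql:host=localhost;dbname=logs', 'user', 'pass' );
$lastId   = isset( $_GET['last_id'] ) ? (int) $_GET['last_id'] : 0;
$deadline = time() + 50;

do {
    $stmt = $pdo->prepare( 'SELECT id, message FROM log_entries WHERE id > ? ORDER BY id' );
    $stmt->execute( array( $lastId ) );
    $rows = $stmt->fetchAll( PDO::FETCH_ASSOC );

    if ( $rows ) {
        echo json_encode( $rows );
        exit;
    }
    sleep( 2 ); // pause before polling the table again
} while ( time() < $deadline );

echo json_encode( array() ); // nothing new within this window; the client re-issues the request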

Atomic/safe serving of single-use codes

I have a list of single-use discount codes for an ecommerce site I'm partnering with. I need to set up a page on my site where my users can fill out a form, and then will be given one of the codes. The codes are pre-determined and have been sent to me in a text file; I can't just generate them on the fly. I need to figure out the best way to get an unused code from the list, and then remove it from the list (or update a flag to mark it as used) at the same time, to avoid any possibility of giving two people the same code. In other words, something similar to a queue, where I can remove one item from the queue atomically.
This webapp will be running on AWS and the current code is Python (though I could potentially use something else if necessary; PHP would be easy). Ideally I'd use one of the AWS services or mysql to do this, but I'm open to other solutions if they're not a royal pain to get integrated. Since I thought "queue," SQS popped into my head, but this is clearly not what it's intended for (e.g. the 14 day limit on messages remaining in the queue will definitely not work for me). While I'm expecting very modest traffic (which means even really hacky solutions would probably work), I'd rather learn about the RIGHT way to do this even at scale.
I can't give actual code examples, but one of the easiest ways to do it would just be an incrementing counter in the file, so something like
0
code1
code2
code3
etc
and just skipping that many lines every time a code is used.
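A rough sketch of that counter-in-file idea, using flock() so two concurrent requests can't read the same counter value (the file name is made up):
<?php
// Claim the next unused code from codes.txt, whose first line is the counter
// and whose remaining lines are the codes themselves.
$fh = fopen( 'codes.txt', 'r+' );
flock( $fh, LOCK_EX );                       // block other requests until we're done

$lines   = explode( "\n", stream_get_contents( $fh ) );
$counter = (int) $lines[0];
$code    = isset( $lines[ $counter + 1 ] ) ? trim( $lines[ $counter + 1 ] ) : null;

if ( $code !== null ) {
    $lines[0] = (string) ( $counter + 1 );   // advance past the code we just handed out
    ftruncate( $fh, 0 );
    rewind( $fh );
    fwrite( $fh, implode( "\n", $lines ) );
}

flock( $fh, LOCK_UN );
fclose( $fh );
Note that this only holds up on a single server with a local filesystem, which is one reason the database route is usually the safer choice.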
You could also do this pretty simply in a database.
Amazon DynamoDB is a fast, NoSQL database from AWS, and it is potentially a good fit for this use case. Setting up a database table is easy, and you could load your codes into there. DynamoDB has a DeleteItem operation that also allows you to retrieve the data within the same, atomic operation (by setting the ReturnValues parameter to ALL_OLD). This would allow you to get and delete a code in one shot, so no other requests/processes can get the same code. AWS publishes official SDKs to help you connect to and use their services, including both a Python and PHP SDK (see http://aws.amazon.com/tools/).
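A minimal sketch of that pattern with the AWS SDK for PHP; the table name, key attribute and region are placeholders:
<?php
require 'vendor/autoload.php';

use Aws\DynamoDb\DynamoDbClient;
use Aws\DynamoDb\Exception\DynamoDbException;

$client = new DynamoDbClient( array( 'region' => 'us-east-1', 'version' => 'latest' ) );

// Grab one candidate code (any item will do).
$result = $client->scan( array( 'TableName' => 'discount_codes', 'Limit' => 1 ) );
if ( empty( $result['Items'] ) ) {
    exit( 'No codes left.' );
}
$key = $result['Items'][0]['code']; // e.g. array( 'S' => 'ABC123' )

try {
    // Delete the item and get its old attributes back in the same atomic call.
    // If another request deleted it first, the condition fails and we can retry.
    $deleted = $client->deleteItem( array(
        'TableName'           => 'discount_codes',
        'Key'                 => array( 'code' => $key ),
        'ConditionExpression' => 'attribute_exists(code)',
        'ReturnValues'        => 'ALL_OLD',
    ) );
    echo 'Your code: ' . $deleted['Attributes']['code']['S'];
} catch ( DynamoDbException $e ) {
    echo 'That code was claimed by another request - please try again.';
}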

Getting all data once for future use

Well, this is kind of a question about how to design a website that uses fewer resources than normal websites, and is mobile-optimized as well.
Here it goes: I was about to display an overview of e.g. 5 posts (from e.g. a blog). If I then clicked, for example, on the first post, I'd load that post in a new window. But instead of connecting to the database again and fetching that specific post by its id, I'd just look up the post (in PHP) in the array of 5 posts that I created earlier, when the page was fetched for the first time.
Would it save data to download? Because PHP works server-side as well, so that's why I'm not sure.
Ok, I'll explain again:
Method 1:
User connects to my website
5 posts get displayed & saved to an array (with all their data)
User clicks on the first Post and expects more Information about this post.
My program looks up the post in my array and displays it.
Method 2:
User connects to my website
5 posts get displayed
User clicks on the first Post and expects more Information about this post.
My program connects to MySQL again and fetches the post from the server.
First off, this sounds like a case of premature optimization. I would not start caching anything outside of the database until measurements prove that it's a wise thing to do. Caching takes your focus away from the core task at hand, and introduces complexity.
If you do want to keep DB results in memory, just using an array allocated in a PHP-processed HTTP request will not be sufficient. Once the page is processed, memory allocated at that scope is no longer available.
You could certainly put the results in SESSION scope. The advantage of saving some DB results in the SESSION is that you avoid DB round trips. Disadvantages include the increased complexity of programming the solution, use of memory in the web server for data that may never be accessed, and increased initial load on the DB to retrieve extra pages that may or may not ever be requested by the user.
If DB performance, after measurement, really is causing you to miss your performance objectives you can use a well-proven caching system such as memcached to keep frequently accessed data in the web server's (or dedicated cache server's) memory.
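To make the SESSION idea concrete, a tiny sketch; the session key and the fetch_recent_posts_from_db() helper are placeholders:
<?php
session_start();

// Reuse the posts fetched for the overview page if they are already in the session.
if ( ! isset( $_SESSION['recent_posts'] ) ) {
    $_SESSION['recent_posts'] = fetch_recent_posts_from_db( 5 ); // hypothetical DB helper
}

$postId = (int) $_GET['id'];
$post   = null;
foreach ( $_SESSION['recent_posts'] as $candidate ) {
    if ( (int) $candidate['id'] === $postId ) {
        $post = $candidate; // found it without another DB round trip
        break;
    }
}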
Final note: You say
PHP works server-side as well
That's not accurate. PHP works server-side only.
Have you thought of saving the posts in divs, and only making them visible when the user clicks somewhere? Here is how to do that.
Put some sort of cache between your code and the database.
So your code will look like:
if(isPostInCache()) {
loadPostFromCache();
} else {
loadPostFromDatabase();
}
Go for some caching system; the web is full of them. You can use memcached, or static caching you build yourself (e.g. saving posts as txt files on the server).
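As a concrete example of the cache-first pattern above, a short sketch with the Memcached extension; the key naming is arbitrary and $pdo is assumed to be an existing PDO connection:
<?php
// Cache-first lookup: try memcached, fall back to the database, then store
// the result for the next request.
$cache = new Memcached();
$cache->addServer( '127.0.0.1', 11211 );

function load_post( $id, Memcached $cache, PDO $pdo ) {
    $key  = 'post_' . (int) $id;
    $post = $cache->get( $key );

    if ( $post === false ) {                      // cache miss
        $stmt = $pdo->prepare( 'SELECT * FROM posts WHERE id = ?' );
        $stmt->execute( array( (int) $id ) );
        $post = $stmt->fetch( PDO::FETCH_ASSOC );
        $cache->set( $key, $post, 300 );          // keep for 5 minutes
    }
    return $post;
}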
To me, this is a little less efficient than making a 2nd call to the database, and here is why.
The first query should only pull the fields you want, like title, author, and date. The content of the post may be a heavy field to query, so I'd exclude it (you can pull a teaser if you'd like).
Then, if the user wants the details of the post, I would query for the content using an indexed key column.
That way you're not pulling content for 5 posts that may never be seen.
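A quick sketch of that two-query approach, assuming an existing PDO connection $pdo and invented table/column names:
<?php
// Overview page: light query, no heavy content column.
$list = $pdo->query( 'SELECT id, title, author, published_at FROM posts ORDER BY published_at DESC LIMIT 5' )
            ->fetchAll( PDO::FETCH_ASSOC );

// Detail page: pull the body only for the one post the user opened,
// looked up by its indexed primary key.
$stmt = $pdo->prepare( 'SELECT content FROM posts WHERE id = ?' );
$stmt->execute( array( (int) $_GET['id'] ) );
$content = $stmt->fetchColumn();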
If your PHP code is constantly re-connecting to the database you've configured it wrong and aren't using connection pooling properly. The execution time of a query should be a few milliseconds at most if you've got your stack properly tuned. Do not cache unless you absolutely have to.
What you're advocating here is side-stepping a serious problem. Database queries should be effortless provided your database is properly configured. Fix that issue and you won't need to go down the caching road.
Saving data from one request to the next is a broken design and, if not done perfectly, could lead to embarrassing data-bleed situations where one user sees content intended for another. This is why caching is an option usually pursued after all other avenues have been exhausted.
