I'm developing a WordPress plugin in PHP 7.4 that pulls data from an API and writes it into the WP database. This data is later used by the Custom Post Types and Advanced Custom Fields plugins (mentioning this just in case it's relevant), which present it to the user on different pages of the website.
The problem I'm facing is that with large amounts of data, the script that loads the data (called via Ajax) crashes at some point and returns a generic error. After a lot of research, testing, and back-and-forth with the hosting provider, the only conclusion is that the script is running out of memory, even though we're giving it as much as we possibly can. With smaller amounts of data the script works perfectly fine, which supports the lack-of-memory theory.
Since I'd rather optimize the code than have someone pay more for hosting, I've been trying different strategies, but I can't seem to find one that makes a significant impact, so I was hoping to get some ideas.
Something to know about the API and the process that currently runs when loading data: the data that's pulled every time is a refresh of pre-existing data (think a large set of records), so most of the data already exists in the database. The problem is that, because of how the API is implemented, there is NO WAY to know which data has changed and which hasn't (this has been discussed thoroughly with the API provider).
Things that I'm already doing to optimize the script:
Compare records coming from the API to records already existing in the database, so that I only save/update the new ones (I do this via an in_array() comparison of the record's ID against the IDs of the existing records; see the sketch after this list)
Fetch from the API only the strictly necessary fields from every record
Skip native WP functions that store data to the database whenever possible, using custom functions that write directly to the database, to avoid WP's performance overhead
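A minimal sketch of the comparison-then-direct-write approach described in the list above, assuming a hypothetical custom table wp_api_records with an api_id column and the API payload already decoded into $apiRecords; it swaps the repeated in_array() check for a keyed isset() lookup, which is much cheaper on large sets:

    global $wpdb;
    $table = $wpdb->prefix . 'api_records'; // hypothetical custom table

    // Build a lookup of existing IDs once, keyed for O(1) checks
    // (isset() on a flipped array avoids scanning the whole ID list per record).
    $existing = array_flip( $wpdb->get_col( "SELECT api_id FROM {$table}" ) );

    foreach ( $apiRecords as $record ) {
        if ( isset( $existing[ $record['id'] ] ) ) {
            continue; // already stored, skip
        }
        // Direct insert, bypassing the WP post/meta functions
        $wpdb->insert( $table, array(
            'api_id' => $record['id'],
            'title'  => $record['title'],
        ) );
    }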
This doesn't seem to be enough for large amounts of data. Fetching the data itself doesn't seem to be the problem regardless of size; it's the processing of the data and writing it to the database. So I'm looking for strategies that would help optimize the processing of big chunks of data in this situation.
EDIT:
Clarifying the AJAX part: the PHP script (the API data importer) that gets called via AJAX is also called via cron by a different process, and it fails there as well when dealing with large amounts of data.
Related
I have a WordPress site that uses a custom post type, call it CPT-1, that I created using JetEngine. Inside CPT-1 are meta fields. Once that was set up, I did a bulk insert of data using Ultimate CSV Importer Pro, which put this information into CPT-1 and let me map each column of data to the meta fields I wanted to use. These fields are then used later in tables.
Is there a way to go around the CSV Importer part of this process and just pull from a database? In the long term, I'd like to make changes to certain posts and upload different posts while using CPT-1, but I don't think using a CSV every time will be easy or accurate. If I could just pull from a database that I make updates to, I could track those changes easily and manage them.
I have database experience but not so much with Wordpress databases. What tables would I have to pay attention to if I were to go down this route?
WordPress uses MySQL as its backend, so there is no reason you can't just insert the data directly. You'll need to get the credentials WordPress uses to connect to the database, and then connect yourself, probably from your own custom PHP script.
I am generally skittish about doing things like you described, because WordPress is a complex piece of software and I don't have a lot of awareness of what it is doing behind the scenes (nor is it really intended that users have such awareness; most functionality is hidden from the user).
However, if you have been doing a CSV import, and you have tested it extensively, and it's working fine with that method, there is no reason you couldn't carry out this same thing with less manual work on your part via a PHP script.
I'm afraid I can't get much more specific in my answer because I don't have information about what exactly you did with the CSV.
A straightforward (but not super efficient) way of doing this would be a PHP script that opens one connection to the database you maintain and a second connection to the WordPress MySQL database, fetches whatever rows you want to update (whatever you would normally export via CSV), and iterates row by row, inserting the data into the WordPress database. You can make this significantly more efficient by preparing a single statement and then executing it repeatedly with each row of values.
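As a rough illustration of that row-by-row approach with a single prepared statement (the table and column names here are made up, and the WordPress credentials would come from wp-config.php):

    // Source database (the one you maintain) and the WordPress MySQL database.
    $source = new PDO('mysql:host=localhost;dbname=my_source_db', 'user', 'pass');
    $wp     = new PDO('mysql:host=localhost;dbname=wordpress_db', 'wp_user', 'wp_pass');

    // Prepare the insert once, execute it per row.
    $insert = $wp->prepare(
        'INSERT INTO wp_my_custom_table (external_id, title, body) VALUES (?, ?, ?)'
    );

    $rows = $source->query('SELECT external_id, title, body FROM items');
    foreach ($rows as $row) {
        $insert->execute([$row['external_id'], $row['title'], $row['body']]);
    }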
A more efficient way of doing it would be to pull the data into your PHP script and format it as a single query that you then run against the WordPress MySQL database.
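And a sketch of that single-query variant, batching the same rows into one multi-row INSERT (same hypothetical connections and table as the sketch above):

    $values = [];
    $params = [];
    foreach ($source->query('SELECT external_id, title, body FROM items') as $row) {
        $values[] = '(?, ?, ?)';
        array_push($params, $row['external_id'], $row['title'], $row['body']);
    }

    if ($values) {
        // For very large sets, chunk this to stay under max_allowed_packet.
        $sql = 'INSERT INTO wp_my_custom_table (external_id, title, body) VALUES '
             . implode(', ', $values);
        $wp->prepare($sql)->execute($params);
    }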
If you already have CSV importing working, you could even go for a "lazy" solution where a PHP script generates the CSV and feeds it into MySQL the same way your other program did. It's hard to tell from what you've said which of these solutions would work best; I have used all three, depending on what I'm doing and what kind of error handling I want.
In general, if errors happen rarely to never, you are probably better off with one of the bulk-insert methods, whether that's a single query or PHP automating the export of a CSV and then importing it into WordPress's MySQL database.
I am developing a plugin for WordPress that imports a boatload of data via multiple API calls and saves it as WooCommerce products.
Problem is, when a store has hundreds (if not thousands) of products, the import job starts taking so long that it runs into a variety of timeouts. Extending the limit with calls like set_time_limit(xx) works, but some servers still appear to have their own fail-safes that I don't think I can bypass with a line of code:
mod_zfpm(63616:7f14fca1b730:0)-F030E35B: stderr: Timeout (900s) exceeded while reading from socket (application) (fastcgi.c:941)'
I am trying to figure out which method is the most correct.
So far the options I have thought of are:
use "register_shutdown_function()" on error to relaunch the import (probably a very bad idea)
divide the job into a chain of small cron jobs (the safer alternative, but probably time-consuming and convoluted to set up; sketched below)
Should I go with option 2, or are there better ways of handling very long-running tasks?
NOTE: Since it's a plugin for WP, I cannot use the solutions suggested in many of the duplicate threads, as the plugin will run on many different servers.
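A rough sketch of option 2, chaining small batches through WP-Cron so that no single request runs long; the hook name, batch size, and the import_products_batch() helper are hypothetical:

    // Hook a batch handler to a custom cron event.
    add_action( 'myplugin_import_batch', 'myplugin_run_import_batch' );

    function myplugin_run_import_batch( $offset ) {
        $batch_size = 50; // keep each run well under the server's timeout
        $imported   = import_products_batch( $offset, $batch_size ); // hypothetical helper

        // If a full batch came back, schedule the next chunk right away.
        if ( $imported === $batch_size ) {
            wp_schedule_single_event( time() + 10, 'myplugin_import_batch', array( $offset + $batch_size ) );
        }
    }

    // Kick off the chain:
    wp_schedule_single_event( time(), 'myplugin_import_batch', array( 0 ) );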
I am still new to PHP too, but I once came across something like this. The situation was not exactly the same; the solution used was:
1) put the data in JSON format.
2) it was an SQL database and the requests were made with Ajax.
3) the requests were made in sets, depending on the amount of data that needed to be displayed.
4) an extra parameter in the request made it return data with an offset; if data came back, the offset was increased and another request was made, and the second set of returned data was appended to the first (a sketch of this appears below).
5) while the array of data updated asynchronously, a JavaScript event listener reloaded the display whenever the data changed.
While this answer won't solve your problem, I hope it gives you an idea.
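A bare-bones PHP sketch of that offset idea, assuming a hypothetical admin-ajax handler that returns one chunk per request as JSON; the client keeps requesting with the returned offset until nothing comes back:

    // admin-ajax.php handler: return one chunk of rows per request (action name is made up).
    add_action( 'wp_ajax_fetch_chunk', 'myplugin_fetch_chunk' );

    function myplugin_fetch_chunk() {
        global $wpdb;
        $offset = isset( $_GET['offset'] ) ? (int) $_GET['offset'] : 0;
        $limit  = 100;

        $rows = $wpdb->get_results( $wpdb->prepare(
            "SELECT id, title FROM {$wpdb->prefix}my_items ORDER BY id LIMIT %d OFFSET %d",
            $limit,
            $offset
        ) );

        // The client appends these rows and, if any came back, requests offset + limit next.
        wp_send_json( array( 'rows' => $rows, 'nextOffset' => $offset + $limit ) );
    }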
I've got a heavy-read website backed by a MySQL database. I also have some small "auxiliary" information (it fits in an array of 30-40 elements as of now), hierarchically organized, that gets updated slowly and periodically, 4-5 times per year. It's not a configuration file, since this information is about the subject of the website and not about its functioning, but it behaves like one. Until now, I just used a static PHP file containing an array of this info, but now I need a way to update it via a backend CMS from my admin panel.
I thought of a simple CMS that lets the admin create/edit/delete entries (a rare, periodic job) and then writes a static JSON file to be used by the page-building scripts instead of pulling this information from the DB.
The question is: given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
I just used a static PHP
This sounds like a contradiction to me. Either static, or PHP.
given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
Cache was invented for a reason :) It's the same with your case - it all depends on how often the data changes vs how often it is read. If the data changes once a day and remains static for 100k reads during the day, then not caching it, or not serving it from a flat file, would simply be stupid. If the data changes once a day and you have 20 reads per day on average, then perhaps returning the data from code on each request would be less stupid, but on the other hand, 19 of those requests could be served from cache anyway, so... If you can, serve from a flat file.
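A minimal sketch of the flat-file approach, assuming the admin panel rewrites an aux-info.json file whenever the data changes (the file name, structure, and query are invented for illustration):

    // Admin side: after saving changes in the CMS, regenerate the flat file.
    file_put_contents(__DIR__ . '/cache/aux-info.json', json_encode($auxInfo, JSON_PRETTY_PRINT));

    // Read side: page-building scripts load the file; fall back to the DB if it is missing.
    function load_aux_info(PDO $db): array {
        $file = __DIR__ . '/cache/aux-info.json';
        if (is_readable($file)) {
            return json_decode(file_get_contents($file), true) ?: [];
        }
        // Fallback: rebuild from the database.
        return $db->query('SELECT * FROM aux_info')->fetchAll(PDO::FETCH_ASSOC);
    }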
Caching is your best option; Redis and Memcached are common, excellent choices. As for flat file vs. database, it's hard to say without knowing the SQL schema you're using (how many columns, the data type definitions, how many foreign keys and indexes, etc.).
SQL is about relational data; if you have non-relational data, you don't really have a reason to use SQL. Many people now reach for NoSQL databases for this kind of thing, since modifying SQL schemas after the fact is a huge pain.
I am creating a record system for my site which will track users and how they interact with my site's pages. This system will record button clicks, page view times, and the method used to navigate away from a page (among other things). I am considering one of two options:
create a log file and append a string to it for each action.
create a database table and save entries based on user interaction.
Although I am sure that both methods could easily fill my needs, which would be better in the long run? Other considerations:
General page viewing will never cause this data to be read (only added to it.)
Old Data should be archived, but still accessible.
Data will be viewed and searched via web app
As with most performance questions, the answer is 'It depends.'
I would expect it depends on the file system, media type, and operating system of your server.
I don't believe I've ever noticed a performance difference when INSERTing data into a large versus a small MySQL database. The performance differences show up when you retrieve that data: the database will almost always outperform queries against files, especially when you want complex or statistical results.
If you are only concerned with the speed of inserting/appending data and expect a large amount of traffic, build a mock environment and benchmark each approach. If you want any amount of speed when retrieving that data in a structured way, go with the database.
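For the benchmarking suggestion, a crude sketch of timing the two write paths side by side; the iteration count, file path, table, and credentials are all invented:

    $iterations = 10000;
    $line = json_encode(['user' => 1, 'action' => 'click', 'ts' => time()]) . PHP_EOL;

    // File append
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        file_put_contents('/tmp/interactions.log', $line, FILE_APPEND);
    }
    echo 'file: ' . (microtime(true) - $start) . "s\n";

    // Database insert (prepared once, executed per row)
    $db   = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
    $stmt = $db->prepare('INSERT INTO interactions (user_id, action, ts) VALUES (?, ?, ?)');
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $stmt->execute([1, 'click', time()]);
    }
    echo 'db: ' . (microtime(true) - $start) . "s\n";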
If you want performance, you should inspect the server log instead of trying to build your own logging system...
Well, this is kind of a question about how to design a website that uses fewer resources than normal websites, and is mobile-optimized as well.
Here it goes: I was about to display an overview of, say, 5 posts (from a blog, for example). If I then clicked on the first post, I'd load that post in a new window. But instead of connecting to the database again and fetching that specific post by its ID, I'd just look up the post (in PHP) in the array of 5 posts that I created earlier, when the site was fetched for the first time.
Would it save any data from being downloaded? PHP runs server-side as well, so that's why I'm not sure.
Ok, I'll explain again:
Method 1:
User connects to my website
5 posts are displayed and saved to an array (with all their data)
User clicks on the first post and expects more information about it.
My program looks up the post in the array and displays it.
Method 2:
User connects to my website
5 posts are displayed
User clicks on the first post and expects more information about it.
My program connects to MySQL again and fetches the post from the server.
First off, this sounds like a case of premature optimization. I would not start caching anything outside of the database until measurements prove that it's a wise thing to do. Caching takes your focus away from the core task at hand, and introduces complexity.
If you do want to keep DB results in memory, just using an array allocated in a PHP-processed HTTP request will not be sufficient. Once the page is processed, memory allocated at that scope is no longer available.
You could certainly put the results in SESSION scope. The advantage of saving some DB results in the SESSION is that you avoid DB round trips. Disadvantages include the increased complexity of programming the solution, web-server memory spent on data that may never be accessed, and increased initial load on the DB to retrieve extra pages that may or may not ever be requested by the user.
If DB performance, after measurement, really is causing you to miss your performance objectives you can use a well-proven caching system such as memcached to keep frequently accessed data in the web server's (or dedicated cache server's) memory.
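If you do go the SESSION route, a minimal sketch of keeping the already-fetched posts around between requests; the session key, table, and columns are illustrative only:

    session_start();

    function get_post_cached(PDO $db, int $id): array {
        // Serve from the session if this post was part of the original overview fetch.
        if (isset($_SESSION['posts'][$id])) {
            return $_SESSION['posts'][$id];
        }
        // Otherwise hit the database and remember the result for this user.
        $stmt = $db->prepare('SELECT id, title, content FROM posts WHERE id = ?');
        $stmt->execute([$id]);
        $_SESSION['posts'][$id] = $stmt->fetch(PDO::FETCH_ASSOC) ?: [];
        return $_SESSION['posts'][$id];
    }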
Final note: You say
PHP works server-side as well
That's not accurate. PHP works server-side only.
Have you thought about saving the posts in divs and only making them visible when the user clicks somewhere? Here is how to do that.
Put some sort of cache between your code and the database.
So your code will look like
    // Check the cache first; only hit the database on a miss.
    if (isPostInCache($postId)) {
        $post = loadPostFromCache($postId);
    } else {
        $post = loadPostFromDatabase($postId);
    }
Go for some caching system; the web is full of them. You can use memcached or a static cache you build yourself (i.e. save posts as txt files on the server).
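A very small do-it-yourself file cache along those lines, reusing the function names from the snippet above; the cache directory and one-hour TTL are arbitrary:

    function loadPostFromCache(int $id) {
        $file = __DIR__ . "/cache/post-{$id}.txt";
        // Treat cache entries older than an hour as stale.
        if (is_readable($file) && filemtime($file) > time() - 3600) {
            return unserialize(file_get_contents($file));
        }
        return null; // cache miss
    }

    function savePostToCache(int $id, array $post): void {
        file_put_contents(__DIR__ . "/cache/post-{$id}.txt", serialize($post));
    }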
To me, this is a little less efficient than making a second call to the database, and here is why.
The first query should only pull the fields you want, like title, author, and date. The content of the post may be a heavy query, so I'd exclude that (you can pull a teaser if you'd like).
Then, if the user wants the details of the post, I would query for the content using an indexed key column.
That way you're not pulling content for 5 posts that may never be seen.
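Sketched out, that is two queries, with the detail query hitting the primary key; the column names and $postId are invented for illustration:

    // Overview: light columns only, no heavy content field.
    $teasers = $db->query('SELECT id, title, author, published_at FROM posts ORDER BY published_at DESC LIMIT 5')
                  ->fetchAll(PDO::FETCH_ASSOC);

    // Detail: fetch the heavy content only for the post the user actually opened.
    $stmt = $db->prepare('SELECT content FROM posts WHERE id = ?');
    $stmt->execute([$postId]);
    $content = $stmt->fetchColumn();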
If your PHP code is constantly re-connecting to the database, you've configured it wrong and aren't using connection pooling properly. The execution time of a query should be a few milliseconds at most if your stack is properly tuned. Do not cache unless you absolutely have to.
What you're advocating here is side-stepping a serious problem. Database queries should be effortless provided your database is properly configured. Fix that issue and you won't need to go down the caching road.
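One way to avoid reconnecting on every request in PHP is persistent connections, for example via PDO's PDO::ATTR_PERSISTENT flag; this is just a sketch of that option, and whether the answer means this or an external pooler is not stated:

    // Reuse a connection kept open by the PHP process instead of opening a new
    // TCP connection and re-authenticating on every request.
    $db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
        PDO::ATTR_PERSISTENT => true,
        PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
    ]);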
Saving data from one request to the next is a broken design, and if not done perfectly it can lead to embarrassing data-bleed situations where one user sees content intended for another. This is why caching is an option usually pursued after all other avenues have been exhausted.