BigCommerce PHP API - Pulling large amounts of data

I'm attempting a couple different data pulls using the BigCommerce PHP API.
In one attempt, I need to pull all of my customers and their addresses. In the other, I need to pull all of my orders and the coupon data (if any) associated with them.
The problem I'm having is a combination of the way BigCommerce returns the data, and with the amount of data I am attempting to pull.
When I pull a list of customers, the address data is not included with the results. Instead, I have to query a separate JSON endpoint for each customer. Example:
https://STORE-ID.mybigcommerce.com/api/v2/customers/2104/addresses.json
According to the quickstarts and the response from their API team, they expect me to simply iterate through each customer/order ID and then make an additional request to pull the address/coupon data for each ID.
Due to the amount of data I have here, this operation results in either script timeouts (30+ seconds), or in PHP running out of memory.
Yes, I know the general solution with PHP is to throw more hardware at it, but there has to be a more efficient way to do this than to simply make a ton of single-shot, long-running requests, right?
I'm thinking something in the way of multiple threads or jobs, though I personally am not aware of any such functionality.

As suggested by Chirag B, I ended up using Node.JS and splitting this into multiple async calls.
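For anyone who wants to stay in PHP, a rough sketch of the same fan-out idea using curl_multi is below. It only assumes the v2 addresses endpoint shown above; the store URL, credentials, customer IDs, and batch size are placeholders, not BigCommerce specifics.

```php
<?php
// Sketch: fetch the address data for many customers concurrently with curl_multi
// instead of one long sequential loop. STORE-ID, the basic-auth credentials, and
// $customerIds are placeholders.
$customerIds = [2104, 2105, 2106]; // in practice, collected from the customers list

$mh = curl_multi_init();
$handles = [];

foreach ($customerIds as $id) {
    $ch = curl_init("https://STORE-ID.mybigcommerce.com/api/v2/customers/{$id}/addresses.json");
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERPWD        => 'API_USER:API_TOKEN', // hypothetical credentials
        CURLOPT_TIMEOUT        => 30,
    ]);
    curl_multi_add_handle($mh, $ch);
    $handles[$id] = $ch;
}

// Drive all the requests in parallel.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);
    }
} while ($active && $status === CURLM_OK);

// Collect and decode each response, keyed by customer ID.
$addressesByCustomer = [];
foreach ($handles as $id => $ch) {
    $addressesByCustomer[$id] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
```

In practice you would chunk the IDs into smaller batches and pause between them, so you stay inside the API's rate limits and memory use stays bounded.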

Related

Question about storing/retrieving data for a complex page

My stack is php and mysql.
I am trying to design a page to display details of a mutual fund.
Data for a single fund is distributed over 15-20 different tables.
Currently, my front-end is a brute-force PHP page that queries/joins these tables using 8 different queries for a single scheme. It's messy and performs poorly.
I am considering alternatives. Good thing is that the data changes only once a day, so I can do some preprocessing.
An option I am considering is to run these queries for every fund (about 2000 funds), create a complex JSON object for each of them, store it in MySQL indexed by fund code, retrieve the JSON at run time, and show the data. I am thinking of using the simple json_object() MySQL function to create the JSON, and json_decode() in PHP to get the values for display. Is this a good approach?
I was tempted to store them in a separate MongoDB store - would that be overkill for this?
Any other suggestion?
Thanks much!
To meet your objective of quick pageviews, your overnight-run approach is very good. You could generate JSON objects with your distilled data, or even prerendered HTML pages, and store them.
You can certainly store JSON objects in MySQL columns. If you don't need the database server to search the objects, simply use TEXT (or LONGTEXT) data types to store them.
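As a rough illustration of that overnight-run approach, here is a minimal sketch using PDO; the fund_json table, its columns, and the connection details are hypothetical, not part of your schema.

```php
<?php
// Sketch: the overnight job stores one pre-built JSON document per fund,
// and the page does a single primary-key lookup plus json_decode().
$pdo = new PDO('mysql:host=localhost;dbname=funds', 'user', 'pass');

$pdo->exec("
    CREATE TABLE IF NOT EXISTS fund_json (
        fund_code VARCHAR(20) PRIMARY KEY,
        payload   LONGTEXT NOT NULL,
        built_at  DATETIME NOT NULL
    )
");

// Overnight run: build the distilled array for each fund and upsert it.
function storeFundJson(PDO $pdo, string $fundCode, array $distilledData): void
{
    $stmt = $pdo->prepare("
        INSERT INTO fund_json (fund_code, payload, built_at)
        VALUES (:code, :payload, NOW())
        ON DUPLICATE KEY UPDATE payload = VALUES(payload), built_at = VALUES(built_at)
    ");
    $stmt->execute([
        ':code'    => $fundCode,
        ':payload' => json_encode($distilledData),
    ]);
}

// Page view: one indexed lookup, then json_decode() for display.
function loadFundJson(PDO $pdo, string $fundCode): ?array
{
    $stmt = $pdo->prepare("SELECT payload FROM fund_json WHERE fund_code = :code");
    $stmt->execute([':code' => $fundCode]);
    $payload = $stmt->fetchColumn();
    return $payload === false ? null : json_decode($payload, true);
}
```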
To my way of thinking, adding a new type of server (MongoDB) to your operations to store a few thousand JSON objects does not seem worth the trouble. If you find it necessary to search the contents of your JSON objects, however, another type of server might be useful.
Other things to consider:
Optimize your SQL queries. Read up: https://use-the-index-luke.com and other sources of good info. Consider your queries one by one, starting with the slowest. Use the EXPLAIN or even the EXPLAIN ANALYZE command to get your MySQL server to tell you how it plans each query, and judiciously add indexes (see the sketch after these points). You can get help under the query-optimization tag here on StackOverflow. Many queries can be optimized by adding indexes in MySQL without changing anything in your PHP code or your data, so this can be an ongoing project rather than a big new software release.
Consider measuring your query times. You can do this with MySQL's slow query log. The point of this is to identify your "dirty dozen" slowest queries in a particular time period. Then, see step one.
Make your pages fill up progressively, to keep your users busy reading while you get the data they need. Put the top-level stuff (fund name, etc.) in server-side HTML so search engines can see it. Use some sort of front-end tech (React, maybe, or DataTables that fetch data via AJAX) to render your pages client-side, and provide REST endpoints on your server to get the data, in JSON format, for each data block in the page.
In your overnight run create a sitemap file along with your JSON data rows. That lets you control exactly how you want search engines to present your data.
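As a small illustration of point one, here is a sketch of the EXPLAIN-then-index loop through PDO; the query, table, column, and index names are made-up examples, not anything from your actual schema.

```php
<?php
// Sketch of the measure-then-index cycle. The query, table, and index below
// are hypothetical examples.
$pdo = new PDO('mysql:host=localhost;dbname=funds', 'user', 'pass');

// 1. Ask MySQL how it plans one of the slow queries.
$plan = $pdo->query("
    EXPLAIN
    SELECT nav_date, nav_value
    FROM fund_nav_history
    WHERE fund_code = 'ABC123'
    ORDER BY nav_date DESC
    LIMIT 30
")->fetchAll(PDO::FETCH_ASSOC);
print_r($plan); // look for type=ALL (full table scan) and large 'rows' estimates

// 2. If the plan shows a full scan, add an index matching the WHERE/ORDER BY,
//    then re-run the EXPLAIN and compare.
// $pdo->exec("CREATE INDEX idx_fund_code_date ON fund_nav_history (fund_code, nav_date)");
```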

What is the best approach when working with a REST API and big datasets

I have a data provider (REST API) that stores info about 400-500k items and gets updated daily. The API methods I can call return info for 1000 items only (but there is a pagination mechanism, so I can loop through all the data).
I'm working with PHP/MySQL, and my task is to check a website database (containing 10k to 100k items) against the data provided by this API. All I need to do is check that each item ID from the website database is present in the provider database. If not, I will delete the record from the website database.
What would be the best method to do this daily?
Should I first do a loop, get all the data from the data provider, and store it in a file? (Considering it is 400-500k IDs, I don't think an array will do.) Then check each ID from the local database against that file?
I would refer to the "Rules Of Optimization Club" - specifically rules 1 and 2:
You do not optimize.
You do not optimize, without measuring first.
So build a solution that works with whatever you think of first. Then measure how it performs. If it performs badly, see which parts of it are slow (server responses / saving data / looping through data) and only then start to think about optimization.
This is specifically in response to "considering it is 400-500k IDs, I don't think an array will do" -- did you try it, and did it fail?
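To make that concrete, here is a rough sketch of the straightforward version, keeping the provider IDs as array keys; fetchProviderPage() and the website_items table are hypothetical stand-ins for your API client and schema. A few hundred thousand integer keys usually costs on the order of tens of megabytes in PHP 7+, so it is worth measuring before ruling the array out.

```php
<?php
// Sketch: collect all provider IDs via the paginated API, then delete local
// rows whose IDs are missing. fetchProviderPage() and the table/column names
// are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// 1. Walk the provider's pages, keeping IDs as array keys for fast lookups.
$providerIds = [];
$page = 1;
do {
    $items = fetchProviderPage($page, 1000); // returns up to 1000 items per page
    foreach ($items as $item) {
        $providerIds[(int) $item['id']] = true;
    }
    $page++;
} while (count($items) === 1000);

// 2. Walk the local database and delete rows the provider no longer has.
$delete = $pdo->prepare("DELETE FROM website_items WHERE id = :id");
foreach ($pdo->query("SELECT id FROM website_items") as $row) {
    if (!isset($providerIds[(int) $row['id']])) {
        $delete->execute([':id' => $row['id']]);
    }
}
```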

MySQL or JSON for data retrieval

So, I have a situation and I need a second opinion. I have a database and it's working great with all the foreign keys, indexes and such, but when I reach a certain number of visitors, around 700-800 concurrent visitors, my server hits a bottleneck and displays "Service temporarily unavailable." So I had an idea: what if I pull data from JSON instead of the database? I mean, I would still update the database, but on each update I would regenerate a JSON file and pull data from that to show on my homepage. That way I would not press my CPU so hard, and I would be able to have some kind of cache on the user end.
What you are describing is caching.
Yes, it's a common optimization to avoid over-burdening your database with query load.
The idea is you store a copy of data you had fetched from the database, and you hold it in some form that is quick to access on the application end. You could store it in RAM, or in a JSON file. Some people operate a Memcached or Redis in-memory database as a shared resource, so your app can run many processes or threads that access the same copy of data in RAM.
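As a minimal sketch of the JSON-file variant, assuming a hypothetical loadHomepageDataFromDb() function that runs the expensive queries and a cache path you control:

```php
<?php
// Sketch: serve homepage data from a JSON cache file, regenerating it from the
// database when the file is missing or stale. loadHomepageDataFromDb() and the
// cache path are placeholders.
const CACHE_FILE = __DIR__ . '/cache/homepage.json';
const CACHE_TTL  = 60; // seconds; tune to how fresh the homepage must be

function getHomepageData(): array
{
    if (is_file(CACHE_FILE) && (time() - filemtime(CACHE_FILE)) < CACHE_TTL) {
        return json_decode(file_get_contents(CACHE_FILE), true);
    }

    $data = loadHomepageDataFromDb(); // the expensive multi-query part

    // Write to a temp file and rename so readers never see a half-written file.
    $tmp = CACHE_FILE . '.tmp';
    file_put_contents($tmp, json_encode($data), LOCK_EX);
    rename($tmp, CACHE_FILE);

    return $data;
}
```

The time-based expiry here is only the crudest answer to the invalidation problem discussed next; you can also delete or rewrite the file in the same code path that updates the database.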
It's typical that your app reads some given data many times for every single time it updates the data. The greater this ratio of reads to writes, the better the savings in terms of lightening the load on your database.
It can be tricky, however, to keep the data in cache in sync with the most recent changes in the database. In other words, how do all the cache copies know when they should re-fetch the data from the database?
There's an old joke about this:
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
So after another few days of exploring and trying to get the right answer, this is what I have done. I decided to create another table, instead of using JSON, and put all the data that was supposed to go into the JSON file into that table.
WHY?
Number one reason is that MySQL has the ability to lock tables while they're being updated; a JSON file has not.
Number two is that I go down from a few dozen queries to just one very simple query: SELECT * FROM table.
Number three is that I have better control over the content this way.
Number four: while I was searching for an answer, I found out that some people had issues with JSON availability when a lot of concurrent connections requested the same JSON file; this way I will never have a problem with availability.

Redis - Best data structure to store and then fetch large data

I've recently implemented Redis in one of my Laravel projects. It's currently more of a technical exercise than a production feature, as I want to see what it's capable of.
What I've done is created a list of payment transactions. What I'm pushing to the list is the payload which I receive from a webhook every time a transaction is processed. The payload is essentially an object containing all the information to do with that particular transaction.
I've created a VueJS frontend that displays all the data in a table, with pagination so it shows 10 rows at a time.
Initially this was working super quickly, but now that the list contains 30,000 rows, which is about 11MB worth of data, the request is taking about 11 seconds.
I think the issue here is that I'm using a list and am fetching all the rows from the list using LRANGE.
The reason I used a list was because it has the LPUSH command so that latest transactions go to the start of the list.
I did a test where I got all the data from the list and output the values to a blank page, and this took about the same time, so it's not an issue with Vue, Axios, etc.
Firstly, is this read speed normal? I've always heard that Redis is blazing fast.
Secondly, is there a better way to increase read performance when using Redis?
Thirdly, am I using the wrong data type?
In time I need to be able to store 1m rows of data.
As I understand it, you currently fetch all 30,000 rows on every request and then paginate them in the frontend. In my opinion, the better strategy is to fetch lighter data packs in each request.
For example, use Laravel pagination in response to your request.
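For example (a sketch assuming Laravel's Redis facade, a list key called transactions, and a page size of 10; all of those are illustrative):

```php
<?php
// Sketch: return one page of the transactions list per request instead of the
// whole 30,000-row list. Route, key name, and page size are assumptions.
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Redis;
use Illuminate\Support\Facades\Route;

Route::get('/transactions', function (Request $request) {
    $perPage = 10;
    $page    = max(1, (int) $request->query('page', 1));

    $start = ($page - 1) * $perPage;
    $stop  = $start + $perPage - 1;

    // Because LPUSH puts the newest payload at the head of the list,
    // indexes 0..9 are already "page 1 of the latest transactions".
    $rows = Redis::lrange('transactions', $start, $stop);

    return response()->json([
        'page'  => $page,
        'total' => Redis::llen('transactions'),
        'data'  => array_map(fn ($row) => json_decode($row, true), $rows),
    ]);
});
```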
In my opinion:
Firstly: As you know, Redis is blazing fast, and it really is, because Redis data always lives in memory. If reading about 11MB of data takes around 11 seconds, you should check your bandwidth.
Secondly: I'm sorry, I don't know how to increase read performance in this environment.
Thirdly: I think your choice of data type is OK.
So, check your bandwidth (to the Redis server) first.

API autocomplete call (performance, etc...)

I'm just starting to plan out a web app which allows users to save information about movies. This relies on the TMDb API. Now, I'd like to include an autocomplete feature for when they are searching for a movie. Do you think it's wise to make an API call onKeyUp? Or to wait for a certain amount of time after a keyUp? Overall, is this the best way to carry this out?
I will be using PHP, jQuery and saving/retrieving user data with MySQL
Delay after key up, unless you (or the server you are hitting) have the speed to handle it. You'll have to account for race conditions anyway, but making that many calls isn't going to be very helpful. Your queries to the API are going to be slower than most users' typing speed, which means you'll be making unnecessary calls to your API, using both your bandwidth and your users'.
Also, I would set a minimum number of characters entered before you query (around 3 is good). This will also help lower the number of queries, and you won't be running a query for 'a' or even 'ap', which could both be a lot of things. Once you get to 3 characters ('app'), you get a much smaller list of results, which is more helpful for the user.
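If it helps, here is a small sketch of the server-side guard for that minimum-length rule; searchMoviesViaTmdb() is a hypothetical wrapper around your TMDb call, and the delay/debounce itself still happens client-side as described above.

```php
<?php
// Sketch: autocomplete endpoint that refuses to hit the TMDb API until the
// query is at least 3 characters. searchMoviesViaTmdb() is a placeholder for
// your TMDb request code.
header('Content-Type: application/json');

$query = trim($_GET['q'] ?? '');

if (mb_strlen($query) < 3) {
    // Too short to be useful; return an empty result instead of burning an API call.
    echo json_encode(['results' => []]);
    exit;
}

echo json_encode(['results' => searchMoviesViaTmdb($query)]);
```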
You could use the TypeWatch jQuery plugin or something similar to only call the API after the user has stopped typing for a certain amount of time. Stack Overflow's Tags and Users pages, for example, use TypeWatch to only process the input after the user has stopped typing for 500ms.
