Let's say I have a page with 100 objects, and each object is around 700 bytes when converted to JSON.
In order to save the objects through the PHP-based controller, I have the following options.
Option 1
For each object (100 objects), do the following:
1. Take the object definition.
2. Convert it to JSON.
3. Do an HTTP POST to the PHP controller.
4. The PHP controller saves it to a file or database.
Option 2
Declare a variable bigJsonString.
For each object (100 objects), do the following:
1. Take the object definition.
2. Convert it to JSON.
3. Append the JSON to the string variable bigJsonString, with a delimiter to indicate the end of each object.
After the big fat bigJsonString is constructed:
4. Do an HTTP POST to the PHP controller, sending bigJsonString.
5. The PHP controller saves it to a file or database.
In option 1, I am doing 100 HTTP POSTs one after another. Does this raise any alarms? Is this normal for web applications doing AJAX posts?
The second option seems safer, but the concern is what happens when the 100 objects become, say, 500 objects, or grow to the point where bigJsonString is several megabytes long.
A third option we could introduce is a hybrid of options 1 and 2: start constructing bigJsonString, and once its length reaches a certain limit, do an AJAX post, flush the string, and start building it again for the remaining objects.
What are the pitfalls, and what is the normal or standard practice? If someone can point to resources where this has already been analysed, that would be great.
Thanks very much.
Browsers generally limit the number of connections to a single domain to a low number (under 20 by default for most browsers). In the meantime, many of your requests will block.
On the other hand, larger requests will take longer to fully process because there are fewer opportunities for parallelization.
Ideally, you would try both methods and see which one works most effectively.
A note: for the second method, you could create an array of the objects then serialize the array as JSON, instead of manually dealing with the JSON string. (JSON supports arrays, not just objects!)
I guess it's situational. However, I don't see any situation in which sending 100 requests to the server at once is a good idea.
Personally, I'd just push each object onto a JavaScript array and send a JSON representation of the array to PHP; that way you don't have to worry about delimiters.
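On the PHP side, a minimal sketch of the receiving controller might look like this; the endpoint, table and column names are hypothetical, and the client is assumed to POST a plain JSON array:

<?php
// save_objects.php -- hypothetical controller that receives a JSON array of objects.
$objects = json_decode(file_get_contents('php://input'), true);

if (!is_array($objects)) {
    http_response_code(400);
    exit('Invalid JSON payload');
}

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO objects (payload) VALUES (:payload)');

$pdo->beginTransaction();
foreach ($objects as $object) {
    // Store each object as its own JSON document (or map its fields to columns).
    $stmt->execute([':payload' => json_encode($object)]);
}
$pdo->commit();

echo json_encode(['saved' => count($objects)]);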
It depends on how fast you are posting the objects to the server. If the JSON objects are being posted, say, one per second, one object per post isn't too bad. But if you are posting 100 in a second, you really need to batch them into a larger request.
There will be noticeable latency overhead for each request, so building a large multi-object JSON string is preferable in terms of performance.
What if there is an error in one of the objects? You will need to make sure it doesn't stop the processing of all the other objects, or the user will have to upload all that data again.
If you do multiple requests, you can give better feedback on the client side, since you know exactly where you are in the queue of objects.
It is up to you to balance all this.
Good luck.
Go for option 2, bigJsonString. You should have no trouble passing messages that are several megabytes long - the same infrastructure is used to pass much larger HTML, image, style, script and video files over the internet.
Related
The application I am working on needs to obtain a dataset of around 10 MB at most twice an hour. We use that dataset to display paginated results on the site; simple search by one of the object properties should also be possible.
Currently we are thinking about two different ways to implement this:
1.) Store the JSON dataset in the database or in a file on the file system, read it, and loop over it to display results whenever we need to.
2.) Store the JSON dataset in a relational MySQL table, and query it and loop over the results whenever we need to display them.
Replacing/refreshing the results has to be done multiple times per hour, as I said.
Both ways have cons, and I am trying to choose whichever is less evil overall. Reading 10 MB into memory is not a lot; on the other hand, rewriting a table a few times an hour could produce conflicts, in my opinion.
My concern regarding 1.) is how safe the app will be if we read 10 MB into memory all the time. What will happen if multiple users do this at the same time? Is this something to worry about, or can PHP handle it in the background?
What do you think would be best for this use case?
Thanks!
When PHP runs on a web server (as it usually does), the server starts new PHP processes on demand as they're needed to handle concurrent requests. A powerful web server may allow fifty or so PHP processes. If each of them is handling this large dataset, you'll need enough RAM for fifty copies. And you'll need to load that data somehow for each new request. Reading 10 MB from a file is not an overwhelming burden unless you have some sort of parsing to do, but it is a burden.
As it starts to handle each request, PHP offers a clean context to the programming environment. PHP is not good at maintaining in-RAM context from one request to the next. You may be able to figure out how to do it, but it's a dodgy solution. If you're running on a server that's shared with other web applications -- especially applications you don't trust -- you should not attempt this; the other applications could gain access to your in-RAM data.
You can control the number of concurrent processes with Apache or nginx configuration settings and restrict it to five or ten copies of PHP. But if you have a lot of incoming requests, those requests get serialized and will slow down.
Will this application need to scale up? Will you eventually need a pool of web servers to handle all your requests? If so, the in-RAM solution looks worse.
Does your JSON data look like a big array of objects? Do most of the objects in that array have the same elements as each other? If so, it maps naturally onto a SQL table: you can make a table in which the columns correspond to the elements of your objects. Then you can use SQL to avoid touching every row -- every element of the array -- every time you display or update data.
(The same sort of logic applies to Mongo, Redis, and other ways of storing your data.)
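As a minimal sketch of that mapping, assuming the dump is a JSON array of flat objects with hypothetical id, name and price fields, a refresh job could look roughly like this:

<?php
// refresh_dataset.php -- hypothetical job run a couple of times per hour.
$items = json_decode(file_get_contents('/path/to/dataset.json'), true);

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$pdo->beginTransaction();

// Columns mirror the elements of each object (made-up names).
$stmt = $pdo->prepare('REPLACE INTO items (id, name, price) VALUES (:id, :name, :price)');
foreach ($items as $item) {
    $stmt->execute([
        ':id'    => $item['id'],
        ':name'  => $item['name'],
        ':price' => $item['price'],
    ]);
}
$pdo->commit();

Doing the rewrite inside a single transaction (with InnoDB) means readers keep seeing the previous rows until the refresh commits, which addresses the conflict concern.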
We have an application that calls an API every 4 hours and gets a dump of all objects, returned in JSON format, which is then stored in a file (file.json).
The reason we do this is that we need up-to-date data, we are not allowed to use the API directly to get small portions of this data, and we also need to do a clean-up on it.
There is also another problem: we can't ask for only the updated records (which is actually what we need).
The way we are currently handling this is: get the data, store it in a file, load the previous file into memory, and compare the values in order to find only the new and updated records; once we have those, we insert them into MySQL.
I am currently looking into a different option: since the new file will contain every single record, why not query file.json for the needed objects on demand?
The problem with that is that some of these files are larger than 50 MB (each file contains one of the related tables; there are 6 files in total that make up the full relation), and we can't load them into memory every time there is a query. Does anyone know of a DB system that can query a file directly, or an easier way to replace the old data with the new in a quick operation?
I think the approach you're using already is probably the most practical, but I'm intrigued by your idea of searching the JSON file directly.
Here's how I'd take a stab at implementing this, having worked on a web application that used a similar approach of searching an XML file on disk rather than a database (and, remarkably, was still fast enough for production use):
Sort the JSON data first. Creating a new master file with the objects reordered to match how they're indexed in the database will maximize the efficiency of a linear search through the data.
Use a streaming JSON parser for searches. This will allow the file to be parsed object-by-object without needing to load the entire document in memory first. If the file is sorted, only half the document on average will need to be parsed for each lookup.
Streaming JSON parsers are rare, but they exist. Salsify has created one for PHP.
Benchmark searching the file directly using the above two strategies. You may discover this is enough to make the application usable, especially if it supports only a small number of users. If not:
Build separate indices on disk. Instead of having the application search the entire JSON file directly, parse it once when it's received and create one or more index files that associate key values with byte offsets into the original file. The application can then search a (much smaller) index file for the object it needs; once it retrieves the matching offset, it can seek immediately to the corresponding JSON object in the master file and parse it directly (a sketch of this appears after these suggestions).
Consider using a more efficient data format. JSON is lightweight, but there may be better options. You might experiment with generating a new master file using serialize to output a "frozen" representation of each parsed JSON object in PHP's native serialization format. The application can then use unserialize to obtain an array or object it can use immediately.
Combining this with the use of index files, especially if they're generated as trees rather than lists, will probably give you about the best performance you can hope for from a simple, purely filesystem-based solution.
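Combining the index-file idea with PHP's native serialization, here is a minimal sketch. It assumes the master file has been rewritten as one JSON object per line, sorted and keyed by a hypothetical id field (a true streaming parser such as Salsify's would be needed if it stays one big JSON document); the file names are made up:

<?php
// build_index.php -- record the byte offset of every record in master.json
// (assumed to hold one JSON object per line, sorted by "id").
$in    = fopen('master.json', 'rb');
$index = [];
while (($offset = ftell($in)) !== false && ($line = fgets($in)) !== false) {
    $record = json_decode($line, true);
    $index[$record['id']] = $offset;
}
fclose($in);
// "Freeze" the index in PHP's native serialization format for fast reloads.
file_put_contents('master.idx', serialize($index));

// lookup.php -- seek straight to one record instead of parsing the whole file.
$index  = unserialize(file_get_contents('master.idx'));
$handle = fopen('master.json', 'rb');
fseek($handle, $index['ABC123']);                // 'ABC123' is a made-up key
$record = json_decode(fgets($handle), true);
fclose($handle);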
I ended up doing my own processing method.
I got a JSON dump of all records, which I then processed into individual files, each one containing all of its related records (a bit like a pre-computed join). To keep lookups on these files fast, I created multiple subfolders, each holding a block of records. While creating these files I also built an index file, which points to the directory location of each record and is tiny. Now, every time there is a query, I just load the index file into memory (it is under 1 MB) and check whether the index key -- the master key of the record -- exists; if it does, I have the location of the file, which I then load into memory, and it holds all the information the application needs.
Querying these files ended up being a lot faster than querying the DB, which works well for what we need.
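A minimal sketch of that lookup path, with made-up file names and key values:

<?php
// The tiny index maps a record's master key to the subfolder holding its file.
$index = json_decode(file_get_contents('index.json'), true);   // well under 1 MB

function findRecord(array $index, string $masterKey): ?array
{
    if (!isset($index[$masterKey])) {
        return null;                                            // unknown key
    }
    // e.g. blocks/0017/ABC123.json -- the path layout is hypothetical
    $path = $index[$masterKey] . '/' . $masterKey . '.json';
    return json_decode(file_get_contents($path), true);
}

$record = findRecord($index, 'ABC123');                         // made-up key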
Thank you all for your input as it helped me decide which way to go.
I'm rewriting a data visualisation web tool that was written 3 years ago. Since then, browsers' JavaScript engines have become much faster, so I was thinking of transferring part of the work from the server to the client.
On the page, the data is visualized in a table and in a map (or chart). Both use the same data, but in different ways, so the two algorithms that prepare the data for display are different.
Previously, at every user interaction with the data dropdown selectors (3 main + 2 sub that depend on the 3 main), 3 AJAX requests were sent, with PHP doing all the work and sending back only the necessary data (HTML for the table, XML for the chart). The responses were very small, there was no performance issue, and JavaScript did little more than append the responses and chase change events.
So performance was OK, but at every single change of criteria the user had to wait for an AJAX response. :/
Now my idea is to send back a JSON object in a single AJAX request, only on each change of the main 3-criteria combination, and then have JavaScript populate the table and the chart/map on AJAX success, and again on each change of the 2 sub-criteria.
My hesitation concerns the structure of the JSON sent by the server, i.e. how to balance the payload.
If only one algorithm were needed to turn the raw data into the JSON structure used for display, I would have PHP process the data into that object, ready for JavaScript to use without any additional processing; but there are two.
So
if I make PHP process the data to create 2 objects (one for the table, one for the chart), I will double the size of the JSON response and increase memory usage on the client side; I don't like this approach because the two objects contain the same data, just structured differently, and redundancy is evil, isn't it?
if I send the raw object and let JavaScript work out what to display and where, I'm giving a lot of work to the client, and at every sub-criteria change too (or I could build all the structured objects once on AJAX success so they are ready for sub-criteria changes?); here I'm a little worried about users with old browsers / little RAM...
(The raw, unprocessed JSON object varies between 3 KB and 12 KB depending on the criteria, i.e. between 500 and 2000 records.)
I'm failing to spot the best approach...
So for this single-raw-data-to-multiple-structured-objects job, would you have PHP (increasing response size and sending redundant data) or JavaScript (increasing the client-side workload) process the raw data?
Thanks a ton for your opinion
I found an appropriate solution, so I will answer my own question.
I have followed #Daverandom's advice:
PHP sends the raw data (along with a couple of parameters that depend on the combination of the main criteria)
JavaScript processes the raw data and renders it in the page
JavaScript reprocesses the raw data when the sub-criteria change; testing showed the looping is very fast and doesn't freeze the browser at all, so there is no need to keep the structured objects in scope
Aggressive caching headers are sent with the JSON AJAX response (the data never changes; only new records are added each year), in case the user re-requests data they have already consulted; this way the raw data is not kept in the JavaScript scope when it is not being displayed
On top of that, the JSON strings echoed by PHP are cached on the server (because the data never changes), which reduces database queries and improves response time (see the sketch below)
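A minimal sketch of that server-side cache plus the aggressive client-side caching headers, with a hypothetical cache directory and a made-up buildJsonForCriteria() helper standing in for the real database work:

<?php
// data.php -- hypothetical endpoint returning raw JSON for one main-criteria combination.
$criteria  = preg_replace('/[^a-z0-9_-]/i', '', $_GET['criteria'] ?? '');
$cacheFile = __DIR__ . '/cache/' . $criteria . '.json';

if (is_file($cacheFile)) {
    $json = file_get_contents($cacheFile);          // server-side cache hit
} else {
    $json = buildJsonForCriteria($criteria);        // made-up helper wrapping the DB queries
    file_put_contents($cacheFile, $json);           // cache the echoed JSON string
}

// Aggressive caching: the data for a given combination effectively never changes.
header('Content-Type: application/json');
header('Cache-Control: public, max-age=31536000');
echo $json;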
The final code is neat, easy to maintain, and the application works flawlessly.
Thanks to #Daverandom for the help.
I have a MySQL table with about 9.5K rows; these won't change much, but I may slowly add to them.
I have a process where, if someone scans a barcode, I have to check whether that barcode matches a value in this table. What would be the fastest way to accomplish this? I should mention there is no pattern to these values.
Here are some thoughts:
AJAX call to a PHP file that queries the MySQL table (my thought is this would be the slowest)
Load this MySQL table into an array on login. Then, when scanning, make an AJAX call to a PHP file to check the array
Load this table into an array on login. When viewing the scanning page, somehow load that array into a JavaScript array and check it with JavaScript. (This seems to me to be the fastest because it eliminates the AJAX call and the MySQL query. Would it be efficient to split it into smaller arrays so I don't lag the server and browser?)
Honestly, I'd never load the entire table for anything. All I'd do is make an AJAX request back to a PHP gateway that then queries the database, and returns the result (or nothing). It can be very fast (as it only depends on the latency) and you can cache that result heavily (via memcached, or something like it).
There's really no reason to ever load the entire array for "validation"...
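A minimal sketch of such a gateway, assuming a hypothetical barcodes table with an indexed code column:

<?php
// check_barcode.php -- hypothetical gateway; returns {"valid": true|false}.
$barcode = $_GET['barcode'] ?? '';

$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT 1 FROM barcodes WHERE code = :code LIMIT 1');
$stmt->execute([':code' => $barcode]);

header('Content-Type: application/json');
echo json_encode(['valid' => (bool) $stmt->fetchColumn()]);

The response is tiny, so if the lookup ever becomes a bottleneck it can be cached heavily (e.g. in memcached, keyed by barcode).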
It is much faster to use a well-indexed MySQL table than to look through an array for something.
But in the end it all depends on what you really want to do with the data.
As you mention, your table contains around 9.5K rows. There is no sense in loading that data on the login or scanning page.
It is better to index your table and make an AJAX call whenever required.
Best of Luck!!
While 9.5K rows is not that much, the corresponding amount of data would still need some time to transfer.
Therefore -- and in general -- I'd propose running the validation of values on the server side. AJAX is the right technology to do this quite easily.
Loading all 9.5K rows only to find one specific row is definitely a waste of resources. Run a SELECT query for the single value.
Exposing PHP functionality to the client side / AJAX
Have a look at the xajax project, which allows you to expose whole PHP classes or single methods as AJAX methods on the client side. Moreover, xajax helps with the exchange of parameters between client and server.
Indexing the attributes to be searched
Please ensure that the column which holds the barcode value is indexed. If the verification process tends to be slow, look out for MySQL table scans.
Avoiding table scans
To keep your queries running fast, consider using fixed-size fields. VARCHAR() and other variable-length types can make scans slower, since rows no longer have a fixed size and the database cannot easily predict the location of the next row in the result set. You could therefore use, e.g., CHAR(20) instead of VARCHAR(). (Note that the index on the barcode column is what actually avoids table scans; fixed-length rows mainly speed up MyISAM-style row access.)
Finally: Security!
Don't forget that any data transferred to the client side may expose sensitive information. While your 9.5K rows may not be rendered by the client's browser, the rows would still exist in the generated HTML page (or JavaScript), and by using "View source" any user would be able to figure out all the valid numbers.
Exposing valid barcode values may or may not be a security problem in your project context.
PS: While not related to your question, I'd propose using PHPExcel for reading or writing spreadsheet data. Unlike other solutions, e.g. a PEAR-based framework, PHPExcel has no further dependencies.
Okay, so I have some slightly weird questions about Memcached. The basic idea of my caching technique is to store data that will be requested by my PHP script in a Memcached server. The main issue my team and I faced is that saving large amounts of data can sometimes exceed the 1 MB limit on item size in Memcached.
To further explain the approach imagine the following:
We have lots of data that configures a certain object, and that data contains a lot of text, numbers, etc. We need to save almost 200 of those objects, so the first approach we went with was to cache the entire 200-ish objects as one big item in Memcached. That item may exceed the 1 MB limit, so we figured we could go with a new approach.
The new approach is to break the data configuring an object down into smaller building blocks; since we don't use all the data on the same page, we would then fetch only the building blocks needed for that particular page.
The question is as follows:
Does GET speed change when you fetch bigger items? Or would the limit on the number of requests the Memcached server can handle in parallel get in the way of the second approach, given that we would then use a multi-GET to fetch the multiple building blocks configuring an object?
I know this is a weird question, but it's vital to the new approach we're going with, since the answer determines the size of the building blocks we will use and whether or not we will add more data to them if we need to.
Edit 1:
Bear in mind that we can use multi-GET with the second approach, so we don't have to connect to Memcached and wait for a response for each piece of data we're getting; the multiple keys are requested together.
Without getting into "what the heck are you storing in memcache, and why not use another solution (like a DB with a MEMORY storage engine)?"...
I'd say the cost of the multiple requests is indeed a concern -- especially with memcached running on remote nodes/hosts. A single request for a large object is most likely faster overall: you still transfer the same amount of data, but you avoid the per-request overhead of 200 separate pieces.
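If you do go with the building-block approach, the php-memcached extension's getMulti() fetches many keys in one round trip, which is far cheaper than issuing hundreds of individual get() calls. A minimal sketch with made-up key names:

<?php
// Sketch: fetch the building blocks of one object in a single round trip.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// Made-up keys: each building block is cached separately, well under 1 MB each.
$keys   = ['obj:42:header', 'obj:42:body', 'obj:42:pricing'];
$blocks = $mc->getMulti($keys);            // one request, many keys

// The big-item alternative: one key, one (possibly too large) value.
$whole = $mc->get('obj:42:all');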
BTW... if you're using APC and you don't have many of these huge items, you can use it instead of memcached to do local, user-level memory caching -- the max size is easily tweakable via the PHP config settings. You won't get the benefit of distributed access/sharing across hosts, but it's fast and simple.
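For the APC route, the modern equivalent is the APCu extension (the user-cache successor to APC); a minimal sketch with an arbitrary key and TTL, and a made-up loader function:

<?php
// Sketch: per-host, user-level cache with APCu.
$key  = 'obj:42:all';                      // arbitrary key name
$data = apcu_fetch($key, $success);

if (!$success) {
    $data = loadObjectFromDatabase(42);    // made-up helper standing in for your real code
    apcu_store($key, $data, 600);          // cache for 10 minutes
}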