Large dataset for parsing in a webpage - PHP

I have a large dataset of around 600,000 values that need to be compared, swapped, etc. on the fly for a web app. The entire dataset must be loaded, since some calculations will require skipping values, comparing out of order, and so on.
However, each value is only 1 byte.
I considered loading it as a giant JSON array, but this page makes me think that might not work dependably: http://www.ziggytech.net/technology/web-development/how-big-is-too-big-for-json/
At the same time, forcing the server to load it all for every request would be a waste of server resources, since the clients can do the number crunching just as easily.
So I guess my question is this:
1) Is it possible to do this reliably in jQuery/JavaScript, and if so, how?
2) If jQuery/JavaScript is not the better option, what would be the best way to do this in PHP (reading in files vs. giant arrays via include?)
Thanks!
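For the PHP side (option 2), one low-memory sketch is to keep the 600,000 one-byte values as a single packed string rather than a PHP array: a string costs one byte per value, while a PHP array costs dozens of bytes per element. The file name and the sample values here are assumptions for illustration:

```php
<?php
// Sketch: treat the dataset as one packed binary string, one byte per value.
// "values.bin" is a hypothetical file name; the 4-byte sample stands in for
// the real 600,000-byte payload.
$data = "\x05\x01\x09\x03"; // in practice: $data = file_get_contents('values.bin');

// Random access: read the value at any position without unpacking everything.
$valueAt = function (string $data, int $i): int {
    return ord($data[$i]); // one byte -> integer 0..255
};

// Out-of-order comparison and swapping work directly on the string.
$a = $valueAt($data, 0);
$b = $valueAt($data, 2);
if ($a < $b) {
    // swap bytes 0 and 2 in place
    $tmp = $data[0];
    $data[0] = $data[2];
    $data[2] = $tmp;
}
```

The same trick works client side: serve the raw bytes and index into them in JavaScript, so neither end ever builds a 600,000-element array.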

I know Apache Cordova can run SQL queries.
http://docs.phonegap.com/en/2.7.0/cordova_storage_storage.md.html#Storage
I know it's PhoneGap, but it works in desktop browsers (at least all the ones I've used for phone app development).
So my suggestion:
Mirror your database in each user's local Cordova database, then run all the SQL queries you want!
Some tips:
-Transfer data from your server to the web app via JSON
-Break the data requests down into a few parts. That way you can easily show a progress bar instead of waiting for the entire database to download
-Create a table with one entry that keeps the current version of your database, and check this table before you send all that data. Change it each time you want to 'force' an update. This keeps the user's database up to date and lowers bandwidth
If you need a push in the right direction I have done this before.
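A sketch of the version-table tip on the server side: the endpoint compares the version the client already has against the current one and only ships the payload when it is stale. The function name, version numbers, and chunk count are all made up for illustration:

```php
<?php
// Hypothetical sync check: decide whether the client needs a fresh download.
// $currentVersion would come from the one-row version table on the server.
function sync_response(int $clientVersion, int $currentVersion, int $chunks = 4): array
{
    if ($clientVersion === $currentVersion) {
        return ['upToDate' => true];   // nothing to transfer, bandwidth saved
    }
    return [
        'upToDate' => false,
        'version'  => $currentVersion,
        // the client then fetches chunks 1..N, advancing a progress bar
        // as each part arrives
        'chunks'   => $chunks,
    ];
}

// The endpoint would emit this as JSON:
echo json_encode(sync_response(41, 42));
```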

Related

Is there a more efficient way to update a JSON file?

I'm developing a browser-based game, and for the combat instances I need to be able to track the player's hit points as well as the NPCs' hit points. I'm thinking that setting up a JSON file for each instance makes more sense than having a MySQL db get hammered with requests constantly. I've managed to create the JSON file, pull the contents, update the relevant vars, then overwrite the file, but I'm wondering if there's a more efficient way of handling it than how I've set it up.
// Leaving the variables unquoted preserves their types
// (numbers stay numbers in the JSON output instead of becoming strings).
$new_data = array(
"id" => $id,
"master_id" => $master_id,
"leader" => $leader,
"group" => $group,
"ship_1" => $ship_1,
"ship_2" => $ship_2,
"ship_3" => $ship_3,
"date_start" => $date_start,
"date_end" => $date_end,
"public_private" => $public_private,
"passcode" => $passcode,
"npc_1" => $npc_1,
"npc_1_armor" => $npc_1_armor,
"npc_1_shields" => $npc_1_shields,
"npc_2" => $npc_2,
"npc_2_armor" => $npc_2_armor,
"npc_2_shields" => $npc_2_shields,
"npc_3" => $npc_3,
"npc_3_armor" => $npc_3_armor,
"npc_3_shields" => $npc_3_shields,
"npc_4" => $npc_4,
"npc_4_armor" => $npc_4_armor,
"npc_4_shields" => $npc_4_shields,
"npc_5" => $npc_5,
"npc_5_armor" => $npc_5_armor,
"npc_5_shields" => $npc_5_shields,
"ship_turn" => $ship_turn,
"status" => $status);
$new_data = json_encode($new_data);
$file = "$id.json";
file_put_contents($file, $new_data);
It works, but I'm wondering if there is a way to update a single array item without having to pull ALL the data out, assign it to vars, and rewrite the file. In this example, I'm only changing one var (ship_turn).
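There is no way to rewrite part of a JSON file in place, but you can skip the assign-every-var-by-hand step: decode to an array, change the one key, and re-encode. A sketch, assuming the per-instance file layout above (the helper name is made up):

```php
<?php
// Update a single field of the instance file without touching the rest.
// LOCK_EX guards against two requests writing the file at the same time.
function update_instance_field(string $file, string $key, $value): void
{
    $data = json_decode(file_get_contents($file), true); // whole file -> array
    $data[$key] = $value;                                // change only one key
    file_put_contents($file, json_encode($data), LOCK_EX);
}

// e.g. advance only the turn counter:
// update_instance_field("$id.json", 'ship_turn', $ship_turn);
```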
I'm thinking setting up a JSON file for each instance makes more sense than having a MySQL db get hammered with requests constantly.
MySQL is optimized for this task.
If you use files (like JSON) as a database replacement, then you have to deal with race conditions, because file access is not optimized for concurrent read/write access (by default).
If you're in a high-concurrency environment you should avoid using the filesystem as a "database". Multiple operations on the file system are very hard to make atomic in PHP.
See flock() for more details.
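A hedged sketch of what a flock()-guarded read-modify-write looks like; without the lock, two concurrent combat requests can each read the old state and one update silently gets lost:

```php
<?php
// Read-modify-write on a JSON state file under an exclusive flock().
// 'c+' opens for read/write, creating the file if needed, without truncating.
function with_locked_json(string $file, callable $mutate): void
{
    $fh = fopen($file, 'c+');
    flock($fh, LOCK_EX);               // block until we hold the exclusive lock
    $raw  = stream_get_contents($fh);  // read current contents (pointer at start)
    $data = $raw === '' ? [] : json_decode($raw, true);
    $data = $mutate($data);            // apply the caller's change
    ftruncate($fh, 0);                 // rewrite the whole file
    rewind($fh);
    fwrite($fh, json_encode($data));
    fflush($fh);
    flock($fh, LOCK_UN);               // release before closing
    fclose($fh);
}
```

Note flock() is advisory: it only helps if every code path that touches the file also takes the lock.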
It depends on the game. For a turn-based game, or any non-real-time game, a MySQL approach should be OK. After all, databases are designed to get hammered heavily :-) For real-time games I would go for WebSocket and Node.js as the backend. The server would keep a runtime state of the game, reacting appropriately to client requests and dealing with race conditions (as you would on a standalone multiplayer server).

Handling big arrays in PHP

The application I am working on needs to obtain a dataset of around 10 MB at most twice an hour. We use that dataset to display paginated results on the site; simple search by one of the object properties should also be possible.
Currently we are thinking about two different ways to implement this:
1.) Store the JSON dataset in the database or in a file in the file system, read it, and loop over it to display results whenever we need to.
2.) Store the JSON dataset in a relational MySQL table, query the results, and loop over them whenever we need to display them.
Replacing/refreshing the results has to be done multiple times per hour, as I said.
Both ways have cons. I am trying to choose the way that is less evil overall. Reading 10 MB into memory is not a lot; on the other hand, rewriting a table a few times an hour could produce conflicts, in my opinion.
My concern regarding 1.) is how safe the app will be if we read 10 MB into memory all the time. What will happen if multiple users do this at the same time? Is this something to worry about, or is PHP able to handle it in the background?
What do you think will be best for this use case?
Thanks!
When php runs on a web server (as it usually does) the server starts new php processes on demand when they're needed to handle concurrent requests. A powerful web server may allow fifty or so php processes. If each of them is handling this large dataset, you'll need enough RAM for fifty copies. And you'll need to load that data somehow for each new request. Reading 10 MB from a file is not an overwhelming burden unless you have some sort of parsing to do. But it is a burden.
As it starts to handle each request, php offers a clean context to the programming environment. php is not good at maintaining in-RAM context from one request to the next. You may be able to figure out how to do it, but it's a dodgy solution. If you're running on a server that's shared with other web applications -- especially applications you don't trust -- you should not attempt to do this; the other applications will have access to your in-RAM data.
You can control the concurrent processes with Apache or nginx configuration settings, and restrict it to five or ten copies of php. But if you have a lot of incoming requests, those requests get serialized and they will slow down.
Will this application need to scale up? Will you eventually need a pool of web servers to handle all your requests? If so, the in-RAM solution looks worse.
Does your json data look like a big array of objects? Do most of the objects in that array have the same elements as each other? If so, it maps naturally onto a SQL table: make a table in which the columns correspond to the elements of your objects. Then you can use SQL to avoid touching every row -- every element of the array -- every time you display or update data.
(The same sort of logic applies to Mongo, Redis, and other ways of storing your data.)
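The mapping above can be sketched in a few lines; the table name, column names, and the use of an in-memory SQLite DSN (standing in for a MySQL one) are all assumptions for illustration:

```php
<?php
// Sketch: load a uniform JSON array of objects into a SQL table, so later
// reads and searches touch only the rows they need, not 10 MB of JSON.
$json = '[{"sku":"a1","price":10},{"sku":"b2","price":25}]';
$rows = json_decode($json, true);

$pdo = new PDO('sqlite::memory:');   // in production: your MySQL DSN
$pdo->exec('CREATE TABLE items (sku TEXT PRIMARY KEY, price INTEGER)');

$stmt = $pdo->prepare('INSERT INTO items (sku, price) VALUES (?, ?)');
foreach ($rows as $row) {
    $stmt->execute([$row['sku'], $row['price']]);
}

// "Simple search by one of the object properties" becomes a WHERE clause:
$found = $pdo->query('SELECT sku FROM items WHERE price > 20')
             ->fetchAll(PDO::FETCH_COLUMN);
```

Pagination falls out for free too (LIMIT/OFFSET), which the loop-over-JSON approach has to reimplement by hand.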

PHP - MySQL call or JSON static file for unfrequently updated information

I've got a heavy-read website backed by a MySQL database. I also have some little "auxiliary" information (it fits in an array of 30-40 elements as of now), hierarchically organized, that gets updated slowly, 4-5 times per year. It's not quite a configuration file, since this information is about the subject of the website rather than its functioning, but it behaves like one. Until now I just used a static PHP file containing an array of info, but now I need a way to update it via a backend CMS from my admin panel.
I thought of a simple CMS that lets the admin create/edit/delete entries (a rare, periodic job) and then writes out a static JSON file for the page-building scripts to use instead of pulling this information from the db.
The question is: given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
I just used a static PHP
This sounds like a contradiction to me. Either static, or PHP.
given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
Cache was invented for a reason :) Same with your case - it all depends on how often the data changes vs. how often it is read. If the data changes once a day and remains static for 100k downloads during the day, then not caching it, or not serving it from a flat file, would simply be stupid. If the data changes once a day and you average 20 reads per day, then perhaps returning the data from code on each request would be less stupid - but on the other hand, those other 19 requests could be served from cache anyway, so... If you can, serve from a flat file.
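A minimal sketch of the serve-from-flat-file route: regenerate the cache file only when it is missing or stale, otherwise return its contents untouched. The TTL, file name, and the $rebuild callback (your DB query) are assumptions:

```php
<?php
// Serve rarely-changing data from a flat file, rebuilding only when stale.
function cached_json(string $file, int $ttl, callable $rebuild): string
{
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        return file_get_contents($file);     // cache hit: the DB is never touched
    }
    $json = json_encode($rebuild());         // cache miss: hit the DB once
    file_put_contents($file, $json, LOCK_EX);
    return $json;
}

// e.g. echo cached_json('aux-info.json', 86400, fn () => load_aux_rows_from_db());
```

Since the data changes 4-5 times a year, the admin panel could simply delete the file after saving, forcing a rebuild on the next request.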
Caching is your best option; Redis and Memcached are excellent common choices. Between flat-file and database it's hard to say without knowing the SQL schema you're using (how many columns, what the datatype definitions are, how many foreign keys and indexes, etc.).
SQL is about relational data; if you have non-relational data, you don't really have a reason to use SQL. Many people are switching to NoSQL databases for this kind of data, since modifying SQL schemas after the fact is a huge pain.

Loading large XML file from url into MySQL

I need to load XML data from an external server/url into my MySQL database, using PHP.
I don't need to save the XML file itself anywhere, unless this is easier/faster.
The problem is, I will need this to run every hour or so, as the data will be constantly updated, so I need to replace the data in my database too. The XML file is usually around 350 MB.
The data in the MySQL table needs to be searchable - I will know the structure of the XML so can create the table to suit first.
I guess there are a few parts to this question:
What's the best way to automate this whole process to run every hour?
What's the best (fastest?) way of downloading and parsing the XML (~350 MB) from the URL, in a way that lets me load it into a MySQL table of my own, maintaining columns/structure?
1) A PHP script can keep running in the background all the time, but this is not the best scenario. Better, schedule it with cron (if running on Linux), e.g. php -q /dir/to/php.php, or another scheduling technique on your server. (You still need access to the server.)
2) You can use several approaches. The most linear one, and the least RAM-consuming, whether you work with files or with MySQL, is to stream: open your TCP connection, read smaller packets (16 KB will be OK), and stream them out to disk or to another connection instead of holding everything in memory.
3) Moving data this big is not difficult, but storing it in MySQL as-is is wasteful. Searching in it is even worse, and repeatedly updating it will punish the MySQL server.
Suggestions:
From what I can see, you are trying to synchronize or back up data from another server. If there is just one file, then make a local .xml file using PHP and you are done. If there is more than one, I would still suggest making local files, as most probably you are working with unstructured data: it is not a good fit for MySQL. If you work with hundreds of files and need to search them fast, compute statistics, and much more, consider changing approach and read about Hadoop.
MySQL BLOB and TEXT columns do not support more than 64 KB (the MEDIUMTEXT/LONGTEXT variants go larger, but that doesn't change the picture). If you are stuffing the raw XML into a column just to use SQL search commands on it, you have taken the wrong path.
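If you do keep MySQL as the target, the memory problem is solved by streaming the 350 MB file with XMLReader instead of loading it whole: only one record is materialised at a time. The `<item>`/`<id>`/`<name>` element names are assumptions, and the inline sample stands in for the remote URL:

```php
<?php
// Stream-parse a large XML feed one record at a time with XMLReader.
$xml = '<feed><item><id>1</id><name>foo</name></item>'
     . '<item><id>2</id><name>bar</name></item></feed>';

$reader = new XMLReader();
$reader->XML($xml);        // real use: $reader->open('https://example.com/feed.xml');

$rows = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'item') {
        // Only this one record is expanded into memory.
        $item = new SimpleXMLElement($reader->readOuterXml());
        $rows[] = ['id' => (int)$item->id, 'name' => (string)$item->name];
        // Real version: a prepared PDO statement per record, e.g.
        // $stmt->execute([(int)$item->id, (string)$item->name]);
    }
}
$reader->close();
```

Wrapping the inserts in one transaction (or using LOAD DATA INFILE on a CSV you write out while streaming) makes the hourly reload far faster than row-by-row autocommit.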

Optimizing performance: Google map via PHP, mySQL, and Javascript

Based on this tutorial I have built a page which functions correctly. I added a couple of dropdown boxes to the page and, based on this snippet, have been able to filter the results accordingly. So, in practice, everything is working as it should. However, my question is about the efficiency of the procedure. Right now, the process looks something like this:
1.) Users visits page
2.) Body onload() is called
3.) Javascript calls a PHP script, which queries the database (based on criteria passed along via the URL) and exports that query to an XML file.
4.) The XML file is then parsed via JavaScript on the user's local machine.
For any one search there could be several thousand results (and thus, several thousand markers to place on the map). As you might have guessed, it takes a long time to place all of the markers. I have some ideas to speed it up, but wanted to touch base with experienced users to verify that my logic is sound. I'm open to any suggestions!
Idea #1: Is there a way (and would it speed things up?) to run the query once, generating an XML file via PHP which contained all possible results, store the XML file locally, then do the filtering via javascript?
Idea #2: Create a cron job on the server to export the XML file to a known location. Instead of using GDownloadUrl("phpfile.php", ...), I would use GDownloadUrl("xmlfile.xml", ...), thus eliminating the need to run a new query every time the user changes the value of a dropdown box.
Idea #3: Instead of passing criteria back to the php file (via the URL) should I just be filtering the results via javascript before placing the marker on the map?
I have seen a lot of webpages that place tons and tons of markers on a google map and it doesn't take nearly as long as my application. What's the standard practice in a situation like this?
Thanks!
Edit: There may be a flaw in my logic: If I were to export all results to an XML file, how (other than javascript) could I then filter those results?
Your logic is sound, however, I probably wouldn't do the filtering in Javascript. If the user's computer is not very fast, then performance will be adversely affected. It is better to perform the filtering server side based on a cached resource (xml in your case).
The database is probably the biggest bottleneck in this operation, so caching the result would most likely speed your application up significantly. You might also check that you have set up your keys (indexes) correctly to make the query as fast as possible.
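Server-side filtering against the cached XML can be a small XPath query over the cron-generated file, so the browser only receives the markers it will actually plot. The `<marker>` structure, the "type" attribute, and the file name are assumptions for illustration:

```php
<?php
// Filter a cached marker file server side with XPath.
$xml = '<markers>'
     . '<marker lat="40.1" lng="-75.2" type="bar"/>'
     . '<marker lat="40.3" lng="-75.1" type="restaurant"/>'
     . '<marker lat="40.2" lng="-75.4" type="bar"/>'
     . '</markers>';
// Real version: $doc = simplexml_load_file('cached-markers.xml');
$doc = simplexml_load_string($xml);

$type = 'bar';   // would come from the dropdown, sanitised before use
$matches = $doc->xpath(sprintf('//marker[@type="%s"]', $type));

$out = [];
foreach ($matches as $m) {
    $out[] = ['lat' => (float)$m['lat'], 'lng' => (float)$m['lng']];
}
// echo json_encode($out);  // only the filtered subset goes over the wire
```

This keeps the expensive query out of the request path (the cron job already paid for it) while still sending the client a small, pre-filtered marker list.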
