I am currently working on a web page in HTML5/JavaScript where the user can record the values of sliders changed over a period of time. For example, when the user clicks Record, a timer starts, and every time the user moves a slider its value is saved in an array. When the user stops recording and presses Play, all the sliders are then 'played back' as they were recorded. I store the values in an array, something like this:
array[3] : {timeStamp, idSlider, valueSlider}
The array can actually become pretty massive, as the user can record over the already recorded settings without losing the previous ones; this allows the user to change multiple sliders for the same time stamp.
I now want to be able to save this array somewhere (on the server side) so the user can come back to the website later on and just load their recorded settings, but I am not sure of the best approach to do that.
I am thinking of a database, but I am not sure whether saving and loading from the server will be slow; plus my database capacity is pretty small (around 25 MB on my OVH server). I am expecting an average of maybe 800 entries to save.
Maybe a file (XML?), but then I have no idea how to save that on the server side...
Any other ideas are welcome, as I am a bit stuck on this one.
Thanks & sorry for any English mistakes,
Cheers,
Mat
Either serialize() or json_encode() it, then save it as a LONGTEXT database record.
http://us.php.net/serialize
http://us.php.net/json-encode
http://dev.mysql.com/doc/refman/5.0/en/blob.html
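For example, a minimal sketch of the json_encode() route with PDO; the table and column names (recordings, user_id, data) are just assumptions:

    <?php
    // Save the posted recording as JSON in a LONGTEXT column (one row per save).
    $json = json_encode($recording);   // $recording = the array sent by the browser

    $stmt = $pdo->prepare('INSERT INTO recordings (user_id, data) VALUES (:user, :data)');
    $stmt->execute([':user' => $userId, ':data' => $json]);

    // Later, load it back and hand it to the page as an array again.
    $stmt = $pdo->prepare('SELECT data FROM recordings WHERE user_id = :user ORDER BY id DESC LIMIT 1');
    $stmt->execute([':user' => $userId]);
    $recording = json_decode($stmt->fetchColumn(), true);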
I think 800 entries is not as massive as you think. You could just save them to a database and transfer them with JSON. The database should be pretty efficient and not waste a lot of storage, but this depends on how you set up your tables. Don't use CHAR() columns, for example.
To see which approach uses more or less storage, you could calculate how much space 1,000 entries should take, then put them in the database and see how much space it actually uses and how much is wasted.
If you really were dealing with a very large number of items, then network performance would probably become your first bottleneck. For loading, I would then stream the JavaScript to the browser so that the browser doesn't have to load the whole thing into memory at once. For sending, you would want to create manageable chunks of maybe 1,000 entries at a time and send them in those chunks. But this is assuming you're dealing with much more data.
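As a rough illustration of the streaming idea (assuming the entries live in a slider_events table, which is an invented name), the server could emit the JSON row by row instead of building one huge string:

    <?php
    // Stream the recorded entries back to the browser one row at a time,
    // so neither PHP nor the query result has to sit fully in memory.
    $stmt = $pdo->prepare('SELECT timeStamp, idSlider, valueSlider
                             FROM slider_events
                            WHERE user_id = ?
                            ORDER BY timeStamp');
    $stmt->execute([$userId]);

    header('Content-Type: application/json');
    echo '[';
    $first = true;
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        echo ($first ? '' : ',') . json_encode($row);
        $first = false;
    }
    echo ']';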
You can try storing the whole JSON object as plain text in user-specific files. That would make serving the settings really easy, since it would be plain JSON that simply needs to be evaluated.
It seems a bit unorthodox, though. When you say 800 entries, do you mean
1. id : 0, timestamp : XXXXXXXXXXXXX, idSlider : X, etc
2. id : 1, timestamp : XXXXXXXXXXXXX, idSlider....
Or 800 user entries? Because you could save the whole object in the database as well and save yourself from executing lots of "costly" queries.
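If you go the user-specific-file route mentioned above, a minimal sketch could look like this (the directory layout and file naming are assumptions):

    <?php
    // One JSON file per user; the saved file can be served back verbatim.
    $path = __DIR__ . '/recordings/' . (int) $userId . '.json';

    // Save (LOCK_EX avoids two requests writing the file at the same time).
    file_put_contents($path, json_encode($recording), LOCK_EX);

    // Load: the file is already JSON, so just send it as-is.
    if (is_file($path)) {
        header('Content-Type: application/json');
        readfile($path);
    }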
If you are more concerned about storage size than read/write performance, you could encode the array as a JSON string, compress it using the zlib library, and save it as a file.
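A hedged sketch of that compressed variant, using PHP's zlib functions (the file name is made up):

    <?php
    // Compress the JSON before writing it; decompress when loading.
    $path = __DIR__ . '/recordings/' . (int) $userId . '.json.gz';

    file_put_contents($path, gzcompress(json_encode($recording), 9), LOCK_EX);

    // Reading it back:
    $recording = json_decode(gzuncompress(file_get_contents($path)), true);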
This is a perfect use case for MongoDB.
So, I have a situation and I need a second opinion. I have a database and it's working great with all the foreign keys, indexes and so on, but when I reach a certain number of visitors, around 700-800 concurrent visitors, my server hits a bottleneck and displays "Service temporarily unavailable." So I had an idea: what if I pull data from JSON instead of the database? I mean, I would still update the database, but on each update I would regenerate a JSON file and pull the data from it to show on my homepage. That way I would not push my CPU too hard, and I would be able to do some kind of caching on the user end.
What you are describing is caching.
Yes, it's a common optimization to avoid over-burdening your database with query load.
The idea is you store a copy of data you had fetched from the database, and you hold it in some form that is quick to access on the application end. You could store it in RAM, or in a JSON file. Some people operate a Memcached or Redis in-memory database as a shared resource, so your app can run many processes or threads that access the same copy of data in RAM.
It's typical that your app reads some given data many times for every single time it updates the data. The greater this ratio of reads to writes, the better the savings in terms of lightening the load on your database.
It can be tricky, however, to keep the data in cache in sync with the most recent changes in the database. In other words, how do all the cache copies know when they should re-fetch the data from the database?
There's an old joke about this:
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
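For what it's worth, one simple way to sidestep the invalidation problem in the setup described above is to regenerate the cache file in the same code path that updates the database. A rough sketch, with invented table and path names:

    <?php
    // Write to the database as usual, then rebuild the JSON cache file
    // so readers never see stale data. The rename makes the swap atomic.
    function updateHomepageData(PDO $pdo, string $title, string $body): void
    {
        $stmt = $pdo->prepare('INSERT INTO posts (title, body) VALUES (?, ?)');
        $stmt->execute([$title, $body]);

        $rows = $pdo->query('SELECT id, title FROM posts ORDER BY id DESC LIMIT 50')
                    ->fetchAll(PDO::FETCH_ASSOC);

        $cache = '/var/cache/app/homepage.json';
        file_put_contents($cache . '.tmp', json_encode($rows));
        rename($cache . '.tmp', $cache);
    }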
So after another few days of exploring and trying to get the right answer, this is what I have done: I decided to create another table instead of a JSON file, and put all the data that was supposed to go into the JSON file into that table.
WHY?
Number one: MySQL can lock tables while they are being updated; a JSON file cannot.
Number two: I go from a few dozen queries down to just one, the simplest query there is: SELECT * FROM table.
Number three: I have better control over the content this way.
Number four: while I was searching for an answer I found out that some people had issues with JSON file availability when a lot of concurrent connections requested the same JSON; this way I will never have an availability problem.
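A rough sketch of what such a cache-table rebuild could look like (table and column names are illustrative, and it assumes an InnoDB table so the transaction actually applies):

    <?php
    // Rebuild the cache table whenever the source data changes;
    // the homepage then only ever runs: SELECT * FROM homepage_cache
    function rebuildHomepageCache(PDO $pdo): void
    {
        $pdo->beginTransaction();
        $pdo->exec('DELETE FROM homepage_cache');
        $pdo->exec('INSERT INTO homepage_cache (post_id, title, created_at)
                    SELECT id, title, created_at
                      FROM posts
                     ORDER BY created_at DESC
                     LIMIT 50');
        $pdo->commit();
    }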
I was always sure it is better and faster to use flat files to store real-time visit/click counter data: open the file in append mode, lock it, write the data, and close it. Then read the file from a crontab job once every five minutes, store its contents in the DB, and truncate the file for new data.
But today my friend told me that this is the wrong way: it would be better to have a permanent MySQL connection and write the data straight to the DB on every click. First, the DB can store the results in a memory table. Second, even if we store to a table located on disk, that file is kept permanently open by the server, so there is no need to find it on disk and open it again and again on every query.
What do you think about it?
UPD: We are talking about high-traffic sites, around a million hits per day.
Your friend is right. Writing to a file and then having a cron job send it to the database every 5 minutes sounds very convoluted. I can't imagine a good reason for not writing directly to the DB.
Also, when you write to a file the way you described, the operations are serialized: a user has to wait for another one to release the lock before writing. That simply won't scale if you ever need it to. The same will happen with a DB if you always write to the same row, but you can have multiple rows for the same value, write to a random one, and sum them when you need the total.
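A sketch of that multiple-rows trick (the names are invented, and it assumes a unique key on (counter_name, slot)):

    <?php
    // Each hit increments one of 10 random rows, so concurrent writers
    // rarely fight over the same row. The total is the sum of the slots.
    $slot = random_int(0, 9);

    $stmt = $pdo->prepare('INSERT INTO click_counter (counter_name, slot, hits)
                           VALUES (:name, :slot, 1)
                           ON DUPLICATE KEY UPDATE hits = hits + 1');
    $stmt->execute([':name' => 'homepage', ':slot' => $slot]);

    // Reading the counter:
    $stmt = $pdo->prepare('SELECT SUM(hits) FROM click_counter WHERE counter_name = :name');
    $stmt->execute([':name' => 'homepage']);
    $total = (int) $stmt->fetchColumn();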
It doesn't make much sense to use a memory table in this case. If your data doesn't need to be persisted, it's much simpler to use a memcache you probably already have somewhere and simply increment the value for the key.
If you use a database WITHOUT transactions, you will get the same underlying performance as using files with more reliability and less coding.
It could be true that writing to a database is heavy - e.g. the DB could be on a different server, so you have network traffic, or it could be a transactional DB, in which case every write involves at least 2 writes (potentially more if indexes are involved) - but if you're aware of all this then you can use a DB, take advantage of decades of work by others, and make your programming task easier.
I have a PHP application that takes objects from users and stores them on a server, similar to an upload/download service. The object could be anything from a plain string of text to any kind of media object like a movie or a song. Even if simple text is sent, it could be big (possibly an entire ebook). At present, the way I'm doing this is to write all of this data to files, because files don't impose a size limit.
Edit: To clarify, I'm looking for a generic and efficient way of storing data, no matter what the format. For example, a user could send the string "Hi, I am XYZ". I can store this using file operations like "fopen" and "fwrite". If a user sends an MP3 file, I can again use "fwrite", and the data of the file will be written as is; the MP3 format is not disturbed. This works perfectly at present, no issues. So "fwrite" is my generic interface here.
My question: Is there some better, efficient way to do this?
Thank you for your help!
The answer to this question is rather complicated. You can definitely store such objects in the database as LONGBLOB values -- unless you are getting into the realm of feature-length movies (the size limit is 32 bits, i.e. about 4 GB).
A more important question is how you are going to get the objects back to the user. A "blob" object is not going to give you much flexibility in returning the results (it comes back from a query in a single row). Reading from a file system might give more flexibility (such as retrieving part of a text file to preview the contents).
Another issue is backup and recovery. Storing the objects in the database will greatly increase the database size, requiring that much more time to restore. In the event of a problem, you might be happy to get the database itself back (so users can see what objects they have) before they can actually access the objects.
On the other hand, it might be convenient to have a single image of the database and objects, for moving around to, say, a backup server. Or, if you are using replication to keep multiple versions in sync, storing everything in the database gives this for free (assuming you have a high bandwidth connection between the two servers).
Personally, I tend to think that such objects are better suited for the file system. That does require a somewhat more complicated API for the application, which has to retrieve data from two different places.
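To make that concrete, here is a minimal sketch of the file-system approach (paths, table and column names are assumptions, and validation is omitted): the uploaded bytes go to disk, and the database only keeps metadata pointing at the file.

    <?php
    // Store the object on disk under a generated name, record metadata in the DB.
    $storedName  = bin2hex(random_bytes(16));            // never trust the client file name
    $destination = '/var/app/uploads/' . $storedName;

    move_uploaded_file($_FILES['object']['tmp_name'], $destination);

    $stmt = $pdo->prepare('INSERT INTO user_objects
                               (user_id, original_name, stored_path, mime_type, size_bytes)
                           VALUES (?, ?, ?, ?, ?)');
    $stmt->execute([
        $userId,
        $_FILES['object']['name'],
        $destination,
        $_FILES['object']['type'],
        $_FILES['object']['size'],
    ]);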
Storing files in a file system is not a bad approach, as long as your file system has no limits on the number of files per directory or on file size. It can also be hard to keep the files in sync across a number of servers.
If you do run into those limitations, you can use some kind of virtual FS (like MongoDB's GridFS).
I just joined a project, and have been going over the code. We need to export a lot of data out to Excel for our internal users. There are roughly 5 people who would have access to this functionality at a given time. In order to output to Excel, here's what I found:
retrieve data from DB, store in $_SESSION
show HTML page view of data
when the user wants to export
retrieve the DB data from $_SESSION
create a string in memory of a CSV
print the HTTP Headers with Excel as the filetype
print out the CSV formatted strings
This storage in $_SESSION happens even when the user is not going to export. This strikes me as terribly inefficient, since the $_SESSION variable could explode in size: each of the DB table retrievals can be up to 30 MB per table, and the expiration on $_SESSION is set to 24 hours. So potentially there could be up to 5 users in the system with up to 150 MB of $_SESSION data. Sound about right?
Has anyone else ever seen something like this? Is this like a bootleg Memcache? Wouldn't it be best to just write the data to a flat file that's updated once every few hours?
I do store some data from the database in the session, like an ID or a small object that I use on every page.
But when it comes to a larger dataset that I can't extract on the fly for each page load, I often prefer to store it in a MEMORY/HEAP table (or a temporary file) and just keep an ID in the session so I can retrieve it easily.
You might want to take a look at this question about session max size:
Maximum size of a PHP session
I have seen this as well, and it is a bad idea. Sometimes you want to display a table of data on screen and also make it available for export, but there is no good reason for stuffing it into session memory. If the OS needs to swap and the session gets written to a file, then you have file I/O speed issues, so in some cases it is likely slower than a fresh query to the database.
$_SESSION in your example is being used to store data which is not needed to ensure consistency across page views, so this is pointless. It's there to store things like last_page_viewed, not to duplicate the DB itself. The only reason to do it like that would be if the DB calls to get the data were so hideously expensive that, even with the storage inefficiency you describe, it improved performance. That is very unlikely, and it sounds like lazy coding.
If the user wants to export, there should be a "zipper" function that reads all the data using the same SQL and packages it into an Excel file on demand. Ideally use MVC, so that the same code can feed both the HTML view and the zipper function ;)
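A hedged sketch of that export path: re-run the query on demand and stream CSV straight to the browser instead of parking the rows in $_SESSION (the query and column names are invented):

    <?php
    // Export on demand: same SQL as the HTML view, streamed as CSV.
    header('Content-Type: text/csv; charset=utf-8');
    header('Content-Disposition: attachment; filename="report.csv"');

    $out  = fopen('php://output', 'w');
    $stmt = $pdo->query('SELECT id, name, amount, created_at FROM report_rows');

    fputcsv($out, ['ID', 'Name', 'Amount', 'Created']);   // header row
    while ($row = $stmt->fetch(PDO::FETCH_NUM)) {
        fputcsv($out, $row);                               // one row at a time, no big buffer
    }
    fclose($out);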
Your solution could work if your database is not updated frequently; otherwise your users may get outdated data. (And I don't think it's worth storing the data in the session anyway.)
As you explained it here, I think you are going to use this on a LAN and you won't have more than 5 concurrent users. If I'm right about that, why not just read straight from the database and show it in HTML (I guess you use paging and don't want to show all 30 MB of data on a single HTML page), and likewise export all the data to Excel straight from the DB when the user requests it? :)
Currently within my application I have a setup where, upon submission of data, PHP processes the data using several functions and then places it in the database. For example,
The quick brown fox jumps over the lazy dog
Becomes an SEO Title of
the-quick-brown-fox-jumps-over-the-lazy-dog
This string is stored in my database under title_seo, but my question is as follows.
What is more important:
The size of the database for storing these extra parsed strings
Or the resources used converting them
Now, when I say "the resources used converting them", I mean removing the column from the database and parsing the general title every time I output the content.
Obviously, parsing on every request increases PHP usage, but the database size decreases.
What should I be more worried about ?
Neither of them.
In this case the computational cost is minimal. But storing the seo_title in your table could allow you to change the URL of your article to whatever you want, independently of its title.
So I would keep title_seo in the DB.
Relatively speaking, hard drive space is considered cheaper than processing time. Therefore, spending the processing time to convert the title to the SEO title only once, and storing both of them in the database, is the best option.
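For reference, a minimal sketch of the kind of title-to-slug conversion being discussed, run once at insert time so both title and title_seo end up stored (the exact rules in the real application may differ):

    <?php
    function makeSeoTitle(string $title): string
    {
        $slug = strtolower($title);
        $slug = preg_replace('/[^a-z0-9]+/', '-', $slug);  // anything else becomes a dash
        return trim($slug, '-');
    }

    // makeSeoTitle('The quick brown fox jumps over the lazy dog')
    // => 'the-quick-brown-fox-jumps-over-the-lazy-dog'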
I don't have much to add to #yes123's answer, but in the end the whole idea is that you should see whether you can store more data in the database to avoid making unwanted calculations. Don't take it as a rule, but I mostly favor storing more DB data over doing more calculations.
In your case, the calculation to convert a string into an SEO string looks quite simple, so it wouldn't matter much. But sometimes you have a table with things like prices, unit quantities, discounts and so on; there it's better to calculate the final price when adding the rows than to calculate it every time you want to display it.
Hope I can help!
This is the kind of question that is often answered with "it depends on your needs". Any piece of software must strike a balance between computing power and storage used.
However, as many people say these days, "disk space is cheap".
So, going back to square one, ask yourself whether you're going to have lots of data and where you are going to store it (your own server, Amazon S3, etc.), but I'll go with the "store it only once" option.
If you have millions of pages, storing seo_title in the DB is a bad idea. The better option is to have it only as a cached text value.
If you have fewer than 10 million pages but more than 1 million, the DB will maybe need a separate table (or tables) for the SEO title.
If you have fewer records, it makes no difference at all.
P.S. I am thinking in terms of how site size corresponds to visitor numbers.