Currently, within my application, when data is submitted, PHP processes it using several functions and then stores it in the database. For example:
The quick brown fox jumps over the lazy dog
becomes an SEO title of
the-quick-brown-fox-jumps-over-the-lazy-dog
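The conversion is done by something along these lines (a simplified sketch for illustration, not my actual functions):

    <?php
    // Simplified sketch of the title -> SEO slug conversion.
    function slugify(string $title): string
    {
        $slug = strtolower($title);                       // lowercase everything
        $slug = preg_replace('/[^a-z0-9]+/', '-', $slug); // non-alphanumerics become hyphens
        return trim($slug, '-');                          // strip leading/trailing hyphens
    }

    echo slugify('The quick brown fox jumps over the lazy dog');
    // the-quick-brown-fox-jumps-over-the-lazy-dog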
This string is stored in my database under title_seo, but my question is as follows.
What is more important:
The size of the database for storing these extra parsed strings,
Or the resources used converting them
Now, when I say "the resources used converting them", I mean the cost if I were to remove the column from the database and instead parse the general title every time I output the contents.
Obviously, if the title is parsed every time the content is requested, PHP usage increases but database size decreases.
What should I be more worried about?
Neither of them.
In this case the computational cost is minimal. But storing the seo_title in your table also lets you change your article's URL to whatever you want, independently of the title.
So I would keep title_seo in the DB.
Relatively speaking, hard drive space is considered cheaper than processing time. Therefore the best option is to spend the processing time converting the title to the SEO title only once, and store both of them in the database.
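A rough sketch of that (the articles table and column names are just assumptions): do the conversion once when the row is written and store both values.

    <?php
    // Sketch: slugify once on submission and store both values.
    // Table/column names (articles, title, title_seo) are assumptions.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    $title = 'The quick brown fox jumps over the lazy dog';
    $slug  = trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($title)), '-');

    $stmt = $pdo->prepare('INSERT INTO articles (title, title_seo) VALUES (?, ?)');
    $stmt->execute([$title, $slug]);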
I don't have much to add to #yes123's answer, but in the end the idea is to check whether you can store more data in the database to avoid repeated calculations. Don't take it as a rule, but I mostly favor storing more data in the DB over doing more calculations.
In your case, the calculation to convert a string into an SEO string looks quite simple, so it wouldn't matter much. But sometimes you have a table with things like price, unit quantity, discount and so on; there it's better to calculate the final price when adding the rows than to calculate it every time you want to display it.
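For example, something along these lines (a hypothetical order_items table; the point is that the total is computed once at insert time, not on every display):

    <?php
    // Hypothetical example: compute the line total once when the row is added,
    // instead of recomputing unit_price * quantity * (1 - discount) on every display.
    $pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

    $unitPrice = 19.99;
    $quantity  = 3;
    $discount  = 0.10; // 10%

    $lineTotal = round($unitPrice * $quantity * (1 - $discount), 2);

    $stmt = $pdo->prepare(
        'INSERT INTO order_items (unit_price, quantity, discount, line_total)
         VALUES (?, ?, ?, ?)'
    );
    $stmt->execute([$unitPrice, $quantity, $discount, $lineTotal]);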
Hope I can help!
This is a question that is often answered with "it depends on your needs". Any piece of software has to strike a balance between computing power and memory used.
However, as many people say these days, "disk space is cheap".
So, going back to square one, ask yourself whether you're going to have lots of data and where you are going to store it (your own server, Amazon S3, etc.), but I would go with the "store only once" option.
If you have millions of pages, keeping seo_title in the DB is a bad idea; it is better to keep it only as a cached text value.
If you have fewer than 10 million pages but more than 1 million, the DB may need a separate table (or tables) for the SEO title.
If you have fewer records, it makes no difference at all.
PS: I'm thinking of a site whose size corresponds to its visitor numbers.
Related
I have a SQL database with about 20 columns containing percentage values stored as decimals, like 0.096303533707682 for example.
On my website I need to take these values, multiply them by 100 and round them up, so that 0.096303533707682 is shown as 10% when the page is opened by the user.
Now my question is: is it faster/cheaper to calculate the 10% in advance and save that value to the database, so there is nothing to calculate after the query, or does it not make much of a difference?
Thanks for your help!
For the individual operation the only way to know is to test it, and be aware that performance on both sides can vary between versions and configurations.
On the larger system-level approach mind the following:
If you transfer data from the database to PHP just to do the calculation there, you probably pay extra networking cost, so calculating in SQL has benefits (see the sketch at the end of this answer).
Logic can be put into the database using virtual columns, views or stored procedures/functions, so multiple applications can share it.
However, for performance at scale it is simpler to add another PHP host in front of the database than to add an extra database host.
For this specific question you also have to mind:
If you have to do the calculation every time, maybe you can do it once while storing the data, thus using more disk space but saving calculation time.
Depending on the amount of data those costs could be quite negligible, and you should rather put the calculation where it makes logical sense. (Did you measure and see any problem at all, or are you doing premature optimization?) Is the calculation more like "data retrieval" or "business logic"? That is a subjective choice.
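To make the options concrete, a rough sketch (the stats table and conversion_rate column are assumptions):

    <?php
    // Sketch of the three places where the *100-and-round-up can happen.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // 1) Calculate in SQL, so only the display value crosses the network:
    $rows = $pdo->query('SELECT CEIL(conversion_rate * 100) AS pct FROM stats')->fetchAll();

    // 2) Calculate in PHP after fetching the raw decimal:
    foreach ($pdo->query('SELECT conversion_rate FROM stats') as $row) {
        echo (int) ceil($row['conversion_rate'] * 100) . "%\n";
    }

    // 3) Calculate once while storing (more disk space, no work at read time):
    //    INSERT INTO stats (conversion_rate, conversion_pct) VALUES (0.096303533707682, 10);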
I am creating an app that stores new information every week, consisting of ten 12-digit integers for each of millions of unique URLs. I need to extract the information for a particular week, or for a particular week range, for a given URL. I am going to use MySQL as the database.
Tip: to simplify, grouping the URLs by domain will reduce the amount of data to be processed while querying.
I need advice about structuring a database for fast querying that takes optimal processing power and disk space.
Since no-one else has had a go, here's my advice.
To make a start, ignore 'fast querying that takes optimal processing power and disk space.' Looking for that at the start won't get you anywhere. Design and create a sensible database to meet your functional requirements. Bung in random data until you've got approximately the volume you expect. Run queries against it and time them.
If your database is normalised properly, the disc space it takes will also be approximately minimised. Queries may be slow: use execution plans to see why they're slow, and add indexes to help their performance. Once you get acceptable performance, you're there.
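As a concrete starting point, one possible shape for the weekly per-URL numbers could look like this (a sketch only; all names and types are assumptions, and metric1/metric2 stand in for the ten 12-digit values, which fit comfortably in BIGINT):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=metrics', 'user', 'pass');

    $pdo->exec('CREATE TABLE urls (
        url_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        domain VARCHAR(255)  NOT NULL,
        url    VARCHAR(2048) NOT NULL,
        KEY idx_domain (domain)
    )');

    $pdo->exec('CREATE TABLE weekly_metrics (
        url_id  INT UNSIGNED    NOT NULL,
        week    DATE            NOT NULL,
        metric1 BIGINT UNSIGNED NOT NULL,
        metric2 BIGINT UNSIGNED NOT NULL,
        PRIMARY KEY (url_id, week)
    )');

    // Typical query: one URL over a range of weeks, served by the composite key.
    $stmt = $pdo->prepare(
        'SELECT * FROM weekly_metrics WHERE url_id = ? AND week BETWEEN ? AND ?'
    );
    $stmt->execute([42, '2012-01-02', '2012-03-26']);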
The main point is a standard saying: don't optimise until you know you have a problem and you've measured it.
I'm working on a full text index system for a project of mine. As one part of the process of indexing pages it splits the data into a very, very large number of very small pieces.
I have gotten the size of the pieces down to a constant 20-30 bytes, and it could be less; the actual data is basically two 8-byte integers and a float.
Because of the scale I'm aiming for and the number of pieces this creates, I'm looking for an alternative to MySQL, which has shown significant issues at data set sizes well below my goal.
My current thinking is that a key-value store would be the best option for this and I have adjusted my code accordingly.
I have tried a number of them, but for some reason they all seem to scale even worse than MySQL.
I'm looking to store on the order of hundreds of millions, billions, or more key-value pairs, so I need something that won't suffer a large performance degradation with size.
I have tried memcachedb, membase, and mongo and while they were all easy enough to set up, none of them scaled that well for me.
membase had the most issues, due to the number of keys required and the limited memory available. Write speed is very important here, as this is a close-to-even read/write workload: I write a piece of data once, then read it back a few times and store it again for an eventual update.
I don't need much performance on deletes, and I would prefer something that clusters well, since I hope to eventually scale this across machines, but it needs to work on a single machine for now.
I'm also hoping to make this project easy to deploy, so an easy setup would be much better. The project is written in PHP, so it needs to be easily accessed from PHP.
I don't need rows or other higher-level abstractions; they are mostly useless in this case, and I have already adapted the code from some of my other tests down to a key-value store. That seems likely to be the fastest, as I only have two things to retrieve from a row keyed off a third, so there is little additional work in using a key-value store. Does anyone know any easy-to-use projects that can scale like this?
I am using this store to hold individual sets of three numbers (the sizes are based on how they were stored in MySQL; that may not be true in other storage engines): two 8-byte integers, one for the ID of the document and one for the ID of the word, and a float representing the proportion of the document that the word made up (the number of times the word appeared divided by the number of words in the document). The index for this data is the word ID and the range the document ID falls into; every time I need to retrieve this data it will be all of the results for a given word ID. Currently I turn the word ID, the range, and a counter for that word/range combination each into a binary representation of the number and concatenate them to form the key, along with a 2-digit number saying which value I am storing for that key: the document ID or the float value.
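Roughly, the key construction looks like this (a simplified sketch of what I just described, not my exact code; the bucket size for the document-ID range is arbitrary here):

    <?php
    // Simplified sketch of how I build the keys/values described above.
    function makeKey(int $wordId, int $range, int $counter, string $which): string
    {
        // 'J' packs an unsigned 64-bit big-endian integer; $which is the
        // 2-digit marker saying whether this entry holds the document ID
        // or the float value.
        return pack('JJJ', $wordId, $range, $counter) . $which;
    }

    $wordId  = 42;
    $docId   = 1234567890;
    $range   = intdiv($docId, 1000000); // which bucket of document IDs this falls into (bucket size assumed)
    $counter = 0;                       // per word/range counter

    $entries = [
        makeKey($wordId, $range, $counter, '01') => pack('J', $docId),  // document ID
        makeKey($wordId, $range, $counter, '02') => pack('f', 0.0123),  // proportion of the document
    ];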
Performance measurement was somewhat subjective: I watched the output from the processes putting data into, or pulling data out of, the storage to see how fast documents were being processed, kept refreshing my statistics counters (which track more accurate figures of how fast the system is working), and compared the differences between each storage method.
You would need to provide some more data about what you really want to do...
Depending on how you define "fast" and "large scale", you have several options:
memcache
redis
voldemort
riak
and so on... the list gets pretty big.
Edit 1:
Per the comments on this post, I would say take a look at Cassandra or Voldemort. Cassandra isn't a simple KV store per se, since you can store much more complex objects than just K -> V.
If you want to try Cassandra with PHP, take a look at phpcassa. But Redis is also a good option if you set up a replica.
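For example, with Redis via the phpredis extension, a sorted set per word ID keeps all (document ID, proportion) pairs for that word together. This is just one possible mapping, not taken from your code:

    <?php
    // Sketch using the phpredis extension: one sorted set per word ID,
    // member = document ID, score = proportion of the document.
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    $wordId     = 42;
    $docId      = 1234567890;
    $proportion = 0.0123;

    // Write once (matches the write-once, read-a-few-times workload):
    $redis->zAdd("word:$wordId", $proportion, (string) $docId);

    // Read back everything for a given word ID, scores included:
    $results = $redis->zRange("word:$wordId", 0, -1, true);
    // => [ "1234567890" => 0.0123, ... ]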
Here are a few products and ideas that weren't mentioned above:
OrientDB - this is a graph/document database, but you can use it to store very small "documents" - it is extremely fast, highly scalable, and optimized to handle vast amounts of records.
Berkeley DB - a key-value store used at the heart of a number of graph and document databases; it supposedly has a SQLite-compatible API that works with PHP.
shmop - Shared memory operations might be one possible approach, if you're willing to do some dirty work. If your records are small and have a fixed size, this might work for you: use a fixed record size and pad with zeroes.
handlersocket - this has been in development for a long time, and I don't know how reliable it is. It basically lets you use MySQL at a "lower level", almost like a key/value store. Because you're bypassing the query parser etc., it's much faster than going through regular MySQL queries.
If you have a fixed record-size, few writes and lots of reads, you may even consider reading/writing to/from a flat file. Likely nowhere near as fast as reading/writing to shared memory, but it may be worth considering. I suggest you weigh all the pros/cons specifically for your project's requirements, not only for products, but for any approach you can think of. Your requirements aren't exactly "mainstream", and the solution may not be as obvious as picking the right product.
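The flat-file idea with a fixed record size could look roughly like this (a sketch; the 20-byte record matches the two 8-byte integers plus a float from the question):

    <?php
    // Sketch of fixed-size records in a flat file: two unsigned 64-bit integers
    // plus a 4-byte float = 20 bytes per record, addressable by record number.
    const RECORD_SIZE = 20;

    function writeRecord($fh, int $recordNo, int $docId, int $wordId, float $prop): void
    {
        fseek($fh, $recordNo * RECORD_SIZE);
        fwrite($fh, pack('QQf', $docId, $wordId, $prop)); // machine byte order
    }

    function readRecord($fh, int $recordNo): array
    {
        fseek($fh, $recordNo * RECORD_SIZE);
        return unpack('QdocId/QwordId/fprop', fread($fh, RECORD_SIZE));
    }

    $fh = fopen('index.dat', 'c+b');   // create if missing, read/write, binary
    writeRecord($fh, 0, 1234567890, 42, 0.0123);
    print_r(readRecord($fh, 0));       // ['docId' => 1234567890, 'wordId' => 42, 'prop' => ...]
    fclose($fh);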
I am currently working on a web page in HTML5/JavaScript where the user can record the values of sliders changed over a period of time. For example, when the user clicks Record, a timer starts, and every time the user moves a slider its value is saved in an array. When the user stops the recording and presses Play, all sliders are then 'played back' as recorded. I store the values in an array, something like this:
array[3] : {timeStamp, idSlider, valueSlider}
The array can actually become pretty massive, as the user can record over the already recorded settings without losing the previous ones; this allows the user to change multiple sliders for the same timestamp.
I now want to be able to save this array somewhere (on the server side), so the user can come back to the website later on and just load his recorded settings, but I am not sure of the best approach.
I am thinking of a database, but I'm not sure whether saving and loading from the server would be a bit slow, and my database capacity is pretty small (around 25 MB on my OVH server). I am thinking of maybe an average of 800 entries to save.
Maybe in a file (XML?), but then I have no idea how to save that on my server side...
Any other idea is welcome as I am a bit stuck on this one.
Thanks, and sorry for any English mistakes,
Cheers,
Mat
Either serialize() or json_encode() it, save it as a longtext database record.
http://us.php.net/serialize
http://us.php.net/json-encode
http://dev.mysql.com/doc/refman/5.0/en/blob.html
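A minimal sketch of that (the recordings table, its columns and the $userId variable are assumptions):

    <?php
    // Sketch: store the recorded slider array as JSON in a LONGTEXT column.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $userId = 1; // placeholder for however you identify the user

    // Saving (assuming the page's JavaScript POSTs the array as a JSON body):
    $json = file_get_contents('php://input');
    $stmt = $pdo->prepare('INSERT INTO recordings (user_id, data) VALUES (?, ?)');
    $stmt->execute([$userId, $json]);

    // Loading it back later:
    $stmt = $pdo->prepare('SELECT data FROM recordings WHERE user_id = ? ORDER BY id DESC LIMIT 1');
    $stmt->execute([$userId]);
    $recording = json_decode($stmt->fetchColumn(), true);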
I think 800 is not as massive as you think. You could just save it to a database and transfer it with JSON. I believe the database should be pretty efficient and not waste a lot of storage, but this can depend on how you set up your tables. Don't use char() columns, for example.
To see which uses more or less storage you could calculate how much space 1000 entries would take, then put them in the database and see how much space it uses and how much it's wasting.
If you really were dealing with a very large number of items then network performance will probably become your first bottleneck. For loading I would then stream javascript to the browser so that the browser doesn't have to load the whole thing into memory. Also, for sending you would want to create manageable chunks of maybe 1000 at a time and send them in these chunks. But, this is assuming you're dealing with more data.
You can try storing the whole JSON object as plain text in user-specific files. That would make serving the settings really easy, since it would be plain JSON that simply needs to be evaluated.
It seems a bit unorthodox though. When you say 800 entries, do you mean
1. id : 0, timestamp : XXXXXXXXXXXXX, idSlider : X, etc
2. id : 1, timestamp : XXXXXXXXXXXXX, idSlider....
Or 800 user entries? Because you could save the whole object in the database as well and save yourself from executing lots of "costly" queries.
If you are more concerned about storage size than read/write performance, you could encode the array as a JSON string, compress it with the zlib library and save it as a file.
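A sketch of that approach (the path and the $userId/$recording variables are placeholders):

    <?php
    // Sketch: JSON-encode the recording, compress it with zlib, write it to a
    // per-user file, and reverse the steps to load it.
    $file = __DIR__ . '/recordings/' . (int) $userId . '.json.gz';

    // Save:
    file_put_contents($file, gzcompress(json_encode($recording), 9));

    // Load:
    $recording = json_decode(gzuncompress(file_get_contents($file)), true);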
This is a perfect use case for MongoDB.
I'm concerned about my page loading speed; I know there are a lot of factors that affect page loading time.
Is retrieving records (categories) from an array instead of the DB faster?
Thanks
It is faster to keep it all in PHP until you have an absurd number of records and use up your RAM.
BUT both of these things are super fast. Selecting a handful of records from a single table that has an index should take less than a millisecond. Are you sure you know the source of your web page's slowness?
I would be a little bit cautious about having your data in your code. It will make your system less maintainable. How will users change categories?
This gets back to deciding whether you want your site to be static or dynamic.
Yes, of course retrieving data from an array is much faster than retrieving it from a database, but arrays and databases usually have totally different use cases: data in an array is static (you type the values into code or a separate file and can't modify them at runtime), while data in a database is dynamic.
Yes, it's probably faster to have an array of your categories directly in your PHP script, especially if you need all the categories on every page load. This makes it possible for APC to cache the array (if you have APC running), and also lessens the traffic to/from the database.
But is this where your bottleneck is? It seems to me as the categories should have been cached in the query cache and therefore be easily retrieved. If this is not your biggest bottleneck, chances are you won't see any decrease in loading times. Make sure to profile your application to find the large bottlenecks or you will waste your time on getting only small performance gains.
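A sketch of that kind of caching, using the APC functions mentioned above (with the newer APCu extension the calls would be apcu_fetch/apcu_store; the categories table is an assumption):

    <?php
    // Sketch: cache the categories in APC so most requests skip the database.
    function getCategories(PDO $pdo): array
    {
        $categories = apc_fetch('categories');
        if ($categories === false) {
            $categories = $pdo->query('SELECT id, name FROM categories')
                              ->fetchAll(PDO::FETCH_KEY_PAIR);
            apc_store('categories', $categories, 3600); // re-read from the DB at most hourly
        }
        return $categories;
    }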
If you store categories in a database, you have to connect to the database, prepare a SQL statement, send it to the server, fetch the result set, and (probably) store the results in an array. (But you'll probably already have a connection to the database anyway, and hardware and software is designed to do this kind of work quickly.)
Storing and retrieving categories from a database trades speed for maintenance. Your data is always up to date; it might take a little longer to get it.
You can also store categories as constants or as literals in an array assignment. It would be smart to generate the constants or the array literals from data stored in the database, but you might not have to do that for every page load. If "categories" doesn't change much, you might be able to get away with generating the code once or twice a day, plus whenever someone adds a category. It depends on your application.
Storing and retrieving categories from an array trades maintenance for speed. Your data loads a little faster; it might be incomplete.
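Generating the array literal from the database, as suggested above, could look roughly like this (the file name and table are assumptions):

    <?php
    // Sketch: regenerate categories.php from the database once or twice a day
    // (e.g. via cron) or whenever a category is added.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $categories = $pdo->query('SELECT id, name FROM categories')
                      ->fetchAll(PDO::FETCH_KEY_PAIR);

    file_put_contents(
        __DIR__ . '/categories.php',
        '<?php return ' . var_export($categories, true) . ';'
    );

    // Pages then load the generated literal without touching the database:
    // $categories = require __DIR__ . '/categories.php';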
The unsatisfying answer is that you're not going to be able to tell how different storage and page generation strategies affect page loading speed until you test them. And even testing isn't that easy, because the effect of changing server and database parameters can be, umm, surprising.
(You can also generate static pages from the database using php. I suggest you test some static pages to give you an idea of "best case" performance.)