I'm wondering what the best way is to prevent my server from being stressed when I want to get a lot of data using PHP.
Convert all my data to an XML sheet? Use caching?
The data comes from many different databases and different tables.
Any ideas?
Thanks in advance
1. Before you do anything, stress it until it breaks.
2. When it breaks, profile it to identify the breaking point, and work on that.
3. Repeat from point 1 until the performance limits are acceptable.
Neither.
Both add significant complexity within your PHP code. Converting to XML has absolutely no added value - indeed, not only do you pay an additional cost encoding into XML, you also pay a cost for decoding. As for actually storing your data in XML - that is plain ridiculous.
The MySQL DBMS provides very effective result caching. The underlying OS should provide very good I/O caching. You gain nothing by adding your own caching layer except for slower code and hotter CPUs.
If you have an OLTP type database and are routinely collating large sets of data, then you might want to think about pre-consolidating it or implementing an OLAP schema within your database.
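For illustration, a minimal sketch of what pre-consolidation could look like - all table and column names here are made up, and it assumes a daily_sales summary table with a unique key on (sale_date, product_id):

    <?php
    // Hypothetical pre-consolidation job, e.g. run nightly from cron:
    // roll detailed OLTP rows up into a summary table so that reports
    // read a few aggregated rows instead of scanning millions.
    $db = new mysqli('localhost', 'user', 'pass', 'shop');

    $db->query("
        REPLACE INTO daily_sales (sale_date, product_id, units, revenue)
        SELECT DATE(created_at), product_id, SUM(qty), SUM(qty * price)
        FROM   orders
        WHERE  created_at >= CURDATE() - INTERVAL 1 DAY
        GROUP  BY DATE(created_at), product_id
    ");
    // Reports then query daily_sales instead of the big orders table.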
OTOH if you simply want to improve the performance of your system, then look elsewhere - particularly at your data schema.
C.
I think caching will be the way to go, especially if this data doesn't change very often :)
I'm working on a website that is supposed to show the specifications of computers, for example: CPU, CPU speed, RAM and stuff like that. There are a lot of fields, and I wonder whether PHP's mysql_fetch_array would be faster, or saving all the data in one varchar field and separating it using PHP? Also wondering if there are any pros and cons to either of them?
I'm using PHP 5.3 and MySQL.
Thanks in advance.
If I were you I'd use a database, with everything in separate fields and tables (a relational DB), designed with an entity-relationship diagram plus normalization. Doing that you get scalability and independence and you avoid redundancy; besides, if you add good indexes (PK, FK and UK) you can get good performance. Of course that will also depend on your queries and your PHP code, but implementing the right DB design (plus the security stuff) is good practice.
The pros are A LOT... the cons: it will take you more time, but it will be worth it.
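For illustration, a rough sketch of what that normalized layout could look like for the specs site - the table and column names are just examples, not a prescription:

    <?php
    // Hypothetical normalized layout: one row per computer, one row
    // per spec value, with the PK/FK/UNIQUE indexes suggested above.
    $db = new mysqli('localhost', 'user', 'pass', 'catalog');

    $db->query("
        CREATE TABLE computer (
            id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(100) NOT NULL,
            UNIQUE KEY uk_name (name)
        ) ENGINE=InnoDB
    ");

    $db->query("
        CREATE TABLE spec (
            computer_id INT UNSIGNED NOT NULL,
            attribute   VARCHAR(50)  NOT NULL,  -- e.g. 'cpu', 'cpu_speed', 'ram'
            value       VARCHAR(100) NOT NULL,
            PRIMARY KEY (computer_id, attribute),
            FOREIGN KEY (computer_id) REFERENCES computer(id)
        ) ENGINE=InnoDB
    ");

Fetching one machine's specs is then a single indexed query: SELECT attribute, value FROM spec WHERE computer_id = ?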
I doubt the speed difference is ever going to matter. What will matter is whether or not you have a regular structure and the ability to freely query your data. Put it in the database and figure out how to speed things up later when/if it ever becomes an issue, rather than making poor design decisions up front because you think you might need more performance.
Should I take the new data for my Ajax online-game worldmap (while dragging/scrolling) from my MySQL DB, or is it better to load the data from a generated (and frequently updated) XML file? (Frequently updated because of new players joining the game/worldmap.)
In other words:
Is MySQL capable of handling, dunno, a few thousand players scrolling a worldmap (and therefore requesting new data), or should I use an XML file?
Personally I hate XML.
For you it might be the right tool for the job, but I'm just going to answer the "is mysql capable of..." part of your question :-)
Yes
But it depends on your SQL skills.
How to speed things up?
- Keep the MySQL server on the same machine as the webserver to avoid network traffic.
- Use memory tables to avoid disk I/O.
- Know your way around SQL.
- MySQL in the default config is tuned for small tables and small memory sizes; this sounds like it fits your case, but experiment and measure to see which config works best.
- Fewer selects/inserts/updates with more data per request are faster than more selects/inserts/updates with less data per request (see the sketch below).
Also note that if you don't cache the XML file in memory you will hit lock issues on the XML file slowing things down.
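A minimal sketch of that last list point - the map_tiles table and its columns are hypothetical, and it assumes a composite index on (x, y):

    <?php
    $db = new mysqli('localhost', 'user', 'pass', 'game');

    // Viewport requested by the scrolling client:
    $xMin = 120; $xMax = 140; $yMin = 80; $yMax = 95;

    // One query for the whole visible region instead of one per tile.
    $stmt = $db->prepare(
        'SELECT x, y, terrain, player_id
         FROM map_tiles
         WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ?'
    );
    $stmt->bind_param('iiii', $xMin, $xMax, $yMin, $yMax);
    $stmt->execute();

    // get_result() needs mysqlnd; fetch row-by-row otherwise.
    echo json_encode($stmt->get_result()->fetch_all(MYSQLI_ASSOC));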
A database hit will almost always be more expensive than a file hit (due to crossing a network) - but the quickest option would be to keep an in-memory dataset/cache (be aware of memory consumption though).
I think MySQL fits your need. You could also cluster your data when running low on system resources.
WebSockets could also be interesting for you. Maybe you should have a look at Node.js; with it you can handle new user joins easily (push the new players to the other players instead of pulling the data out of MySQL).
Is the Ajax response returned as XML or JSON? If the latter, then why bother messing about with XML?
If it were me, I'd maintain the data in the database with smart server-side caching (where you can invalidate cache items selectively).
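A minimal sketch of what that selective invalidation could look like - Memcached is just one possible backend here, and regionKey() and loadRegionFromMysql() are hypothetical helpers for this example:

    <?php
    $cache = new Memcached();
    $cache->addServer('localhost', 11211);

    function regionKey($x, $y) {
        // One cache entry per 10x10 block of the map.
        return 'region:' . floor($x / 10) . ':' . floor($y / 10);
    }

    // Read path: try the cache first, fall back to the database.
    $x = 123; $y = 87;
    $data = $cache->get(regionKey($x, $y));
    if ($data === false) {
        $data = loadRegionFromMysql($x, $y);        // hits MySQL
        $cache->set(regionKey($x, $y), $data, 60);  // cache for 60s
    }

    // Write path: a player joins at ($px, $py) - invalidate only
    // the one region that changed, not the whole cache.
    $px = 5; $py = 12;
    $cache->delete(regionKey($px, $py));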
Hi,
I have a doubt: I have seen that reading MySQL data is slower in the case of large tables. I have done lots of optimization but can't get through it.
What I am thinking is: will it give better speed if I store the data in files?
Of course each record would be a separate file, so millions of records = millions of files. I agree it will consume disk space... but what about the reading process? Is it faster?
I am using PHP to read the files...
Reading one file = fast.
Reading many / big files = slow.
Reading singular small entries from database = waste of I/O.
Combining many entries within the database = faster than file accesses.
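To illustrate that last point, a quick sketch - the table and column names are made up:

    <?php
    // One combined query for many entries beats opening many small
    // files (or running many small queries).
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');

    $ids = array(3, 17, 42, 99);

    // Bad: one file read (or one query) per id.
    // Good: a single IN() query returning all rows at once.
    $in  = implode(',', array_map('intval', $ids));
    $res = $db->query("SELECT id, payload FROM items WHERE id IN ($in)");

    while ($row = $res->fetch_assoc()) {
        echo $row['id'], ': ', $row['payload'], "\n";
    }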
As long as your tables are properly indexed and as long as your queries actually use those indices, using a relational DB (like MySQL) is going to be much faster, more robust, flexible (insert many buzzwords here), etc.
To examine why your queries' performance does not match your expectations, you can use the EXPLAIN statement with your selects (http://dev.mysql.com/doc/refman/5.1/en/explain.html).
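For example, a quick way to eyeball a query plan from PHP (the query and table are placeholders):

    <?php
    // Prefix the SELECT with EXPLAIN and dump the rows. Look at the
    // 'key' and 'rows' columns: a NULL key or a huge row estimate
    // usually means a missing or unused index.
    $db  = new mysqli('localhost', 'user', 'pass', 'mydb');
    $res = $db->query('EXPLAIN SELECT * FROM big_table WHERE user_id = 42');

    while ($row = $res->fetch_assoc()) {
        print_r($row);
    }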
To answer the topic, yes.
By which I mean that there are so many (unmentioned) factors that it's impossible to unequivocally state that one will be faster than the other every time.
It depends on what kind of data you're storing. Structured data is usually much faster and more flexible/powerful to read using SQL, since that's exactly what it's made for. If you want to search, filter, sort or group by a certain attribute, the index structures and optimizations of a DBMS are appropriate.
However, when using a DB for storing large files (BLOBs) which contain unstructured data - in the sense that you are not going to search, filter, sort or group by any part of the files - those files just blow up the database size and make it slow. There is an interesting study by Microsoft on this topic (I just have to find the link). That study is the reason why Microsoft introduced External BLOB storage in SQL Server, which basically does what you asked: the BLOBs are saved in files outside the database, because they measured that access is much faster that way.
When storing files (e.g., pictures, videos, documents...) you often have some metadata on the file which you want to be able to use with a structured query language like SQL, while the actual files don't necessarily need to be saved in the database.
Reading from a DBMS (MySQL is one) is faster in most cases, because they have a built-in cache that will keep the data in memory, so next time you try to read the same data you will not have to wait on the incredibly slow hard drive.
A dbms is essentially reading from your hard drive + a cache to speed things up (+ some data sorting algorithms). Remember, your database is stored on your hard drive :)
It depends on a lot of factors, not least of which is what kind of file system you're using. MySQL uses files for storage anyway, so read speed isn't the issue -- the biggest factor will be how fast MySQL can find your data, compared to how fast it can be looked up in your filesystem.
Generally, though, MySQL is quite good about finding data quickly -- after all, that's its purpose in life. So unless you have a really good reason why the FS should be much faster, stick with the DB and check your indexes and such.
By choosing a custom file storage system you will lose the benefits of using a relational database. Also, your code might not be easily maintainable.
Nonetheless, there are many who believe that relational databases offer too much complexity at the cost of speed. Have a look at the NoSQL entry in wikipedia and read about possible alternatives.
For a very large site such as a Social Network (say Facebook), which method would you recommend for user accounts storage?
1) Single XML files for each type of feature, in the user's directory: basicinfo.xml, comments.xml, photos.xml, ...
2) MySQL, although I'm not sure how to organize this. Maybe separate tables for each feature? E.g. a table for Comments, where columns are id, from, message, time?
I know XML is not designed for storage, and PHP (this is the language I use) must read the entire XML file and store in memory before it is used.
But, here are the reasons why I prefer XML (but I may be wrong, please tell me if you disagree with any):
1) If I have user accounts' paths organized in this way
User ID 2342:
/users/00/00/00/00/00/00/00/23/42/
I think it's faster to find the Comments of a user by file path than seeking in a large database.
Also, if each feature is split into tables, each user profile will require more than one seek, to display comments, photos, basic info, etc.
2) I heard MySQL is globally locked when writing to it. Is this true? If so, I'd rather lock a single file than everything.
3) Is MySQL "shared" between the cluster? I mean, if 1 disk gets full, will it "continue" on another? Or do I, as the programmer, have to manage it myself and create new databases on another disk? (note, I use Linux)
Even if it is about the same speed when using XML files, it is easier to split them between disks, because the structure is split by account IDs, not by feature as it would be in a database.
4) Note that I don't store each comment in comments.xml. I just note their attributes in each XML tag, and the messages are in separate text files, commentid.txt. Since each XML file should not be very large, there should not be problems with memory/time.
As for the problem of parsing the entire XML, maybe I should think about using XMLReader/Writer instead of SimpleXML/DOM? Or will that decrease performance a lot?
Thank you!
Facebook uses MySQL.
That being said, here's the long version:
I always say that XML is a data transfer technology, not a data storage technology, but not everyone agrees. XML is not designed to be used as a relational datastore. XML was first introduced to provide a standard way of transmitting data from system to system w/o giving access to the originating systems.
Since you are talking about a large application, I would strongly urge you to use MySQL (or another RDBMS): as your dataset grows, the XML will get slower and slower, unless you always keep a fresh copy in memory and only read the XML files upon service reboot.
Using an XML database is reportedly more efficient in terms of conversion costs when you're constantly sending XML into and retrieving XML out of a database. The rationale is, when XML is the only transport syntax used to get things in and out of the DB, why squeeze everything through a layer of SQL abstraction and all those relational tables, foreign keys, and the like? It basically takes a parsing layer out of the application and brings it into the data engine - where it's probably going to work faster and more efficiently than the SQL alternative. Probably.
Depends heavily on the nature of your site. On the one hand the XML approach gives you a free pass on things like “SELECT * FROM $table where $table.id=$id” type queries. On the other hand...
For a very large site, in the worst-case scenario the data files end up pretty big too. If it is any kind of community site this may easily happen for any account: go to any forum with a fair number of old-guard members in its community and you'll find a couple of posters that have, say, 10K posts... This means you will wish for SQL-style result sets, which are implemented using a memory-efficient model rather than a speed-efficient one. To the end user, 1s versus 1.1s response time is not that much of a deal; but to you, 1K simultaneous requests versus 1.5K or better definitely is.
Then there is the aspect that if you are mostly reading data, XML may be fine, if somewhat crude for large data sets and DOM-based implementations. But if you are writing a lot, things become much, much worse. Caching of data is still possible, but giving ACID-like guarantees on these file transactions requires you to pretty much write your own database software.
And then there are storage requirements and such, which mean you may need a distributed approach for storing your data. These kinds of setups are relatively well understood in the database world, and they bring a lot of interesting problems to the table (what do you do if a single disk fails? how do you know on which disk to find the data? how do you implement efficient caching?) that essentially amount to, again, writing your own mini-database software from scratch.
So for a very large site, I think the hard technical requirements - performance at not too great a cost in memory, a certain reliability, and not needing to reinvent 21 wheels at the same time - mean that your approach would not work that well. I think it is better suited to smallish read-only sites where you can afford to experiment with and pursue alternative routes, and where you can easily make changes and roll them out across the entire site.
IME: An in-house application using a single XML file for persistence didn't stand up to use by a single user...
1) What you're suggesting is an XML file system with a manager application... There are XML databases, and there's been increasing support for storing XML within RDBMSs. You're looking at re-inventing the wheel...
And that's besides the normalization that comes from storing the data in an RDBMS, which would enforce referential integrity in a way XML never will...
2) "Global locking" is without any contextual scope. No database I know of locks globally when writing; most support degrees of locking (table/row/etc, varies between vendors) for sake of retaining concurrency when directed to - not by default.
3) Without a database, data or actual users--being concerned about clustering is definitely premature optimization.
4) If the system crashes without having written the referential integrity to some sort of persistence that will survive the application being turned off, the data will be useless.
So I'm going to be working on a homemade blog system in PHP, and I was wondering which way of storing data is the fastest. I could go in the MySQL direction, or I could go with my own little way of doing it, which is storing all of the information (encoded in JSON) in files.
Which way would be the fastest, MySQL or JSON files?
For a small, single user 'database', a file system would likely be quicker - as the size and complexity grows, a database server like MySQL or SQL Server is hard to beat.
I would definitely choose a DB option (as you need to be able to search and index stuff). But that does not mean you need a fully realized separate DB service.
MySQL is definitely the more scalable solution.
But the downside is you need to set up and maintain a separate service.
On the other hand there are DBs that are file-based and still give you access with standard SQL - SQLite (sqlite.org) jumps to mind. You get the advantages of SQL but you do not need to maintain a separate service. The disadvantage is that they are not as scalable.
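For illustration, a minimal sketch of the SQLite option through PDO (assumes the pdo_sqlite extension is enabled; file and table names are arbitrary):

    <?php
    // No server to run - the whole database lives in blog.sqlite
    // next to your scripts.
    $db = new PDO('sqlite:' . __DIR__ . '/blog.sqlite');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $db->exec('CREATE TABLE IF NOT EXISTS posts (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL,
        body  TEXT NOT NULL
    )');

    $stmt = $db->prepare('INSERT INTO posts (title, body) VALUES (?, ?)');
    $stmt->execute(array('Hello', 'First post!'));

    foreach ($db->query('SELECT id, title FROM posts') as $row) {
        echo $row['id'], ': ', $row['title'], "\n";
    }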
I would choose a MySQL database - simply because it's easier to manage.
JSON is not really a format for storage, it's for sending data to JavaScript. If you want to store data in files, look into XML or serialized PHP (which I suspect is what you are after, rather than JSON).
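If you do end up with files, a minimal sketch of the serialized-PHP approach (the file name is arbitrary):

    <?php
    $file  = __DIR__ . '/posts.dat';
    $posts = is_file($file) ? unserialize(file_get_contents($file)) : array();

    $posts[] = array('title' => 'Hello', 'body' => 'First post!');

    // LOCK_EX so two concurrent requests can't interleave their writes.
    file_put_contents($file, serialize($posts), LOCK_EX);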
Forgive me if this doesn't answer your question very directly, but since it is a homecooked blog system, is it really worth spending time right now thinking about which storage backend is faster?
You're not going to be looking at 10,000 concurrent users from day 1, and it doesn't sound like it will need to scale to any meaningful degree in the foreseeable future.
Why not just stick with MySQL as a sensible choice rather than a fast one? If you really want some sense that you designed for speed, maybe bolt SQLite on instead.
Since you are thinking you may not have the need for a complex relational structure, this might be a fun opportunity to try something more down the middle.
Check out CouchDB: it is a document-based, schema-free database (yet still indexable). The database is made of documents that contain named fields (think key-value pairs).
Have fun....
Though I don't know for certain, it seems to me that a MySQL database would be a lot faster, especially as the amount of data gets larger and larger.
Also, using MySQL with PHP is super easy, especially if you use an abstraction class like ezSQL. ezSQL makes working with a database really simple and I think you'd be creating more unnecessary work for yourself by going the home-brewed JSON direction.
I've done both. I like files for very simple problems and databases for complicated problems.
For file solutions, note these problems as the number of files increases:
1) Much more disk space is used than you might expect, because even tiny files use up a whole block. Blocks are fairly large on filesystems which support large drives.
2) Most filesystems get very slow when the number of files in a directory gets very large. My solution to this (assuming the names of the files are reasonably spread out across the alphabet) is to create a directory consisting of the first two letters of the filename. Thus, the file "animal.txt" would be found at an/animal.txt. This works surprisingly well. If your filenames are not reasonably well-distributed across the alphabet, use some sort of hashing function to create the directories. Sounds a little crazy, but this can work very, very well, and I've used it for very fast solutions with tens of thousands of files.
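A small sketch of the hashed variant of that scheme - md5() here just stands in for "some sort of hashing function", and the paths are examples:

    <?php
    // Bucket files into subdirectories so no single directory
    // gets huge.
    function shardedPath($baseDir, $filename) {
        $shard = substr(md5($filename), 0, 2);     // e.g. "a9"
        return $baseDir . '/' . $shard . '/' . $filename;
    }

    $path = shardedPath('/var/data', 'animal.txt');
    if (!is_dir(dirname($path))) {
        mkdir(dirname($path), 0777, true);         // create shard dir on demand
    }
    file_put_contents($path, 'some contents');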
But the file solutions really only fit sometimes. Unless you have a great reason to go with files, use a database.
This is really cool. It's a PHP class that controls a flat-file database with queries: http://www.fsql.org/index.php
For blogs, I recommend caching the pages, because blogs usually only have static content. This way, the queries only get run when the cache is built. You can update the cached pages when a new blog post is added.
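A minimal sketch of that page-caching idea - render_post() is a hypothetical function that runs the queries and prints the HTML, and the paths are arbitrary:

    <?php
    $cacheFile = __DIR__ . '/cache/post_' . (int) $_GET['id'] . '.html';

    if (is_file($cacheFile)) {
        readfile($cacheFile);            // cache hit: no queries at all
        exit;
    }

    ob_start();
    render_post((int) $_GET['id']);      // renders the page once
    file_put_contents($cacheFile, ob_get_flush(), LOCK_EX);

    // When a new post is added, unlink() the affected cache files
    // so they get rebuilt on the next hit.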