I got a simple general question on wrinting and retrieving data with PHP, something I cannot grasp the concept of.
If we have a website with a large number of visitors and with often updates, how do visitors access the pages that are currently beeing updated, should there be a conflict?
I asked somewhat connected question here and got some good responses (timestap, flock etc.), but on a larger scale, how does this work. Particulary interested in XML, for example GetSimple CMS uses XML for a database, and it can accomodate relatevely large websites easily. But how can you proof all the data to not to be broken while you are writing in data files and user accessing them at the very same time, and there could be many users and many editors?
Want to get the idea and technics in general, not just XML?
I feel like the amount of time the physical files or data is actually being changed its very tiny, leaving a close to impossible condition of accessing the same data at the exact same time.
I would like to see someone accomplish a physical file written to and read from in the exact same millisecond...
Related
I've got a heavy-read website associated to a MySQL database. I also have some little "auxiliary" information (fits in an array of 30-40 elements as of now), hierarchically organized and yet gets periodically and slowly updated 4-5 times per year. It's not a configuration file though since this information is about the subject of the website and not about its functioning, but still kind of a configuration file. Until now, I just used a static PHP file containing an array of info, but now I need a way to update it via a backend CMS from my admin panel.
I thought of a simple CMS that allows the admin to create/edit/delete entries, periodical rare job, and then creates a static JSON file to be used by the page building scripts instead of pulling this information from the db.
The question is: given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
I just used a static PHP
This sounds like contradiction to me. Either static, or PHP.
given the heavy-read nature of the website, is it better to read a rarely updated JSON file on the server when building pages or just retrieve raw info from the database for every request?
Cache was invented for a reason :) Same with your case - it all depends on how often data changes vs how often is read. If data changes once a day and remains static for 100k downloads during the day, then not caching it or not serving from flat file would would simply be stupid. If data changes once a day and you have 20 reads per day average, then perhaps returning the data from code on each request would be less stupid, but from other hand, all these 19 requests could be served from cache anyway, so... If you can, serve from flat file.
Caching is your best option, Redis or Memcached are common excellent choices. For flat-file or database, it's hard to know because the SQL schema you're using, (as in, how many columns, what are the datatype definitions, how many foreign keys and indexes, etc.) you are using.
SQL is about relational data, if you have non-relational data, you don't really have a reason to use SQL. Most people are now switching to NoSQL databases to handle this since modifying SQL databases after the fact is a huge pain.
I am creating a web-based app for android and I came to the point of the account system. Previously I stored all data for a person inside a text file, located users/<name>.txt. Now thinking about doing it in a database (like you probably should), wouldn't that take longer to load since it has to look for the row where the name is equal to the input?
So, my question is, is it faster to read data from a text file, easy to open because it knows its location, or would it be faster to get the information from a database, although it would have to first scan line by line untill it reaches the one with the correct name?
I don't care about the safety, I know the first option is not save at all. It doesn't really matter in this case.
Thanks,
Merijn
In any question about performance, the first answer is usually: Try it out and see.
In your case, you are reading a file line-by-line to find a particular name. If you have only a few names, then the file is probably faster. With more lines, you could be reading for a while.
A database can optimize this using an index. Do note that the index will not have much effect until you have a fair amount of data (tens of thousands of bytes). The reason is that the database reads the records in units called data pages. So, it doesn't read one record at a time, it reads a page's worth of records. If you have hundreds of thousands of names, a database will be faster.
Perhaps the main performance advantage of a database is that after the first time you read the data, it will reside in the page cache. Subsequent access will use the cache and just read it from memory -- automatically, I might add, with no effort on your part.
The real advantage to a database is that it then gives you the flexibility to easily add more data, to log interactions, and to store other types of data the might be relevant to your application. On the narrow question of just searching for a particular name, if you have at most a few dozen, the file is probably fast enough. The database is more useful for a large volume of data and because it gives you additional capabilities.
Abit of googling came up with this question: https://dba.stackexchange.com/questions/23124/whats-better-faster-mysql-or-filesystem
I think the answer suits this one as well.
The file system is useful if you are looking for a particular file, as
operating systems maintain a sort of index. However, the contents of a
txt file won't be indexed, which is one of the main advantages of a
database. Another is understanding the relational model, so that data
doesn't need to be repeated over and over. Another is understanding
types. If you have a txt file, you'll need to parse numbers, dates,
etc.
So - the file system might work for you in some cases, but certainly
not all.
That's where database indexes come in.
You may wish to take a look at How does database indexing work? :)
It is quite a simple solution - use database.
Not because its faster or slower, but because it has mechanisms to prevent data loss or corruption.
A failed write to the text file can happen and you will lose a user profile info.
With database engine - its much more difficult to lose data like that.
EDIT:
Also, a big question - is this about server side or app side??
Because, for app side, realistically you wont have more than 100 users per smartphone... More likely you will have 1-5 users, who share the phone and thus need their own profiles, and for the majority - you will have a single user.
At the moment I am writing a series of functions for fetching Dota 2 matches from the Steam API. When someone fetches their games, I have to (for my use) take a history of all of their games (lets say 3 api calls), then all the details from each of those games (so if there's 200 games, another 200 api calls). This takes a long time, and so far I'm programming all of the above to be in one php file "FetchMatchHistory.php", which is run by the user clicking a button on the web page.
Another thing that is making me feel it should be in one file, is that I imagine it is probably good practice to put all of the information (In this case, match history, match details, id's etc.) into the database all at once, so that there doesn't have to be null values in the database?
My question is whether or not having a function that takes a very long time should be in just one PHP file (should meaning, is generally considered good practice), or whether I should break the seperate functions down into smaller files. This is very context dependent, I know, so please forgive me.
Is it common to have API calls spanning several PHP files if that is what you are making? Is there a security/reliability issue with having only one file doing all the leg-work (so to speak)?
Good practice is to have a number of relevant functions grouped together in a php file that describes them, for organize them better and also for caching reasons for the parts that get updated more slowly than other.
But speaking of performance, i doubt you'll get the performance improvements you seek by just moving code through files.
Personally i had the habit to put everything in a file, consistently:
making my files fat
hard-to-update
hard-to-read
hard to find the thing i want (Ctrl+F meltdown)
wasting bandwidth uploading parts they did not need to be updated
virtually disabling caching on server
I dont know if any of the above is of any use for your App, but breaking files into their relevant files/places did my life easier.
UPDATE:
About the database practice, you're going to query only the parts you want to be updated.
I dont understand why you split that logic in files, there's not going to give you performance. Instead, what is going to give you performance is to update only the relevant parts and having tables with relevant content. Speaking of multiple tables have a lot more sense, since you could use them as pointers to the large data contained in another tables, reducing the possible waste of data having just one table.
Also, dont forget a single table has limitations; I personally try to have as few columns as possible. Adding more and more and a day you can't add more because of the row limit. There is a maximum number of columns in general, but this limit rarely ever get maxed by developer; the increased per-row content itself is going to suck out that limit.
Whether to split server side code to multiple files or keep it in a single one is an organizational issue, more than a security/reliability one...
I don't think it's more secure to keep your code in separate source files.
It's entirely a of how you prefer to organize and mantain your code base.
Usually, I separate it when I can find some kind of "categories" in my code.
Obviously, if you write OO code, the most common choice is to keep each class in a single file...
I run a large website and I'm looking in to the advantages of using XML to retrieve the data rather than querying the database so much in the hope of speeding things up a little.
The problem is, I have lots of AJAX requests etc. that go on across the website.
Is there any great advantages in using XML over MySQL and how reliable is it, i.e. can I update the whole XML file on every update, or will that cause other users to not have access to the XML file for a few seconds while the PHP writes the new file... or should I use PHP to look for and update just that field in the XML document (although that would still require updating the whole file)!?
Any ideas and best-practice ideas would be great here. How do stackoverflow do it?
Using a database sounds ideal for the scenario you describe (many concurrent accesses, many small queries and updates, etc.)
Reading and writing an XML file is definitely not going to be faster - in fact, it's likely to be much, much slower. XML is not a choice you would make to improve performance.
If you are having performance problems, look at optimizing your database first.
Please don't do this.
Working with XML (file system) will never be faster than querying a database. Databases are used for a reason...
In answer to your last question (how do stack overflow do it). They use a database: https://data.stackexchange.com/ - Namely SQL Server 2008.
Short and simple - databases was created for getting past the painfully slow file based systems.
How do stackoverflow do it?
An article from the site founder: Back to basics
I'm struggling with a conceptual question. When you have a forum with thousands of posts and/or threads, how do you retrieve all those posts to be displayed on your site? Do you connect to your database every time someone visits your page then capture every post in an array and display it? Surely this seems like it would be very taxing on your server and would cause a whole bunch of unnecessary database reads. Can anyone shine some light on this topic?
Thanks.
You never retrieve all those posts at once. In most case, forums show a page of X threads/posts, and you just get those X threads/posts from the database each time a page is served. RDBMS are pretty good at this. A forum is (should be) quite dynamic so indeed it generates a pretty good load on the database, but this is what database are made for, storing and retrieving data.
One new(ish) way of doing this is to use a Document Oriented Database like CouchDB where everything about an individual post is stored in the same document and that document gets loaded on request.
It seems in this case a Document Oriented Database would work very well for a forum or blog type site.
As far as Relational Databases go, I'm pretty sure the database gets hit every time the page loads unless there is some sort of caching implemented (then you'd have to worry about data getting stale though, which brings up a whole new mess of problems.)
Don't worry a lot about stale data. Facebook doesn't... their database is only "eventually consistent". The idea is like this: making sure that the comments are 100% always, always up-to-date is very expensive. That does put a large load on your DB. Although as Serty says, that's what the DB is made for, but whether or not your physical box is sufficient for the load is another matter.
Facebook and Digg to name a few took a different approach... Is it really all that important that every load of every page be 100% accurate? How many page loads actually result in every single comment being read by the end user anyways? It's a lot cheaper to get the comments right 'most' of the time and by 'most' I mean something you get to decide. Is a 10% chance of a page with missing comments ok? is a 1% chance? How many nodes need to have the right data NOW. When I write a new comment, how many nodes have to say they got the update for it to be successful.
I like the idea behind Cassandra which is in summary, "how much are we willing to spend to get Aunt Martha's comment about her nephew's baptism picture 100% correct?"
But that's a fine question for a free website, but this wouldn't work so good for a business application.