Fastest way to store a table of data and retrieve it - PHP

I am storing some history information on my website so users can retrieve it later. When a user visits certain pages it records the page they visited and the time, and stores this under their user id for future additions/retrievals.
My original plan was to store all of the data in an array, serialize/unserialize it on each retrieval, and store it back in a TEXT field in the database. The problem is I don't know how efficient (or inefficient) this will stay with large arrays of data, e.g. if a user builds up a history of 10k pages.
EDIT: So I want to know the most efficient way to do this. I was also considering just inserting a new row in the database for each history entry, but then that would leave a very large table to select from.
The question is: which is faster/more efficient, a massive number of rows in the database or a massive serialized array? Any other, better solutions are obviously welcome. I will eventually be switching to Python, but for now this has to be done in PHP.

There is no benefit to storing the data as serialized arrays. Retrieving a big blob of data, de-serializing, modifying it and re-serializing to update is slow - and worse, will get slower the larger the piece of data (exactly what you're worried about).
Databases are specifically designed to handle large numbers of rows, so use them. You have no extra cost per insert as the data grows, unlike your proposed method, and you're still storing the same amount of data, so let the database do what it does best, and keep your code simple.
Storing the data as an array also makes any sort of querying or aggregation nearly impossible. If the purpose of the system is, for example, to see how many visits a particular page got, you would have to de-serialize every record, find all the matching pages, and so on. If you have the data as a series of rows with user and page, it's a trivial SQL count query.
If, one day, you find that you have so many rows (10,000 is not a lot of rows) that you're starting to see performance issues, find ways to optimize it, perhaps through aggregation and de-normalization.
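For illustration, here is a minimal sketch of the row-per-visit approach, assuming a PDO connection and hypothetical table/column names (page_visits, user_id, page, visited_at):
<?php
// Hypothetical schema, shown for context:
//   CREATE TABLE page_visits (
//       id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
//       user_id INT UNSIGNED NOT NULL,
//       page VARCHAR(255) NOT NULL,
//       visited_at DATETIME NOT NULL,
//       INDEX idx_user (user_id),
//       INDEX idx_page (page)
//   );
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$userId = 42; // hypothetical: id of the logged-in user

// Recording a visit is one small insert, whose cost does not grow with the
// amount of history already stored.
$insert = $pdo->prepare(
    'INSERT INTO page_visits (user_id, page, visited_at) VALUES (?, ?, NOW())'
);
$insert->execute([$userId, $_SERVER['REQUEST_URI']]);

// Aggregation stays a trivial indexed query instead of unserializing a blob.
$count = $pdo->prepare('SELECT COUNT(*) FROM page_visits WHERE page = ?');
$count->execute(['/some/page']);
echo $count->fetchColumn();
?>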

You can collect all of the data for one session in a session variable and dump it into the database in a single pass (see the sketch after these tips).
You can add indexes at the DB level to save time.
Lastly, and most effectively, you can pre-process/aggregate the data and store it in a separate table, and always select from that aggregated table. You can keep it up to date with a cron job or some other scheduler.
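A rough sketch of the first tip (buffer in the session, dump in one pass), assuming $pdo is an existing PDO connection, $userId the current user's id, and a hypothetical history table:
<?php
// Assumes $pdo (PDO connection) and $userId are already available.
session_start();

// Buffer the current page view in the session instead of writing to the DB
// on every request.
$_SESSION['history_buffer'][] = [
    'page' => $_SERVER['REQUEST_URI'],
    'time' => date('Y-m-d H:i:s'),
];

// Flush once the buffer grows (or on logout) with a single multi-row INSERT.
if (count($_SESSION['history_buffer']) >= 50) {
    $rows = $_SESSION['history_buffer'];
    $placeholders = implode(',', array_fill(0, count($rows), '(?, ?, ?)'));
    $params = [];
    foreach ($rows as $row) {
        $params[] = $userId;
        $params[] = $row['page'];
        $params[] = $row['time'];
    }
    $pdo->prepare("INSERT INTO history (user_id, page, visited_at) VALUES $placeholders")
        ->execute($params);
    $_SESSION['history_buffer'] = [];
}
?>
The trade-off is that anything still sitting in the buffer is lost if the session expires before it is flushed.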

Related

How can I get the best performance when dealing with a large database?

I'm programming a browser game in which there are a spells table, an items table, etc. Each table has thousands of rows. What I do to handle this is the following.
Upon login, I store the entire database in the user's session. That includes only the tables that are not going to be changed by the user's input. For example, the spells table contains only information about the spells: how much damage they deal, what level the player needs to have that spell, etc. The user only reads that data, never writes to it.
Let's say that the user wants to buy a specific spell. I can't afford to have the PHP code go and check each array in the session variable for the spell id. Instead:
<?php
// Load all database spells
$stmt = $db->prepare("SELECT * FROM spells");
$stmt->execute();
// FETCH_UNIQUE keys each returned row by its first column (the spell id)
$result = $stmt->fetchAll(\PDO::FETCH_UNIQUE|\PDO::FETCH_ASSOC);
$_SESSION["spells_db"] = $result;
?>
So, what happens is: I store all the database spells in this session variable. Using \PDO::FETCH_UNIQUE|\PDO::FETCH_ASSOC I make the spell ID the array key. This way I already know each spell's key.
If I ever need to look up spell information by id, the id of the spell is also the key of that spell's row in the array. So instead of using in_array() and making PHP search every single row of the array to find which inner array contains the relevant spell ID, I can address the row directly. This saves a lot of work.
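To make that concrete, a one-line lookup against the keyed array (the 'name' column is an assumption, for illustration only):
<?php
$spellId = 17; // e.g. taken from the user's "buy" request
// Direct index access - no in_array() scan over every row.
$spell = isset($_SESSION['spells_db'][$spellId]) ? $_SESSION['spells_db'][$spellId] : null;
if ($spell !== null) {
    echo $spell['name']; // assumed column name
}
?>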
But on the other side, each individual user stores the entire database in his session. In time this will cause my website to have scalability issues. I know that it is better to store data in the session than to query the database every time to check whether something has changed. In my case, when something gets changed, first I change it in the session, then I change it in the database, and every time a user refreshes the page the session data is displayed. But storing something as large as the entire database makes my head blow up. So, any advice on how to deal with this? Thank you for your time.
I suggest you test it first using the database. I suppose it's MySQL. It can handle gigabytes of data and millions of rows in a table, fast. The important thing is indexing. Thousands of rows is not too much for MySQL (assuming you don't have huge rows with several varchar(5000) columns and such).
(Those keys you were talking about should probably be the indexes in your database table, and I have a gut feeling they are your autoincrement primary keys, so they will be selected fast.)
PHP Session data must be stored somewhere too
If you left session storage at the default, then the data is stored in a file on disk. That means disk writes, and those are slower than any modern database (even on SSD), because databases cache (in RAM) and optimize.
If you store sessions in RAM and you do have a lot of data, you will definitely run out of RAM.
If you store your session in the database... you know
KISS.
If you are updating both $_SESSION and the database table, that adds complexity, sluggishness, etc. And potential errors. And potential consistency issues.
Assuming that you are fetching one spell from the spells table, that will take about 1ms. And you can have multiple queries running simultaneously.
I suggest you use the database heavily without $_SESSION, time actions, then decide which need speeding up. Then adding indexes, etc might help. Or switching to $_SESSION might be warranted.
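As a sketch of the $_SESSION-free route, using the same $db connection as the code above (the WHERE column is an assumption):
<?php
// Fetch only the spell that is needed, when it is needed, instead of caching
// the whole table per user.
$stmt = $db->prepare('SELECT * FROM spells WHERE id = ? LIMIT 1');
$stmt->execute([$spellId]);
$spell = $stmt->fetch(\PDO::FETCH_ASSOC);
// With a primary key on id this is typically a sub-millisecond lookup, and it
// keeps working unchanged once the game is spread across several servers.
?>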
Don't get sucked into "premature optimization".
A bigger problem will occur if your game gets popular -- a single server will not suffice. But once you spread the game across multiple servers, $_SESSION becomes unusable -- it is limited to one server.

MySQL or JSON for data retrieval

So, I have a situation and I need a second opinion. I have a database and it's working great with all foreign keys, indexes and stuff, but when I reach a certain number of visitors, around 700-800 concurrent visitors, my server hits a bottleneck and displays "Service temporarily unavailable." So I had an idea: what if I pull data from JSON instead of the database? I mean, I would still update the database, but on each update I would regenerate a JSON file and pull data from it to show on my homepage. That way I would not press my CPU so hard, and I would be able to have some kind of cache on the user end.
What you are describing is caching.
Yes, it's a common optimization to avoid over-burdening your database with query load.
The idea is you store a copy of data you had fetched from the database, and you hold it in some form that is quick to access on the application end. You could store it in RAM, or in a JSON file. Some people operate a Memcached or Redis in-memory database as a shared resource, so your app can run many processes or threads that access the same copy of data in RAM.
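As a minimal sketch of the file-based variant (the path, TTL and query are assumptions, and $pdo is an existing PDO connection):
<?php
$cacheFile = __DIR__ . '/cache/homepage.json';
$ttl = 60; // seconds to keep serving the cached copy

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
    // Fast path: no database work at all.
    $data = json_decode(file_get_contents($cacheFile), true);
} else {
    // Slow path: rebuild the cache from the database.
    $stmt = $pdo->query('SELECT * FROM articles ORDER BY created_at DESC LIMIT 20');
    $data = $stmt->fetchAll(\PDO::FETCH_ASSOC);
    file_put_contents($cacheFile, json_encode($data), LOCK_EX);
}
?>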
It's typical that your app reads some given data many times for every single time it updates the data. The greater this ratio of reads to writes, the better the savings in terms of lightening the load on your database.
It can be tricky, however, to keep the data in cache in sync with the most recent changes in the database. In other words, how do all the cache copies know when they should re-fetch the data from the database?
There's an old joke about this:
There are only two hard things in Computer Science: cache invalidation and naming things.
— Phil Karlton
So after another few days of exploring and trying to get the right answer, this is what I have done. I decided to create another table, instead of JSON, and put all the data that was supposed to go in the JSON file into that table.
WHY?
The number one reason is that MySQL can lock tables while they're being updated; a JSON file cannot.
Number two is that I go down from a few dozen queries to just one, the simplest possible query: SELECT * FROM table.
Number three is that I have better control over content this way.
Number four: while I was searching for an answer I found out that some people had issues with JSON file availability when a lot of concurrent connections requested the same JSON; this way I never have a problem with availability.
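A rough sketch of that summary-table rebuild, assuming InnoDB and hypothetical source tables (articles, votes) feeding a homepage_cache table:
<?php
// Regenerate the summary table whenever the underlying data changes; the
// homepage then only ever runs SELECT * FROM homepage_cache.
$pdo->beginTransaction();
$pdo->exec('DELETE FROM homepage_cache');
$pdo->exec(
    'INSERT INTO homepage_cache (article_id, title, vote_count)
     SELECT a.id, a.title, COUNT(v.id)
     FROM articles a
     LEFT JOIN votes v ON v.article_id = a.id
     GROUP BY a.id, a.title'
);
$pdo->commit();
?>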

PHP array vs MySQL table

I have a program that creates logs and these logs are used to calculate balances, trends, etc for each individual client. Currently, I store everything in separate MYSQL tables. I link all the logs to a specific client by joining the two tables. When I access a client, it pulls all the logs from the log_table and generates a report. The report varies depending on what filters are in place, mostly date and category specific.
My concern is the performance of my program as we accumulate more logs and clients. My intuition tells me to store the log information in the user_table in the form of a serialized array, so only one query is used for the entire session. I could then take that log array and filter it in PHP, whereas before it was filtered in a MySQL query (using multiple methods, such as BETWEEN for dates and other comparisons).
My question is, do you think performance would be improved if I used serialized arrays to store the logs as opposed to using a MYSQL table to store each individual log? We are estimating about 500-1000 logs per client, with around 50000 clients (and growing).
It sounds like you don't understand what makes databases powerful. It's not about "storing data", it's about "storing data in a way that can be indexed, optimized, and filtered". You don't store serialized arrays, because the database can't do anything with that. All it sees is a single string without any structure that it can meaningfully work with. Using it that way voids the entire reason to even use a database.
Instead, figure out the schema for your array data, and then insert your data properly, with one field per dedicated table column so that you can actually use the database as a database, allowing it to optimize its storage, retrieval, and database algebra (selecting, joining and filtering).
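For instance, a sketch of the per-row approach with the date/category filters from the question pushed into SQL (table and column names are assumptions):
<?php
// Assumes $pdo (PDO connection) and $clientId are already available.
$stmt = $pdo->prepare(
    'SELECT * FROM logs
     WHERE client_id = ?
       AND category = ?
       AND created_at BETWEEN ? AND ?'
);
$stmt->execute([$clientId, 'balance', '2015-01-01', '2015-12-31']);
$logs = $stmt->fetchAll(\PDO::FETCH_ASSOC);
// With a composite index on (client_id, category, created_at) the database
// does the filtering; PHP never unserializes or scans the full history.
?>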
Are serialized arrays in a DB faster than native PHP filtering? No, of course not. You've forced the database to act as a flat file, with the extra DBMS overhead on top.
Is using the database properly faster than native PHP? Usually, yes, by a lot.
Plus, and this part is important, it means that your database can live "anywhere", including on a faster machine next to your webserver, so that your database can return results in 0.1s, rather than PHP jacking 100% cpu to filter your data and preventing users of your website from getting page results because you blocked all the threads. In fact, for that very reason it makes absolutely no sense to keep this task in PHP, even if you're bad at implementing your schema and queries, forget to cache results and do subsequent searches inside of those cached results, forget to index the tables on columns for extremely fast retrieval, etc, etc.
PHP is not for doing all the heavy lifting. It should ask other things for the data it needs, and act as the glue between "a request comes in", "response base data is obtained" and "response is sent back to the client". It should start up, make the calls, generate the result, and die as fast as it can again.
It really depends on how you need to use the data. You might want to look into storing it with Mongo if you don't need to search that data. If you do, leave it in individual rows and create your indexes in a way that makes lookups fast.
If you have 10 billion rows, and need to look up 100 of them to do a calculation, it should still be fast if you have your indexes done right.
Now if you have 10 billion rows and you want to do a sum on 10,000 of them, it would probably be more efficient to save that total somewhere. Whenever a new row is added, removed or updated that would affect that total, you can change that total as well. Consider a bank, where all items in the ledger are stored in a table, but the balance is stored on the user account and is not calculated based on all the transactions every time the user wants to check his balance.
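A minimal sketch of that ledger idea, assuming InnoDB and hypothetical transactions/accounts tables, with both writes in one transaction so the stored balance cannot drift from the ledger:
<?php
// Assumes $pdo (PDO connection), $accountId and $amount are already available.
$pdo->beginTransaction();

// 1) Append the ledger row.
$pdo->prepare('INSERT INTO transactions (account_id, amount) VALUES (?, ?)')
    ->execute([$accountId, $amount]);

// 2) Keep the pre-computed total in step with it.
$pdo->prepare('UPDATE accounts SET balance = balance + ? WHERE id = ?')
    ->execute([$amount, $accountId]);

$pdo->commit();
?>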

serialize/json_encode to one field or keep raw data in many fields of db?

For five days I have been thinking about how to store data in my new project. I've read a lot of articles about the advantages and disadvantages of serializing or json_encoding, and also about searching databases with thousands of records. Here's the problem.
Consider that I am making a game: I have thousands of locations and every location may have some objects in it. The number of objects is limited, but I guess there may be 10-20 objects in each location. Each object has some properties (like perks) which sometimes need to be checked, updated and so on, so I have to store them in the DB.
I see two options to do it:
The simple database way, using first normal form: store each object as a row in the database with each property in its own column. I can easily retrieve the data and connect it to a specific location with a single id. The problem is that there may be (and there will be) thousands of rows in the DB - all objects * number of locations - and searching it might be very expensive in time. Multiply that by the number of players searching the database simultaneously and it could kill the DB.
The second way is to serialize or json_encode (or even implode somehow) all the objects in the current location with all their properties. I guess each object may have 100 properties, and 100 properties * 20 objects serialized is not such a small array of values. So 2000 (assoc key + int value) elements serialized and saved in one field for each location. There is a lot less database searching - just set the id as the primary key and search for it - but later I have to deserialize all the data, which also might be expensive.
I know I don't put any code in here (there isn't any yet), so it is quite a hypothetical question, but I wonder if you have ever checked which solution is better: to store a large amount of data serialized in one field, or to have it spread across multiple rows in the DB.
Hope you can share your experience :)
Kalreg.
Relational database searching is insanely fast. It's also incredibly flexible if you set it up right. So the encode process will be the most time-costly factor. The benefit of JSON is server-client data transfer. Personally I'd use the tried-and-true option 1, but look to cache as much data client-side as possible, e.g. HTML5 web storage.
I also note you're using PHP. An AJAX approach with minimal page reloads is what you want, although I may be over-reading your tag.
You will want the first method.
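To make option 1 concrete, a sketch of what the normalized layout could look like (the property columns are pure assumptions; the point is that one indexed query fetches a location's objects):
<?php
// Hypothetical schema:
//   CREATE TABLE location_objects (
//       id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
//       location_id INT UNSIGNED NOT NULL,
//       type VARCHAR(50) NOT NULL,
//       durability INT NOT NULL DEFAULT 0,
//       power INT NOT NULL DEFAULT 0,
//       INDEX idx_location (location_id)
//   );
// Assumes $pdo (PDO connection) and $locationId are available.
$stmt = $pdo->prepare('SELECT * FROM location_objects WHERE location_id = ?');
$stmt->execute([$locationId]);
$objects = $stmt->fetchAll(\PDO::FETCH_ASSOC);
?>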

Big joins or multiple fetches - which is most efficient?

I understand that multiple variables are part of this equation, like the number of tables, number of columns, number of returned rows, the indexes used, etc. But speaking generally:
Is it more efficient to run a query with multiple (say 5+) joins, where most of the tables contain rows corresponding to rows in the main table, and the returned result is in the 20,000-row range? For the sake of argument, let's say the first table contains users with a creation date, and it's by this date that we decide which users to pick out. The other tables contain things such as session information, user notes, etc. All users should be picked out, but depending on the values of fields in the secondary tables we might ignore the session data for one user and do some work with the session data of another user as we go through the results. This way we would get all the needed data in one query, but might also get some redundant data for some users.
Or would it be more efficient to pick the users by date and, while iterating over the results, fetch data from the other tables per user only when it's necessary?
Let's say that the work on the returned rows is done within PHP5+.
I'll say, do a benchmark.
It will depend on the frequency of "when it's necessary". If you need the extra data for 10% of the users, the second approach will be better, I think. If you need it for 90%, it will be better to retrieve everything in one big query.
Big join.
I can cite absolutely no evidence to back that up. I do speak from some experience, though: in the system I work with, we do millions of tiny, simple queries rather than a few big ones, and all the data-intensive work takes ages. For example, it takes an hour to load data that a direct SQL load can do in a couple of minutes. The per-query cost completely dominates the equation.
If your tables have the proper indexes (which will help a lot, when it comes to joins), one single SQL query, even a bit complex, will probably be faster than several queries, which will each imply an exchange between PHP and the MySQL server.
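For reference, the two shapes being compared, sketched with assumed table and column names:
<?php
// Assumes $pdo (PDO connection) and $since (a date string) are available.

// 1) One big join: a single round-trip, at the cost of some redundant data
//    per row for users whose secondary data ends up being ignored.
$stmt = $pdo->prepare(
    'SELECT u.id, u.created_at, s.last_seen, n.note
     FROM users u
     LEFT JOIN sessions s ON s.user_id = u.id
     LEFT JOIN notes n    ON n.user_id = u.id
     WHERE u.created_at >= ?'
);
$stmt->execute([$since]);
$rows = $stmt->fetchAll(\PDO::FETCH_ASSOC);

// 2) Fetch per user: smaller, simpler result sets, but one extra round-trip
//    per user - roughly 20,000 of them in the scenario above.
$users = $pdo->prepare('SELECT id FROM users WHERE created_at >= ?');
$users->execute([$since]);
$notes = $pdo->prepare('SELECT note FROM notes WHERE user_id = ?');
foreach ($users->fetchAll(\PDO::FETCH_COLUMN) as $id) {
    $notes->execute([$id]); // only "when it's necessary"
}
?>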
(But, of course, the only way to know for sure what applies the best in your specific situation is to test both solutions, benchmarking them !)
