How would you temporarily store several thousand key => value or key => array pairs within a single process? Lookups on the key will be done continuously within the process, and the data is discarded when the process ends.
Should I use arrays? Temporary MySQL tables? Or something in between?
It depends on what "several thousand" means and how big the array gets in memory. If you can handle it in PHP, you should do it there, because using MySQL adds a little overhead here.
But if you are on a shared host, or you have a limited memory_limit in php.ini and can't increase it, you can use a temporary table in MySQL.
You can also use a simple, fast key-value store like Memcached or Redis; both can work in memory only and have very fast key lookups (Redis promises O(1) time complexity).
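If you do keep it in PHP, a minimal sketch of the array route (the items table and column names are invented for illustration, and $db is assumed to be an open mysqli connection):

    <?php
    // Load everything once at the start of the process; a few thousand
    // key => value pairs usually costs only a few megabytes.
    $lookup = array();
    $result = mysqli_query($db, 'SELECT item_key, item_value FROM items');
    while ($row = mysqli_fetch_assoc($result)) {
        $lookup[$row['item_key']] = $row['item_value'];
    }

    // Continuous lookups are then constant-time hash accesses, no queries.
    $value = isset($lookup['some_key']) ? $lookup['some_key'] : null;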
Several thousand?! You mean it could take up several KILObytes?!
Are you sure this is going to be an issue? Before optimizing, write the code in the simplest, most straightforward way, and check later what really needs optimization. Also, only with a benchmark and the full code will you be able to decide on the proper way of caching. Everything else is a waste of time and the root of all evil...
Memcached is a popular way of caching data.
If you're only running that one process and don't need to worry about concurrent access, I would do it inside PHP. If you have multiple processes, I would use an established solution so you don't have to worry about the details.
It all depends on your application and your hardware. My bet is to let databases (especially MySQL) do just database work, meaning no more than storing and retrieving data. Other DBMSs may be really efficient (Informix, for example), but sadly, MySQL is not.
Temporary tables may be more efficient than PHP arrays, but you increase the number of connections to the DB.
Scalability is an issue too; doing it in PHP is better in that respect.
It is kind of difficult to give a straight answer if we don't get the complete picture.
It depends on where your source data is.
If your data is in the database, you had better keep it there, manipulate it there, and just fetch the items you need. Use temp tables if necessary.
If your data is already in PHP, you are probably better off keeping it there, although handling data in PHP is quite memory-intensive.
If the data lookup will be done with only a few queries, do it with a MySQL temporary table.
If there will be many data lookups, it's almost always best to store the data on the PHP side (to avoid connection overhead).
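For the few-lookups case, a rough sketch of the temporary-table route (the kv_cache name is invented, and $db is assumed to be an open mysqli connection):

    <?php
    // A TEMPORARY table is visible only to this connection and is
    // dropped automatically when the process disconnects.
    // MEMORY tables can't hold TEXT/BLOB columns, hence VARCHAR here.
    mysqli_query($db, 'CREATE TEMPORARY TABLE kv_cache (
        k VARCHAR(64) NOT NULL PRIMARY KEY,
        v VARCHAR(255)
    ) ENGINE=MEMORY');

    mysqli_query($db, "INSERT INTO kv_cache (k, v) VALUES ('foo', 'bar')");

    $res = mysqli_query($db, "SELECT v FROM kv_cache WHERE k = 'foo'");
    $row = mysqli_fetch_row($res);   // $row[0] === 'bar'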
I have searched for a few hours already but have found nothing on the subject.
I am developing a website that depends on a query to define the elements that must be loaded on the page. But to organize the data, I must iterate over the result of this query 4 times.
At first I used mysql_data_seek so I could rewind and re-read the result, but performance suffered. Because of this, I tried swapping mysql_data_seek for putting the data in an array and running a foreach loop.
The performance didn't improve in any way I could measure, so I started wondering which is, in fact, the better option: building a rather big data array or executing mysql_fetch_array multiple times.
My application is currently running on PHP 5.2.17 and MySQL, and everything is on localhost. Unfortunately, I have a busy database, but I have never had any problems with the number of connections to it.
Is there a preferable way to perform this task? Is there any other option besides mysql_data_seek or the big data array? Does anyone have information regarding benchmark tests of these options?
Thank you very much for your time.
The answer to your problem may lie in indexing the appropriate fields in your database. Most databases also cache frequently served queries, but they tend to discard them once the table they cover is altered (which makes sense).
So you could trust your database to do what it does well, querying for and retrieving data, and help it by making sure there's little contention on the table and/or by placing appropriate indexes. This can, however, hurt write performance, since indexes have to be calculated and maintained; whether that matters in your case, only you can judge.
The PHP extension you use will play a part as well if speed is of the essence: 'upgrade' to mysqli or PDO and do a ->fetch_all(), since it will cut down on communication between the PHP process and the database server. The only argument against this would be if the amount of data you query is so enormous that it bogs down your PHP/webserver processes, or even your whole server, by forcing it into swap.
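If upgrading is an option, a sketch of the fetch-once, iterate-many pattern with mysqli (note that fetch_all() needs PHP 5.3+ built with mysqlnd; the credentials and query here are placeholders):

    <?php
    $mysqli = new mysqli('localhost', 'user', 'pass', 'mydb'); // placeholder credentials
    $result = $mysqli->query('SELECT id, title FROM page_elements'); // hypothetical query
    $rows   = $result->fetch_all(MYSQLI_ASSOC); // one round trip, whole set into PHP
    $result->free();

    // Iterate the array as often as needed; no mysql_data_seek() involved.
    for ($pass = 0; $pass < 4; $pass++) {
        foreach ($rows as $row) {
            // ... organize the data for this pass ...
        }
    }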
The table type you use can matter too: certain kinds of queries seem to run faster on MyISAM as opposed to InnoDB. If you want to retool a bit, you could store this data (or a copy of it) in MySQL's HEAP (MEMORY) engine, so it lives purely in memory. You'd need to be careful to synchronize it with a disk table on writes, though, if you want to be sure altered data survives a server failure or shutdown.
Alternatively, you could cache your data in something like memcache or via apc_store, which should be very fast since it lives in PHP process memory. The big caveat is that APC generally has less memory available for storage (the default being 32MB). Memcache's big advantage is that, while still fast, it's distributed, so if you have multiple servers running they can share this data.
You could try a NoSQL database, preferably one that's just a key-value store, not even a document store, such as Redis.
And finally, you could hardcode your values in your PHP script (making sure to still use an opcode cache like eAccelerator or APC), and verify whether you really need to use the data 4 times or whether you can just cache the output of whatever it is you actually build from it.
So I'm sorry I can't give you a ready-made answer, but performance questions usually require a multi-pronged approach. :-|
I'm making a website that (essentially) lets the user submit a word, matches it against a MySQL database, and returns the closest match found. My current implementation is that whenever the user submits a word, the PHP script is called; it reads the database, scans each word one by one until a match is found, and returns it.
I feel like this is very inefficient. I'm about to make a program that stores the list of words in a tree structure for much more effective searching. If there are tens of thousands of words in the database, I can see the current implementation slowing down quite a bit.
My question is this: instead of having to write another, separate program, and use PHP to just connect to it with every query, can I instead save an entire data tree in memory with just PHP? That way, any session, any query would just read from memory instead of re-reading the database and rebuilding the tree over and over.
I'd look into running an instance of memcached on your server. http://www.memcached.org.
You should be able to store the compiled tree of data in memory there and retrieve it for use in PHP. You'll have to load it into PHP to perform your search, though, as well as architect a way for the tree in memcached to be updated when the database changes (assuming the word list can be updated, since there's no good reason to store it in a database otherwise).
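A minimal sketch of that flow with the PHP Memcached extension; the cache key and build_tree_from_db() are placeholders for your own code:

    <?php
    $m = new Memcached();
    $m->addServer('127.0.0.1', 11211);

    $tree = $m->get('word_tree');          // placeholder cache key
    if ($tree === false) {                 // miss (or expired): rebuild once
        $tree = build_tree_from_db();      // your existing build step (hypothetical)
        $m->set('word_tree', $tree, 3600); // structures are serialized automatically
    }
    // ... run the search against $tree for this request ...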
Might I suggest looking at the MEMORY table type in MySQL: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
You can then still use MySQL's search features on fast in-memory data.
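A rough sketch; the table and column names are invented, $mysqli is assumed to be an open connection, and the BTREE index is there because MEMORY tables default to HASH indexes, which can't serve range or prefix lookups:

    <?php
    // Copy the word list into an in-memory table once (e.g., at startup).
    $mysqli->query('CREATE TABLE words_mem (
        word VARCHAR(64) NOT NULL,
        KEY idx_word (word) USING BTREE
    ) ENGINE=MEMORY');
    $mysqli->query('INSERT INTO words_mem SELECT word FROM words');

    // Prefix searches then run against RAM-resident, indexed data.
    $res = $mysqli->query("SELECT word FROM words_mem WHERE word LIKE 'exampl%'");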
PHP really isn't a good language for large in-memory structures. It's just not very memory-efficient, and it has the persistence problem you are asking about. Typically with PHP, people store the data in some external persistent data store that is optimized for quick retrieval.
Usually people use a twofold approach (a rough sketch of step 2 follows this list):
1) Store data in the database, optimized as much as possible for the standard queries
2) Cache the results of expensive queries in memcached
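A sketch of step 2, the cache-aside pattern; the function name, key, and TTL are placeholders, and fetch_all() needs mysqlnd:

    <?php
    function get_expensive_result(Memcached $m, mysqli $db, $cacheKey, $sql) {
        $rows = $m->get($cacheKey);
        if ($rows === false) {                // cache miss: hit the database
            $result = $db->query($sql);
            $rows = $result->fetch_all(MYSQLI_ASSOC);
            $m->set($cacheKey, $rows, 300);   // keep for 5 minutes
        }
        return $rows;
    }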
If you are dealing with a lot of data that cannot be indexed easily by relational databases, then you'd probably need to roll your own daemon (e.g., written in C) that keeps a persistent copy of the data structure in memory for fast querying.
My question is fairly simple: I need to read some templates (in PHP) and send them to the client.
For this kind of data, specifically text/html and text/javascript, is it more expensive to read it out of a MySQL database or out of files?
Kind regards
Tom
inb4 security; I'm aware.
PS: I read other topics about similar questions, but they either had to do with other kinds of data or weren't answered.
Reading from a database is more expensive, no question.
Where do the flat files live? On the file system. In the best case, they've been recently accessed so the OS has cached the files in memory, and it's just a memory read to get them into your PHP program to send to the client. In the worst case, the OS has to copy the file from disc to memory before your program can use it.
Where does the data in a database live? On the file system. In the best case, they've been recently accessed so MySQL has that table in memory. However, your program can't get at that memory directly, it needs to first establish a connection with the server, send authentication data back and forth, send a query, MySQL has to parse and execute the query, then grab the row from memory and send it to your program. In the worst case, the OS has to copy from the database table's file on disk to memory before MySQL can get the row to send.
As you can see, the scenarios are almost exactly the same, except that using a database involves the additional overhead of connections and queries before getting the data out of memory or off disc.
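Seen as code, the two paths reduce to something like this (the paths, credentials, and table layout are purely illustrative):

    <?php
    // Flat file: a single call, served from the OS page cache when warm.
    $html = file_get_contents('/var/www/templates/header.html');

    // Database: connect, authenticate, parse, execute, transfer.
    $db   = new mysqli('localhost', 'user', 'pass', 'site');
    $res  = $db->query("SELECT body FROM templates WHERE name = 'header'");
    $row  = $res->fetch_assoc();
    $html = $row['body'];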
There are many factors that would affect how expensive both are.
I'll assume that since they are templates, they probably won't be changing often. If so, flat-file may be a better option. Anything write-heavy should be done in a database.
Reading a flat-file should be faster than reading data from the database.
Having them in the database usually makes it easier for multiple people to edit.
You might consider using memcache to store the templates after reading them, since reading from memory is always faster than reading from a db or flat-file.
It really doesn't make enough difference to worry about. What sort of volume are you working with? Will you have over a million page views a day? If not, I'd say pick whichever one is easiest for you to code with and maintain, and don't worry about the expense of the alternatives until it becomes a problem.
Specifically, if your templates are currently in file form I would leave them there, and if they are currently in DB form I'd leave them there.
Hi,
I have a doubt: I have seen that reading MySQL data is slower with large tables... I have done lots of optimization but can't get past it.
What I am thinking is: would it be faster if I stored the data in files?
Of course, each piece of data would be a separate file, so millions of records = millions of files. I agree it will consume disk space... but what about the reading process? Is it faster?
I am using PHP to read the files...
Reading one file = fast.
Reading many / big files = slow.
Reading singular small entries from database = waste of I/O.
Combining many entries within the database = faster than file accesses.
As long as your tables are properly indexed and your queries actually use those indexes, a relational DB (like MySQL) is going to be much faster, more robust, more flexible (insert many buzzwords here), etc.
To examine why your queries' performance does not match your expectations, you can use EXPLAIN with your SELECTs (http://dev.mysql.com/doc/refman/5.1/en/explain.html).
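For example (the query is illustrative, and $mysqli is assumed to be an open connection):

    <?php
    // Look at the "key" and "rows" columns of the output: a NULL key or
    // a huge row count usually means a missing or unused index.
    $res = $mysqli->query('EXPLAIN SELECT * FROM big_table WHERE user_id = 42');
    print_r($res->fetch_assoc());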
To answer the topic, yes.
By which I mean that there are so many (unmentioned) factors that it's impossible to unequivocally state that one will be faster than the other every time.
It depends on what kind of data you're storing. Structured data is usually much faster and more flexible/powerful to read using SQL, since that's exactly what it's made for. If you want to search, filter, sort, or group by a certain attribute, the index structures and optimizations of a DBMS are appropriate.
However, when using a DB to store large files (BLOBs), which are unstructured in the sense that you are not going to search, filter, sort, or group by any part of them, these files just blow up the database size and make it slow. There is an interesting study by Microsoft on this topic (I just have to find the link). That study is the reason Microsoft introduced external BLOB storage in SQL Server, which basically means what you asked: the BLOBs are saved in files outside the database, because they measured that access is much faster that way.
When storing files (e.g., pictures, videos, documents...), you often have some metadata about each file that you want to query with a structured query language like SQL, while the actual files don't necessarily need to be saved in the database.
Reading from a DBMS (MySQL is one) is faster in most cases, because it has a built-in cache that keeps data in memory, so the next time you try to read the same data you will not have to wait on the incredibly slow hard drive.
A DBMS is essentially your hard drive plus a cache to speed things up (plus some data-sorting algorithms). Remember, your database is stored on your hard drive. :)
It depends on a lot of factors, not least of which is what kind of file system you're using. MySQL uses files for storage anyway, so read speed isn't the issue -- the biggest factor will be how fast MySQL can find your data, compared to how fast it can be looked up in your filesystem.
Generally, though, MySQL is quite good about finding data quickly -- after all, that's its purpose in life. So unless you have a really good reason why the FS should be much faster, stick with the DB and check your indexes and such.
By choosing a custom file storage system you will lose the benefits of using a relational database, and your code might not be easily maintainable.
Nonetheless, there are many who believe that relational databases offer too much complexity at the cost of speed. Have a look at the NoSQL entry on Wikipedia and read about possible alternatives.
I have some small sets of data from the database (MySQL) that are seldom updated.
Basically 3 or 4 small two-dimensional arrays (50-200 items).
This is the ideal case for memcached, but I'm on a shared server and can't install anything.
I only have PHP and MySQL.
I'm thinking about storing the arrays in a file and regenerating the file via a cron job every 2-3 hours.
Any better idea or suggestion about this approach?
What's the best way to store those arrays?
If you're working with an overworked MySQL server then yes, cache that data into a file. Then you have two ways to update your cache: either via a cron job, unconditionally, every N minutes (I wouldn't update it less frequently than every hour), or every time the data changes. The best approach depends on your specific situation. In general, the cron-job way is the simplest, but the on-change way pretty much guarantees that you won't ever use stale data.
As for the storage format, you could just serialize() the array and save the string to a file. With big arrays, unserialize() is faster than a big array(...) declaration.
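A minimal sketch of that approach; the file path is a placeholder:

    <?php
    $cacheFile = '/tmp/arrays_cache.ser';   // placeholder path

    // Cron side (every 2-3 hours): rebuild $arrays from MySQL, then dump.
    file_put_contents($cacheFile, serialize($arrays), LOCK_EX);

    // Request side: load the arrays back without touching MySQL.
    $arrays = unserialize(file_get_contents($cacheFile));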
As said in the comments, it would be better to first check whether the root of the problem can be fixed. A round trip that long sounds like a network configuration problem.
Otherwise, if the DB simply is that slow, nothing speaks against a filesystem-based cache. You could turn each query into an md5() hash and use that as a file name, serialize() the result set into the file, and fetch it from there. Use filemtime() to determine whether the cache file is older than x hours; if it is, re-run the query, or, to avoid locking problems on the cache files, use a cron job to regenerate it.
Just note that this way you would be dealing with whole result sets that have to be loaded into your script's memory all at once; you wouldn't have the advantage of being able to fetch a result set row by row. That can be done in a cached way too, but it's more complicated.
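Put together, a sketch of that scheme (the TTL and cache directory are placeholders, and $db is assumed to be an open mysqli connection):

    <?php
    function cached_query(mysqli $db, $sql, $ttl = 10800) { // 3-hour TTL (placeholder)
        $file = '/tmp/qc_' . md5($sql) . '.ser';
        if (is_file($file) && filemtime($file) > time() - $ttl) {
            return unserialize(file_get_contents($file)); // cache hit
        }
        $rows = array();
        $res = $db->query($sql);
        while ($row = $res->fetch_assoc()) {
            $rows[] = $row;                               // whole set in memory
        }
        file_put_contents($file, serialize($rows), LOCK_EX);
        return $rows;
    }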
My English is not good, sorry.
I have sometimes read about alternatives to memcache. It's complex, but I think you can use http://www.php.net/manual/en/ref.sem.php to access shared memory.
A simple class example used for storing data is here:
http://apuntesytrucosdeprogramacion.blogspot.com/2007/12/php-variables-en-memoria-compartida.html
It's written in Spanish, sorry, but the code is easy to understand (Eliminar = delete).
I have never tested this code, and I don't know whether it's viable on a shared server.
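I have not run it either, but a minimal sketch of the same idea with PHP's System V shared memory functions (requires the sysvshm extension; the segment size and slot number are arbitrary, and load_arrays_from_db() is hypothetical):

    <?php
    $key = ftok(__FILE__, 'a');            // derive an IPC key from this script
    $shm = shm_attach($key, 1048576);      // attach a 1 MB segment

    if (!shm_has_var($shm, 1)) {           // slot 1 holds our arrays
        shm_put_var($shm, 1, load_arrays_from_db()); // hypothetical loader
    }
    $arrays = shm_get_var($shm, 1);        // survives between requests
    shm_detach($shm);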