I have a data table with 600,000 records that is around 25 MB in size. It is indexed by a 4-byte key.
Is there a way to find a row in such a dataset quickly with PHP, without resorting to MySQL?
The website in question is mostly static, with minor PHP code and no database dependencies, and is therefore fast. I would like to add this data without having to use MySQL if possible.
In C++ I would memory map the file and do a binary search in it. Is there a way to do something similar in PHP?
PHP (at least 5.3) should already be optimized to use mmap where it's available and likely to be advantageous. Therefore, you can use the same strategy you say you would use in C++:
Open a stream with fopen
Move around for your binary search with fseek and fread
EDIT: actually, it seems PHP only uses mmap in certain other circumstances, such as file_get_contents. It shouldn't matter much here, but you can also try file_get_contents.
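A minimal sketch of that strategy, assuming the file consists of fixed-width records sorted by the key, with the key stored as a 4-byte big-endian unsigned integer at the start of each record (the record size below is made up):

<?php
// Binary search over a file of fixed-width, key-sorted records.
// RECORD_SIZE and the key layout are assumptions; adjust to your format.
const RECORD_SIZE = 43;

function find_record($path, $needle)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    $lo = 0;
    $hi = (int) (filesize($path) / RECORD_SIZE) - 1;
    $found = null;

    while ($lo <= $hi) {
        $mid = (int) (($lo + $hi) / 2);
        fseek($fp, $mid * RECORD_SIZE);
        $record = fread($fp, RECORD_SIZE);
        $parts  = unpack('N', substr($record, 0, 4)); // unsigned 32-bit key
        $key    = $parts[1];

        if ($key === $needle) {
            $found = $record;          // the whole matching row
            break;
        } elseif ($key < $needle) {
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    fclose($fp);
    return $found;
}

$row = find_record('data.bin', 123456); // null if the key is not present

With roughly 600,000 records this needs at most about 20 seeks per lookup, so it stays fast without ever loading the whole file into memory.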
I would suggest memcachedb or something similar. If you handle this entirely in PHP, the script will have to read the entire file/data structure on each request, and that isn't possible in reasonable time dynamically.
In C++, would you stop and start the application each time a user wanted to view the file in a different way, loading and unloading the file every time? Probably not, but that is how PHP differs from a long-running application and from application programming languages: each request starts from scratch.
PHP has tools to help you deal with the environment teardown/buildup. These tools are the database and/or keyed caching utilities like memcache. Use the right tool for the right job.
I am creating a small plugin that processes some images. To report progress back to the user I have a small AJAX script that long-polls for the results.
Internally I need an object that keeps track of what has been processed. The options I am aware of are the following.
Using the PHP session. I cannot use this in this specific case, because the initial processing is also started by AJAX: the main process is one AJAX call, and the long poll is another AJAX call. They end up with two different session IDs, so they do not communicate well.
The second option is to use the database as storage. I am not sure this is a good fit, because there will be around 40 reads/writes for an average job. I know that is no real problem, but it seems like a lot for something so simple.
What I am actually looking for is some sort of in-memory object, if that is possible: a small object in memory that is rapidly updated with the progress and deleted when we are done.
I do not know whether such a thing exists in PHP and whether I can make use of it. Note that this will be a public plugin, so I need to work with methods that are available on all kinds of systems, nothing special.
I think the database is not the worst solution; if you are thinking of writing to disk yourself, that could well be worse.
Memcache is good, but for a small, "external plugins free" plugin that has to run easily on Windows, Linux, Mac and so on, it is not a good option.
If you use MySQL, you can use MEMORY engine tables, which are fast, and truncate or clean them periodically with a simple garbage-collector algorithm. And if a MEMORY table is not an option, InnoDB is good enough.
You can use memcache for this.
http://php.net/manual/en/book.memcache.php
As the key you can use an md5 hash of the image file plus the user's IP.
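A sketch of what that might look like; the key scheme, TTL and the $imagePath variable are illustrative:

// Worker side: update progress under a key derived from the image and client IP.
$memcache = new Memcache();
$memcache->connect('127.0.0.1', 11211);
$key = 'progress_' . md5($imagePath . $_SERVER['REMOTE_ADDR']);
$memcache->set($key, 42, 0, 600);        // percent done, 10-minute TTL

// Long-poll side: read the current progress back.
$progress = $memcache->get($key);        // false if the key has expired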
Have a look at Redis, "an open source, advanced key-value store", I think you will like it.
You need to run a Redis server, and access it with different clients. The client of choice for PHP is Predis. The usage is very simple:
// Assumes Predis is autoloaded (e.g. via Composer: composer require predis/predis)
// and $single_server holds the connection parameters, e.g. array('host' => '127.0.0.1', 'port' => 6379).
$client = new Predis\Client($single_server);
$client->set('library', 'predis');
$retval = $client->get('library'); // "predis"
The database is not a bad idea when you are using a HEAP (MEMORY) table. Sometimes you simply don't have memcache on the server.
Check the MEMORY storage engine section of the MySQL documentation.
You want easy, highly portable shared memory between PHP processes regardless of how PHP is installed? Use a MySQL MEMORY table; PHP without MySQL installed is pretty rare. A sketch of that approach follows.
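A minimal sketch, assuming PDO and made-up table, column and variable names ($jobId would be whatever identifier both AJAX calls share):

<?php
// A MEMORY (HEAP) table used as shared progress storage between requests.
$pdo = new PDO('mysql:host=localhost;dbname=plugin', 'user', 'pass');

$pdo->exec('CREATE TABLE IF NOT EXISTS job_progress (
    job_id  CHAR(32) NOT NULL PRIMARY KEY,
    percent TINYINT UNSIGNED NOT NULL
) ENGINE = MEMORY');

// Worker: update the progress for this job.
$stmt = $pdo->prepare('REPLACE INTO job_progress (job_id, percent) VALUES (?, ?)');
$stmt->execute(array($jobId, 42));

// Long poll: read it back.
$stmt = $pdo->prepare('SELECT percent FROM job_progress WHERE job_id = ?');
$stmt->execute(array($jobId));
$percent = $stmt->fetchColumn();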
Possible Duplicate:
Least memory intensive way to read a file in PHP
I have a problem with speed vs. memory usage.
I have a script which needs to run very quickly. All it does is load multiple files of 1-100 MB, each consisting of a list of values, and check how many of these exist in another list.
My preferred way of doing this is to load the values from the file into an array (explode), and then loop through this array and check whether the value exists or not using isset.
The problem is that there are so many values that it uses up >10 GB of memory (I don't know why it uses so much). So I have resorted to loading the values from the file into memory a few at a time, instead of just exploding the whole file. This cuts the memory usage right down, but is VERY slow.
Is there a better method?
Code Example:
$check = array_flip(array('lots', 'of', 'values', 'here'));

$values  = explode('|', file_get_contents('bigfile.txt'));
$matches = 0;
foreach ($values as $value) {
    if (isset($check[$value])) {
        $matches++;
    }
}
Maybe you could code your own C extension for PHP (see e.g. this question), or code a small utility program in C and have PHP run it (perhaps using popen)?
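If you go the external-utility route, the PHP side is just a pipe; the "matcher" binary and its output format here are hypothetical:

$handle  = popen('./matcher bigfile.txt checklist.txt', 'r');
$matches = (int) trim(stream_get_contents($handle));
pclose($handle);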
This seems like a classic use case for some form of key/value-oriented NoSQL datastore (MongoDB, CouchDB, Riak), or maybe even just a large memcache instance.
Assuming you can load the large data files into the datastore ahead of when you need to do the searching, and that you'll be using the data from the loaded files more than once, you should see some impressive gains (as long as your queries, map-reduce, etc. aren't awful). Judging by the size of your data, you may want to look at a datastore that doesn't need to hold everything in memory to be quick.
There are plenty of PHP drivers (and tutorials) for each of the datastores I mentioned above.
Open the files and read through them incrementally (line-wise or token by token) instead of all at once; see the sketch below. Maybe use MySQL for the import (LOAD DATA INFILE), for the resulting data, or both.
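A sketch of that streaming idea, scanning the file delimiter by delimiter with stream_get_line instead of exploding everything into one array (assuming '|'-separated values shorter than 4 KB, as in the question's example):

$check = array_flip(array('lots', 'of', 'values', 'here'));

$fp = fopen('bigfile.txt', 'rb');
$matches = 0;
while (($value = stream_get_line($fp, 4096, '|')) !== false) {
    if (isset($check[$value])) {
        $matches++;
    }
}
fclose($fp);

Memory use stays at a few kilobytes per file regardless of file size, at the cost of many small reads.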
It seems you need a proper search engine.
The Sphinx search server can be used to search your values really fast.
My question is fairly simple; I need to read out some templates (in PHP) and send them to the client.
For this kind of data, specifically text/html and text/javascript: is it more expensive to read them out of a MySQL database or out of files?
Kind regards
Tom
inb4 security; I'm aware.
PS: I read other topics about similar questions, but they either dealt with other kinds of data or haven't been answered.
Reading from a database is more expensive, no question.
Where do the flat files live? On the file system. In the best case, they've been recently accessed so the OS has cached the files in memory, and it's just a memory read to get them into your PHP program to send to the client. In the worst case, the OS has to copy the file from disc to memory before your program can use it.
Where does the data in a database live? On the file system. In the best case, they've been recently accessed so MySQL has that table in memory. However, your program can't get at that memory directly, it needs to first establish a connection with the server, send authentication data back and forth, send a query, MySQL has to parse and execute the query, then grab the row from memory and send it to your program. In the worst case, the OS has to copy from the database table's file on disk to memory before MySQL can get the row to send.
As you can see, the scenarios are almost exactly the same, except that using a database involves the additional overhead of connections and queries before getting the data out of memory or off disc.
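To make the comparison concrete, the two read paths might look like this (paths, credentials and table names are made up):

// Flat file: one call; on a warm read it comes straight from the OS page cache.
$template = file_get_contents('templates/page.html');

// Database: connect, authenticate, parse and execute a query, then fetch.
$pdo  = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');
$stmt = $pdo->prepare('SELECT body FROM templates WHERE name = ?');
$stmt->execute(array('page'));
$template = $stmt->fetchColumn();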
There are many factors that would affect how expensive both are.
I'll assume that since they are templates, they probably won't be changing often. If so, flat-file may be a better option. Anything write-heavy should be done in a database.
Reading a flat-file should be faster than reading data from the database.
Having them in the database usually makes it easier for multiple people to edit.
You might consider using memcache to store the templates after reading them, since reading from memory is always faster than reading from a db or flat-file.
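A sketch of that read-through cache; the key name, template path and TTL are illustrative:

$cache = new Memcache();
$cache->connect('localhost', 11211);

$template = $cache->get('tpl_page');
if ($template === false) {
    $template = file_get_contents('templates/page.html');
    $cache->set('tpl_page', $template, 0, 3600); // cache for an hour
}
echo $template;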
It really doesn't make enough difference to worry you. What sort of volume are you working with? Will you have over a million page views a day? If not, I'd say pick whichever one is easiest for you to code with and maintain, and don't worry about the expense of the alternatives until it becomes a problem.
Specifically, if your templates are currently in file form I would leave them there, and if they are currently in DB form I'd leave them there.
I've a web app built in PHP with Zend on a LAMP stack. I have a list of 4000 words I need to load into memory. The words have categories and other attributes, and I need to load the whole collection every time. Think of a dictionary object.
What's the best way to store it for quick recall? A flat file with something like XML, JSON or a serialized object? A database record with a big chunk of XML, JSON or a serialized object? Or 4000 records in a database table?
I realize different server configs will make a difference, but assume an out-of-the-box shared hosting plan, or WAMP locally or some other simple setup.
If you're using APC (or similar), your fastest result is probably going to be coding the word list directly into a PHP source file and then just require_once()'ing it.
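A sketch of that approach; the file name, array layout and attributes are made up:

<?php
// words.php -- generated once at deploy time; with APC enabled the compiled
// file stays in the opcode cache, so later requests skip disk and parsing.
return array(
    'aardvark' => array('category' => 'animal'),
    'abacus'   => array('category' => 'object'),
    // ... the remaining ~4000 entries
);

and wherever the list is needed:

$words = require '/path/to/words.php'; // require_once also works on the first include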
In an ideal system I would rank them: memory (memcached), then disk, then database. But depending on the setup, the database can sometimes be faster than disk because the result may stick in the query cache.
It all depends on the environment; and if it's that critical, you should measure it. Otherwise place it where you think it is more accessible.
I'd place it in a file that can be cached, saving you a lot of unnecessary database calls on a (or maybe even every?) page load. How you store it doesn't really matter, whatever works best for you. Speed-wise, 4000 words shouldn't be a problem at all.
For translations in projects I work on, I always use language files containing serialized PHP data, which is simple to retrieve:
$text = unserialize(file_get_contents('/language/en.phpdata'));
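The matching write side (run at build or deploy time) would be something like this, where $translations is whatever array you assemble beforehand:

file_put_contents('/language/en.phpdata', serialize($translations));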
Format the list as a PHP source and include it.
Failing that, ask yourself if it really matters how fast this will load. 4000 words isn't all that many.
If you need all 4000 in memory all the time, that defeats the purpose of querying a database, although I could be wrong. Serialized object sounds simple enough and I would think it would perform alright on that number of words.
If you are able to use memcached, creating the array once using any of the methods above, sending it to memcached, and then reusing it from there is probably fastest. Check the answer to "Can you store a PHP Array in Memcache" for an example. Basically it would look like this:
$cache = new Memcache;
$cache->connect('localhost', 11211) or die ("Could not connect");
$cache->set('words', $myarray);
and to get it:
$myarray = $cache->get('words');
If you're going to serialise the list of words as XML/JSON anyway, then just use a file. I think a more natural approach is to include the list in the PHP source though.
If that list is going to change, you will have more flexibility with a database.
If you just need to know which one is faster, I'm going with the DB. In addition to the speed, using the DB is safer and easier. But be careful to use a proper data type, like NTEXT (SQL Server) or BLOB (Oracle).
I had a similar problem and ran some tests for it.
Here are the timings for 25 000 loops:
Read one long text from DB: 9.03s
Read one file: 6.26s
Include a PHP file where a variable contains the text: 12.08s
Maybe the fastest way would be to read this data (once, after a server restart) with any of these options and create a database table stored in memory (storage engine: MEMORY), but that can be a little too complicated, so I would prefer the "read from file" option.
I have a list of 9 million IPs and, with a set of hash tables, I can build a constant-time function that returns whether a particular IP is in that list. Can I do that in PHP? If so, how?
This sounds to me like an ideal application for a Bloom filter. Have a look at the links provided, which might help you get it done ASAP.
http://github.com/mj/php-bloomfilter
http://code.google.com/p/php-bloom-filter/
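For illustration, a toy Bloom filter in plain PHP; this is not the API of either library above, and the sizing numbers come from the standard formulas for 9 million entries at roughly a 1% false-positive rate (about 86 million bits, i.e. ~11 MB, and 7 hash functions):

<?php
// k salted hashes set/test k bits in a fixed-size bit string.
// False positives are possible; false negatives are not.
class BloomFilter
{
    private $bits;
    private $size;
    private $hashes;

    public function __construct($size, $hashes)
    {
        $this->size   = $size;
        $this->hashes = $hashes;
        $this->bits   = str_repeat("\0", (int) ceil($size / 8));
    }

    private function positions($value)
    {
        $positions = array();
        for ($i = 0; $i < $this->hashes; $i++) {
            // Derive k bit positions from crc32 of the salted value
            // (masked to keep the result non-negative on 32-bit builds).
            $positions[] = (crc32($i . ':' . $value) & 0x7fffffff) % $this->size;
        }
        return $positions;
    }

    public function add($value)
    {
        foreach ($this->positions($value) as $p) {
            $byte = $p >> 3;
            $this->bits[$byte] = chr(ord($this->bits[$byte]) | (1 << ($p & 7)));
        }
    }

    public function mightContain($value)
    {
        foreach ($this->positions($value) as $p) {
            if ((ord($this->bits[$p >> 3]) & (1 << ($p & 7))) === 0) {
                return false;
            }
        }
        return true;
    }
}

$filter = new BloomFilter(86000000, 7);
$filter->add('192.168.0.1');
var_dump($filter->mightContain('192.168.0.1')); // bool(true)
var_dump($filter->mightContain('10.0.0.1'));    // bool(false), almost certainly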
The interesting thing about this question is the number of directions you can go.
I'm not sure if caching is your best option simply because of the large set of data and the relatively low number of queries on it. Here are a few ideas.
1) Build a ram disk. Link your mysql database table to use the ramdisk partition. I've never tried this, but it would be fun to try.
2) Linux generally has a very fast file system. Build a structured file system that breaks up the records into files, and just call file_get_contents() or file_exists(). Of course this solution would require you to build and maintain the file system, which would also be fun. rsync might be helpful to keep your live filesystem up to date.
Example:
/002/209/001/299.txt
<?php
// build_file_from_ip() is a hypothetical helper that maps the octets of an
// IP address to a nested path like the example above.
$file = $this->build_file_from_ip($_GET['ip']);
if (file_exists($file)) {
    // Execute your code.
}
?>
I think throwing it in memcache would probably be your best/fastest method.
If reading the file into SQLite is an option, you could benefit from indexes, thus speeding up lookups.
Otherwise memcached is an option, but I don't know how well checking for existence would go if you do it with pure PHP lookups (rather slow, my guess).
Have you tried a NoSQL solution like Redis? The entire data set is managed in memory.
Here are some benchmarks.