Speed/Memory Issue with PHP [duplicate]

Possible Duplicate:
Least memory intensive way to read a file in PHP
I have a problem balancing speed against memory usage.
I have a script which needs to run very quickly. All it does is load multiple files of 1-100 MB each, each consisting of a list of values, and check how many of those values exist in another list.
My preferred way of doing this is to load the values from the file into an array (explode), then loop through this array and check whether each value exists using isset.
The problem I have is that there are too many values: it uses up more than 10 GB of memory (I don't know why it uses so much). So I have resorted to loading the values from the file a few at a time, instead of exploding the whole file at once. This cuts the memory usage right down, but is VERY slow.
Is there a better method?
Code Example:
$check = array('lots', 'of', 'values', 'here');
$check = array_flip($check); // flip so the values become keys: isset() is then an O(1) hash lookup
$values = explode('|', file_get_contents('bigfile.txt'));
$matches = 0;
foreach ($values as $value) {
    if (isset($check[$value])) {
        $matches++;
    }
}
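
For reference, the slower low-memory fallback mentioned above looks roughly like this (a sketch: streaming the file value by value with stream_get_line, assuming the same '|' delimiter):

$check = array_flip(array('lots', 'of', 'values', 'here'));
$matches = 0;
$fh = fopen('bigfile.txt', 'rb');
// read up to the next '|' each time, so only one value is in memory at once
while (($value = stream_get_line($fh, 1048576, '|')) !== false) {
    if (isset($check[$value])) {
        $matches++;
    }
}
fclose($fh);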

Maybe you could write your own C extension for PHP (see e.g. this question), or write a small utility program in C and have PHP run it (perhaps using popen)?
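A minimal sketch of the popen route; the helper binary, its name, and its output format are all hypothetical:

$proc = popen('./matchcount bigfile.txt checklist.txt', 'r'); // hypothetical compiled C helper
$matches = (int) fgets($proc); // assume it prints the match count on one line
pclose($proc);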

This seems like a classic use case for some form of key/value-oriented NoSQL datastore (MongoDB, CouchDB, Riak), or maybe even just a large memcache instance.
Assuming you can load the large data files into the datastore ahead of when you need to do the searching, and that you'll be using the data from the loaded files more than once, you should see some impressive gains (as long as your queries, map-reduce jobs, etc. aren't awful). Judging by the size of your data, you may want to look at a datastore which doesn't need to hold everything in memory to be quick.
There are plenty of PHP drivers (and tutorials) for each of the datastores mentioned above.
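As a rough illustration of the key/value idea (a sketch using the PECL Memcached extension; the key prefix is made up), you load each value once, and membership checks then become single get calls:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// load phase, done ahead of time: one key per value from the big file
foreach (explode('|', file_get_contents('bigfile.txt')) as $value) {
    $mc->set('val:' . $value, 1);
}

// query phase: count how many of the check values exist
$matches = 0;
foreach ($check as $value) {
    if ($mc->get('val:' . $value) !== false) {
        $matches++;
    }
}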

Open the files and read through them line by line. Maybe use MySQL for the import (LOAD DATA INFILE), for the resulting data, or both.
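A rough sketch of that route (table and column names are invented; assumes the '|'-delimited format from the question):

$db = new mysqli('localhost', 'user', 'pass', 'test');
// bulk-load the big file; '|' separates values, so treat it as the line terminator
$db->query("LOAD DATA LOCAL INFILE 'bigfile.txt'
            INTO TABLE big_values
            LINES TERMINATED BY '|' (value)");
// let MySQL count the overlap with the check list in one indexed join
$res = $db->query("SELECT COUNT(*) AS matches
                   FROM big_values b
                   JOIN check_values c ON c.value = b.value");
$row = $res->fetch_assoc();
$matches = (int) $row['matches'];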

It sounds like you need an improved search engine.
The Sphinx search server can be used to search your values really fast.

Related

Detecting if something exists and then automating process

I was wondering whether it would be possible to write a script in PHP which would proceed through an extremely large data set (100 million+ entries) to try to locate specific strings within it?
If it is feasible, would it be an efficient way of identifying a keyword within the dataset?
If there is a better way of processing such a large dataset to try to detect a string, I am all ears.
Well, like Jari said, everything is possible in programming.
I deal with large data via Hadoop, MapReduce, etc.

mysql_data_seek versus storing data in array

I have searched for a few hours already but have found nothing on the subject.
I am developing a website that depends on a query to define the elements that must be loaded on the page. But to organize the data, I must iterate over the result of this query 4 times.
At first I used mysql_data_seek so I could rewind and re-traverse the result set, but I started losing performance. Because of this, I tried exchanging mysql_data_seek for putting the data into an array and running a foreach loop over it.
The performance didn't improve in any way I could measure, so I started wondering which is in fact the better option: building a rather big data array, or executing mysql_fetch_array multiple times.
My application is currently running with PHP 5.2.17 and MySQL, and everything is on localhost. Unfortunately, I have a busy database, but I have never had any problems with the number of connections to it.
Is there a preferable way to execute this task? Is there any other option besides mysql_data_seek or the big data array? Does anyone have benchmark results for these options?
Thank you very much for your time.
The answer to your problem may lie in indexing the appropriate fields in your database. Most databases also cache frequently served queries, but they tend to discard them once the underlying table is altered (which makes sense).
So you could trust your database to do what it does well: query for and retrieve data. Help it by making sure there's little contention on the table and/or by placing appropriate indexes. This can in turn alter the performance of writes, which may matter in your case; only you can really judge that (indexes have to be calculated and kept up to date).
The PHP extension you use plays a part as well. If speed is of the essence, 'upgrade' to mysqli or PDO and do a ->fetch_all(), since it cuts down on communication between the PHP process and the database server. The only argument against this would be if the amount of data you query is so enormous that it halts or bogs down your PHP/webserver processes, or even your whole server, by forcing it into swap.
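For instance (a sketch; the query and table are illustrative, and fetch_all() needs the mysqlnd driver):

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
// one round trip: fetch everything, then reuse the array for all 4 passes
$rows = $db->query('SELECT id, name, position FROM page_elements')
           ->fetch_all(MYSQLI_ASSOC);

for ($pass = 0; $pass < 4; $pass++) {
    foreach ($rows as $row) {
        // ... organize the elements for this pass ...
    }
}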
The table type you use can matter too: certain types of queries seem to run faster on MyISAM as opposed to InnoDB. If you want to retool a bit, you could store this data (or a copy of it) in MySQL's MEMORY (HEAP) engine, i.e., purely in memory. You'd need to be careful to synchronize it with a disk table on writes, though, if you want altered data to survive a server failure or shutdown.
Alternatively, you could cache your data in something like memcache or with apc_store, which should be very fast since it lives in PHP process memory. The big caveat is that APC generally has less memory available for storage (the default being 32 MB). Memcache's big advantage is that, while still fast, it's distributed, so if you have multiple servers running they can share this data.
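A sketch of the apc_store variant (the cache key and loader function are made up):

$rows = apc_fetch('page_elements', $hit);
if (!$hit) {
    $rows = load_page_elements();            // hypothetical: runs the expensive query
    apc_store('page_elements', $rows, 300);  // cache for 5 minutes
}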
You could try a NoSQL database, preferably one that's just a key-value store, not even a document store, such as Redis.
And finally, you could hardcode your values in your PHP script (making sure you still use an opcode cache like eAccelerator or APC), and verify whether you really need to use the data 4 times, or whether you can just cache the output of whatever it is you actually create with it.
So I'm sorry I can't give you a ready-made answer, but performance questions, when applicable, usually require a multi-pronged approach. :-|

Is there a way of keeping database data in PHP while server is running?

I'm making a website that (essentially) lets the user submit a word, matches it against a MySQL database, and returns the closest match found. My current implementation is that whenever the user submits a word, the PHP script is called; it reads the database information, scans each word one by one until a match is found, and returns it.
I feel like this is very inefficient. I'm about to make a program that stores the list of words in a tree structure for much more effective searching. If there are tens of thousands of words in the database, I can see the current implementation slowing down quite a bit.
My question is this: instead of having to write another, separate program, and use PHP to just connect to it with every query, can I instead save an entire data tree in memory with just PHP? That way, any session, any query would just read from memory instead of re-reading the database and rebuilding the tree over and over.
I'd look into running an instance of memcached on your server: http://www.memcached.org.
You should be able to store the compiled tree of data in memory there and retrieve it for use in PHP. You'll have to load it into PHP to perform your search, though, and you'll also need to architect a way for the tree in memcached to be updated when the database changes (assuming the word list can be updated; otherwise there's not much reason to store it in a database).
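Roughly like this (a sketch; build_word_tree(), search_tree(), and the key name are all illustrative):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$tree = $mc->get('word_tree');
if ($tree === false) {
    $tree = build_word_tree();    // expensive: query MySQL and build the tree once
    $mc->set('word_tree', $tree); // Memcached serializes it for storage
}
$closest = search_tree($tree, $word); // hypothetical lookup against the cached tree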
Might I suggest looking at the MEMORY table type in MySQL: http://dev.mysql.com/doc/refman/5.0/en/memory-storage-engine.html
You can then still use MySQL's search features on fast, in-memory data.
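For example (a sketch; table and column names are made up):

$db = new mysqli('localhost', 'user', 'pass', 'mydb');
// copy the word list into an in-memory table once, e.g. at deploy time
$db->query('CREATE TABLE IF NOT EXISTS words_mem
            (word VARCHAR(64), PRIMARY KEY (word)) ENGINE=MEMORY');
$db->query('INSERT IGNORE INTO words_mem SELECT word FROM words');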
PHP really isn't a good language for large in-memory structures. It's just not very memory efficient, and it has the persistence problem you're asking about. Typically with PHP, people store the data in some external persistent data store that is optimized for quick retrieval.
Usually people use a two fold approach:
1) Store data in the database as optimized as possible for standard queries
2) Cache results of expensive queries in memcached
If you are dealing with a lot of data that cannot easily be indexed by a relational database, then you'd probably need to roll your own daemon (e.g., written in C) that keeps a persistent copy of the data structure in memory for fast querying.

php persistent service? [duplicate]

Possible Duplicate:
Cache Object in PHP without using serialize
So I have built a rather large data structure which cannot easily be turned into a relational database format. Using this data structure, the requests I make are very fast, but it takes about 4-5 seconds to load it into memory. What I want is to load it into memory once, then have it sit there and quickly answer individual requests, which of course is not the normal flow of the PHP scripts I usually write. Is there any good way to do this in PHP? (Again, no databases; it has to use this specialized precomputed structure which takes a long time to load into memory.)
EDIT: This tutorial kind of gives what I want, but it is pretty complicated, and I was hoping someone would have a more elegant solution. As he says in the tutorial, the whole problem is that PHP is naturally stateless.
You absolutely must do something like what your linked tutorial proposes.
No PHP state persists between requests. This is by design.
So you will need some kind of separate long-running process, plus some kind of IPC method to talk to it, or else a better data structure that you can load piecemeal.
If you really can't put this into a relational database (such as SQLite; it doesn't have to run as a separate server process), explore using some other kind of database, such as a file-based key-value store.
Note that it is extremely unlikely that any long-running process you write, in any language, will be faster, easier, or better than getting this data structure of yours into a real database, relational or otherwise. Get your data structure into a database! It's the easiest of your possible paths.
Another thing you can do is make loading your data structure as quick as possible. You can serialize it to a file and then deserialize that file; if that is not fast enough, you can try igbinary, a much-faster-than-standard-PHP serializer.
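A sketch of that file-cache approach (the cache path and builder function are illustrative):

$cache = '/tmp/structure.bin';
if (is_file($cache)) {
    // igbinary_unserialize() is typically much faster than plain unserialize()
    $data = igbinary_unserialize(file_get_contents($cache));
} else {
    $data = build_structure(); // hypothetical: the slow 4-5 second build
    file_put_contents($cache, igbinary_serialize($data));
}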

Quick way to do data lookup in PHP

I have a data table with 600,000 records that is around 25 megabytes in size. It is indexed by a 4-byte key.
Is there a way to find a row in such dataset quickly with PHP without resorting to MySQL?
The website in question is mostly static with minor PHP code and no database dependencies and therefore fast. I would like to add this data without having to use MySQL if possible.
In C++ I would memory map the file and do a binary search in it. Is there a way to do something similar in PHP?
PHP (at least 5.3) should already be optimized to use mmap if it's available and likely to be advantageous. Therefore, you can use the same strategy you say you would use in C++:
Open a stream with fopen
Move around for your binary search with fseek and fread
EDIT: actually, it seems to use mmap only in certain other circumstances, such as file_get_contents. It shouldn't matter, but you can also try file_get_contents.
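A sketch of the fseek/fread binary search, assuming fixed-width records that begin with a 4-byte big-endian integer key (the record layout here is invented):

const REC_SIZE = 44; // illustrative: 4-byte key + 40 bytes of payload

function find_record($path, $needle) {
    $fh = fopen($path, 'rb');
    $lo = 0;
    $hi = (int) (filesize($path) / REC_SIZE) - 1;
    while ($lo <= $hi) {
        $mid = ($lo + $hi) >> 1;
        fseek($fh, $mid * REC_SIZE);  // jump straight to the middle record
        $rec = fread($fh, REC_SIZE);
        $key = unpack('N', $rec);     // 'N' = unsigned 32-bit big-endian
        if ($key[1] === $needle) { fclose($fh); return $rec; }
        if ($key[1] < $needle) { $lo = $mid + 1; } else { $hi = $mid - 1; }
    }
    fclose($fh);
    return null; // not found
}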
I would suggest memcachedb or something similar. If you handle this entirely in PHP, the script will have to read the entire file/data structure on each request; it's not possible to do that dynamically in reasonable time.
In C++, would you stop and restart the application each time a user wanted to view the file in a different way, loading and unloading the file every time? Probably not. But that is how PHP differs from an application, and from application programming languages.
PHP has tools to help you deal with the environment teardown/buildup: the database and/or keyed caching utilities like memcache. Use the right tool for the right job.
