array_slice() for large array, good or bad - php

In PHP, is array_slice() good enough for processing a large data array that cannot be paginated in SQL, since it isn't stored in a database table but is calculated from other db tables?
Anyway, I have an array of around 50k records, which might grow later. On first page load it fetches all 50k records and then slices them for AJAX-based pagination.
Will this cause server load in the future, since all records are fetched on page load?
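For reference, a minimal sketch of the pattern described above; calculateRows() is a hypothetical stand-in for whatever builds the 50k-row array from the other tables:

$allRows = calculateRows();                // assumed helper building the full result array
$perPage = 50;
$page    = max(1, (int)($_GET['page'] ?? 1));
$offset  = ($page - 1) * $perPage;

$pageRows = array_slice($allRows, $offset, $perPage);

header('Content-Type: application/json');
echo json_encode(['page' => $page, 'total' => count($allRows), 'rows' => $pageRows]);

The slice itself is cheap; the cost is building and holding the full array on every request.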

First of all, it's a bad idea to create an array containing 50k entries, especially since it can grow. It may "eat" all your memory under high traffic.
Also, where do you store the sliced parts of the array for use on subsequent AJAX requests?
If you cannot set a LIMIT in the query, I think you can create an additional table in which you store this data (refreshed with cron, for example) and show users data from it using LIMIT for pagination. Alternatively, you can create a caching layer (or use an existing caching system: file cache, PHP memcache, ...) and write some logic for updating the cache (it depends on your program logic).
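A rough sketch of that alternative, assuming a cron-filled table named report_cache (a made-up name); each AJAX request then pulls only one page from MySQL:

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

$perPage = 50;
$page    = max(1, (int)($_GET['page'] ?? 1));

$stmt = $pdo->prepare('SELECT * FROM report_cache ORDER BY id LIMIT :limit OFFSET :offset');
$stmt->bindValue(':limit', $perPage, PDO::PARAM_INT);
$stmt->bindValue(':offset', ($page - 1) * $perPage, PDO::PARAM_INT);
$stmt->execute();

echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));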

Related

Handling big arrays in PHP

The application I am working on needs to obtain a dataset of around 10 MB at most twice an hour. We use that dataset to display paginated results on the site; a simple search by one of the object properties should also be possible.
Currently we are thinking about 2 different ways to implement this:
1.) Store the JSON dataset in the database or in a file in the file system, read it, and loop over it to display results whenever we need to.
2.) Store the JSON dataset in a relational MySQL table, query it, and loop over the results whenever we need to display them.
Replacing/refreshing the results has to be done multiple times per hour, as I said.
Both ways have cons. I am trying to choose the one that is less evil overall. Reading 10 MB into memory is not a lot, but on the other hand rewriting a table a few times an hour could produce conflicts, in my opinion.
My concern regarding 1.) is how safe the app will be if we read 10 MB into memory all the time. What will happen if multiple users do this at the same moment? Is this something to worry about, or is PHP able to handle it in the background?
What do you think would be best for this use case?
Thanks!
When php runs on a web server (as it usually does) the server starts new php processes on demand when they're needed to handle concurrent requests. A powerful web server may allow fifty or so php processes. If each of them is handling this large data set, you'll need to have enough RAM for fifty copies. And, you'll need to load that data somehow for each new request. Reading 10mb from a file is not an overwhelming burden unless you have some sort of parsing to do. But it is a burden.
As it starts to handle each request, php offers a clean context to the programming environment. php is not good at maintaining in-RAM context from one request to the next. You may be able to figure out how to do it, but it's a dodgy solution. If you're running on a server that's shared with other web applications -- especially applications you don't trust -- you should not attempt to do this; the other applications will have access to your in-RAM data.
You can control the concurrent processes with Apache or nginx configuration settings, and restrict it to five or ten copies of php. But if you have a lot of incoming requests, those requests get serialized and they will slow down.
Will this application need to scale up? Will you eventually need a pool of web servers to handle all your requests? If so, the in-RAM solution looks worse.
Does your json data look like a big array of objects? Do most of the objects in that array have the same elements as each other? If so, it maps naturally onto a SQL table. You can make a table in which the columns correspond to the elements of your object. Then you can use SQL to avoid touching every row -- every element of each array -- every time you display or update data.
(The same sort of logic applies to Mongo, Redis, and other ways of storing your data.)
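A rough sketch of option 2.), under the assumption that the JSON is an array of objects with id, name and price fields (made-up names) loaded into an items table twice an hour:

$pdo   = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'user', 'pass');
$items = json_decode(file_get_contents('dataset.json'), true);

$pdo->beginTransaction();
$pdo->exec('DELETE FROM items');   // full refresh, a couple of times per hour
$stmt = $pdo->prepare('INSERT INTO items (id, name, price) VALUES (?, ?, ?)');
foreach ($items as $item) {
    $stmt->execute([$item['id'], $item['name'], $item['price']]);
}
$pdo->commit();

// Display and search code then touch only the rows they need, not the whole 10 MB:
$page = $pdo->query('SELECT id, name, price FROM items ORDER BY name LIMIT 20')->fetchAll(PDO::FETCH_ASSOC);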

PHP: How to cache a big table in redis?

Assume that I have a big (MySQL) table (>10k rows) mapping id -> string. I can put them all in an array and cache this array. But the question is: how do I cache it efficiently?
a) Cache it as one big item. So I will execute
$redis->set("array", $array);
Quite short and easy. But for every entry I need, I have to fetch the whole thing. Absolutely inefficient.
b) Cache every entry itself:
foreach( $array as $id => $str )
$redis->set( "array:$id", $str );
Using this way, I will have >10k entries in Redis. That doesn't feel good. If I have 10 of these tables, I will have 100k entries....
So what's your proposal? How to cache a big array?
Caching the big array is only helpful if you plan to always retrieve it as a whole. However, cache invalidation becomes a very "heavy" operation: any time you change something you have to invalidate the whole array and reread it from the DB.
10k in redis is not much at all. You can have millions of entries without problem.
I would go with option b): cache every entry individually. It is easier to maintain, leads to simpler application code, and has a smaller memory footprint on the application side, which becomes more and more important when you want to scale your application.
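A minimal read-through sketch of option b) with phpredis, assuming a strings(id, str) table (made-up name) and a one-hour TTL:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

function getString(Redis $redis, PDO $pdo, int $id): ?string
{
    $cached = $redis->get("array:$id");
    if ($cached !== false) {
        return $cached;                        // cache hit
    }
    $stmt = $pdo->prepare('SELECT str FROM strings WHERE id = ?');
    $stmt->execute([$id]);
    $str = $stmt->fetchColumn();
    if ($str === false) {
        return null;                           // unknown id
    }
    $redis->set("array:$id", $str, 3600);      // cache miss: store for one hour
    return $str;
}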
The first question is: why do you need to cache that array at all?
If you always need the whole array, then:
$redis->set("array", $array);
If you only need some specific indexes (the second solution), then why are you trying to cache the whole thing instead of querying the database each time for just the id you need?
It is always more efficient to fetch only the data you need.
Remember that a cache's usefulness is estimated by the ratio between hits (items effectively read from the cache) and misses (items read from the data source and then added to the cache).
If you cache the whole table (10k misses) but query only a few elements by id (the second solution), then your ratio is near zero.
If you need the whole table each time, then cache it using the first solution (1 miss), and your ratio is much more likely to be > 1.
Also, remember that Redis is a separate server. Each request to Redis is a network request to that server (on localhost or not).
So basically the same rule applies to Redis as to MySQL: one big request will perform faster than many little requests.
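In that spirit, even with per-entry keys (option b) you can fetch a whole page of entries in a single round trip with MGET instead of issuing one GET per id; a small sketch, assuming the array:<id> keys from above:

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$ids  = range(101, 120);                          // ids of the current page, for example
$keys = array_map(fn ($id) => "array:$id", $ids);

$values = $redis->mGet($keys);                    // one request; false marks a missing key
$page   = array_combine($ids, $values);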

Store PostgreSQL resultset into sessions?

I'm creating a web app using PHP and PostgreSQL. The database is quite big and some kinds of searches take a lot of time to process.
Users can tolerate the wait on the first search, but it hurts when they paginate. Can I store the resultset in session vars?
It would be great for dealing with pagination.
Yes, of course you can, but with a lot of different resultsets, loading that amount of data on session_start() can become cumbersome. Also, the same search could be run by more than one user, resulting in duplicate (time-consuming) retrievals, large sessions (every page load takes longer), and more storage space.
If the data changes rarely enough that keeping it in a session would be acceptable, look instead at caches like APC or Memcached to store the result under a meaningful key. That way, the load is lifted from the database, and searches can be shared amongst users.
If you have a fixed number of items for pagination, you could even consider storing the different 'pages' with a pre- or postfix in the key, so you don't have to select a subset of the result every time (i.e. store with a key like 'search:foo=bar|1' / 'search:foo=bar|2' etc.)
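A small sketch of that idea with the Memcached extension; the runSearch() helper, the md5-normalised key and the 10-minute TTL are assumptions, not part of the original question:

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function searchPage(Memcached $mc, string $query, int $page): array
{
    $key  = 'search:' . md5($query) . '|' . $page;   // per-page key, as suggested above
    $rows = $mc->get($key);
    if ($rows === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
        $rows = runSearch($query, $page);            // the slow PostgreSQL search (assumed helper)
        $mc->set($key, $rows, 600);                  // shared by all users for 10 minutes
    }
    return $rows;
}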

Transfer table to Memcache

I have a large table and I'd like to store the results in Memcache. I have Memcache set up on my server, but there doesn't seem to be any useful documentation (that I can find) on how to efficiently transfer large amounts of data. The only way I can currently think of is to write a MySQL query that grabs the key and value columns of the table and then saves them in Memcache. It's not a particularly scalable solution (especially when my query generates a few hundred thousand rows). Any advice on how to do this?
EDIT: there is some confusion about what I'm attempting to do. Let's say that I have a table with two fields (key and value). I am pulling in information on the fly and have to match it to the key and return the value. I'd like to avoid having to execute ~1000 queries per page load. Memcache seems like a perfect fit because it is built around key/value storage. Let's say this table has 100k rows. The only way I know to get that data from the db table into Memcache is to run a query that loops through every row in the table and creates an individual Memcache entry.
Questions: Is this a good way to use memcache? If yes, is there a better way to transfer my table?
You can actually pull all the rows into an array and store that array in memcache:
memcache_set($memcache_obj, 'var_key', $your_array);
but you have to remember a few things:
PHP will serialize/unserialize the array going to and from memcache, so if you have many rows it might be slower than actually querying the DB.
You cannot do any filtering (no SQL); if you want to filter some items you have to implement the filtering yourself, and it will probably perform worse than the DB engine.
memcache won't store more than 1 MB per item by default...
I don't know exactly what you're trying to achieve, but the general uses of memcache are:
storing the result of SQL or other time-consuming processing, where the number of resulting rows is small
storing pre-created (X)HTML blobs to avoid DB access
user session storage
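For the per-row transfer described in the question, here is a sketch using the same procedural memcache API as above (the lookup table and its k/v columns are made-up names):

$memcache_obj = memcache_connect('127.0.0.1', 11211);
$pdo          = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->query('SELECT k, v FROM lookup');   // stream the rows instead of building one big array
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    memcache_set($memcache_obj, 'lookup:' . $row['k'], $row['v'], 0, 3600);
}

// Page code then replaces its ~1000 SQL lookups with memcache reads:
$value = memcache_get($memcache_obj, 'lookup:some_key');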
Russ,
It sounds almost as if using a MySQL table with the storage engine set to MEMORY might be your way to go.
A RAM-based table gives you the flexibility of using SQL, and also prevents disk thrashing due to a large number of reads/writes (much like memcached).
However, a RAM based table is very volatile. If anything is stored in the table and not flushed to a disk based table, and you lose power... well, you just lost your data. That being said, ensure you flush to a real disk-based table every once in a while.
Also, another plus of using MEMORY tables is that you can store all the typical MySQL data types, and there is no 1 MB item size limit.
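A sketch of the MEMORY-table approach, assuming the same key/value shape (table and column names are made up); the last statement is the periodic flush back to a disk-based table:

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// RAM-backed copy of the lookup data
$pdo->exec('CREATE TABLE IF NOT EXISTS lookup_ram (k VARCHAR(64) PRIMARY KEY, v VARCHAR(255)) ENGINE=MEMORY');
$pdo->exec('INSERT INTO lookup_ram SELECT k, v FROM lookup');

// ... fast SQL reads/writes go against lookup_ram ...

// every once in a while (e.g. from cron), flush back to the disk-based table
$pdo->exec('REPLACE INTO lookup SELECT k, v FROM lookup_ram');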

store in array or use multiple db queries

Within a PHP/MySQL system we have a number of configuration values (approx. 200). These are mostly booleans or ints and store things such as the number of results per page and whether pages have 2 or 3 columns. They are all stored in a single MySQL table, and we use a single function to return these values as they are requested; on certain page loads there can be up to around 100 requests to this config table. With the number of sites using this system, this potentially means thousands of requests each second to retrieve these values. The question is whether this method makes sense, or whether it would be preferable to perform a single request per page, store all the configs in an array, and read from that array each time instead.
Use a cache such as memcache, APC, or any other. Load the settings once, cache them, and share them across your sessions with a singleton object.
Even if MySQL serves the query from its query cache, it's a waste of time and resources to query the database over and over. Instead, on any request that modifies the values, invalidate the in-memory cache so it is reloaded the next time someone requests a value from it.
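A minimal sketch of that approach with APCu (the current incarnation of APC); the config table layout, the site_config key and the 5-minute TTL are assumptions:

function getConfig(PDO $pdo): array
{
    $config = apcu_fetch('site_config', $found);
    if (!$found) {
        $config = $pdo->query('SELECT name, value FROM config')->fetchAll(PDO::FETCH_KEY_PAIR);
        apcu_store('site_config', $config, 300);   // one query serves the next 5 minutes of requests
    }
    return $config;
}

function setConfigValue(PDO $pdo, string $name, string $value): void
{
    $pdo->prepare('UPDATE config SET value = ? WHERE name = ?')->execute([$value, $name]);
    apcu_delete('site_config');                    // invalidate so the next request reloads fresh values
}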
If you enable MySQL query cache, the query that selects your values will be cached in memory, and MySQL will give an instant answer from memory unless the query or data in the underlying tables are changed.
This is excellent both for performance and for manageability.
The query results may be reused between the sessions: that means, if you have 1000 sessions, you don't need to keep 1000 copies of your data.
You might want to consider using memcache for this. I think it would be faster than multiple DB queries (even with query caching on), and you won't need a database connection to get them.
You might want to consider just loading them from a flat file into memory; this also gives you the opportunity to version-control your config values.
I would definitely recommend memcache for this. We have a similar setup and it has noticeably brought resource usage down on that server.
