I'm trying to set up caching of postcode lookups, which adds the resulting lookup to a text file using the following:
file_put_contents($cache_file, $postcode."\t".$result."\n", FILE_APPEND);
I'd like to be able to check this file before running a query, which I have done using this:
if( strpos(file_get_contents($cache_file),$postcode) !== false) {
// Run function
}
What I'd like to do is search for the $postcode within the text file (as above) and return the data one tab over ($result).
Firstly, is this possible?
Secondly, is this even a good way to cache SQL lookups?
1) Yes, it's possible - the easiest way would be storing the lookup data in an array and writing/reading it to/from a file with serialize() / unserialize():
$lookup_codes = array(
'10101' => 'data postcode 1 ...',
'10102' => 'data postcode 2 ...',
// ...
);
file_put_contents($cache_file, serialize($lookup_codes));
$lookup_codes = unserialize(file_get_contents($cache_file));
$postcode = '10101';
if(array_key_exists($postcode, $lookup_codes)){
// ... is available
}
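If you want to stick with the tab-delimited file from the question instead, here is a rough sketch of looking up the value one tab over (the helper name is just illustrative):
// Sketch: return the cached result for a postcode from the tab-delimited
// cache file, or null if it isn't cached yet.
function lookup_cached_postcode($cache_file, $postcode) {
    if (!is_readable($cache_file)) {
        return null;
    }
    foreach (file($cache_file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $parts = explode("\t", $line, 2);
        if (count($parts) === 2 && $parts[0] === $postcode) {
            return $parts[1]; // the data one tab over
        }
    }
    return null;
}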
2) is the far more interesting question. It really depends on your data, the structure, the amount, etc.
In my opinion, caching adds more complexity to your application, so if possible avoid it :-)
You could try to:
Optimize your SQL query or database structure to speed up requests for postcode data.
Normally databases are quite fast - and therefore made for such use cases.
I'm not sure which DB you are running, but for MySQL look into SELECT optimization, or search for INDEX as another keyword - indexes can boost queries quite heavily.
file_get_contents is really fast, but if you are changing the file often, maybe look into other ways of caching, like Memcached for storing it in memory (a minimal sketch follows below).
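If you do go down the Memcached route, a minimal sketch, assuming the pecl memcached extension; the server address and fetch_postcode() are placeholders for your own setup:
// Sketch: look up a postcode in Memcached first, fall back to SQL on a miss.
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211); // assumed local Memcached server

$result = $memcached->get('postcode_' . $postcode);
if ($result === false && $memcached->getResultCode() === Memcached::RES_NOTFOUND) {
    $result = fetch_postcode($postcode);   // hypothetical: your existing SQL lookup
    $memcached->set('postcode_' . $postcode, $result, 3600); // cache for an hour
}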
I have a 2D array with a few sub-arrays (about 30, each with 10 elements).
I need to get basic data from the array quite frequently, so I have a function that returns its contents (or part of them) all around my scripts. The function looks like:
<?php
function get_my_data($index = false){
$sub0 = array(
'something' => 'something',
'something else' => 'else',
...
);
$sub1 = array(
'something' => 'something different',
'something else' => 'else different',
...
);
...
$sub30 = array(
'something' => 'something 30 times different',
'something else' => 'else 30 times different',
...
);
$data = array($sub0,$sub1,$sub2,...,$sub30);
if($index !== false)
return $data[$index];
else
return $data;
}
?>
And then I call it using include:
<?php
include 'my_data.php';
$id = $_GET['id'];
$mydata = get_my_data($id);
...
?>
I've done this because when I was starting this project, I didn't imagine I would have more than 10 sub-arrays, nor that I would need it to be dynamic. In fact, now I have to add a dynamic column (an index into the sub-arrays), and a hard-coded array declaration is not a great idea in this case. I immediately thought of using a database; transferring the data would not be difficult, but if I do that I need to change my get_my_data function and put a query in it. Since it's called many times (pretty much every script of my website has one), I would have a lot of queries, and I think performance would be worse (MySQL is already heavily used on my website). The dynamic data would not change very often (the client does that).
The ideas I have to solve this problem are:
save all the data in the database and get it through MySQL queries,
leave it on the PHP side and use files to manage the dynamic data,
leave the static part on the PHP side, add a logical connector (such as an 'id' index in the sub-arrays and an id column in the MySQL database), and get the dynamic data from MySQL.
I don't want to lose much performance. Do you have any advice or suggestions?
Putting data like this in code is the worst possible plan. Not only do you create a whole bunch of junk and then throw out almost all of it, but if any of this changes it's a nightmare to maintain. Editing source code, checking it into version control, and deploying it is a lot of work to make a simple change to some data.
At the very least store this in a data format like JSON, YAML or XML so you can read it in on-demand and change a data-only file as necessary.
Ideally you put this in a database and query against it when necessary. Databases are designed to store, update, and preserve data like this.
You can also store JSON in the database; MySQL 5.7 even has a native column type for it, which makes this sort of thing even easier.
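For the JSON-file route, a rough sketch of a drop-in replacement for the function above (the file name my_data.json is illustrative):
// Sketch: keep the lookup data in a JSON file and decode it on demand.
function get_my_data($index = false) {
    static $data = null;              // decode the file at most once per request
    if ($data === null) {
        $data = json_decode(file_get_contents(__DIR__ . '/my_data.json'), true);
    }
    return ($index !== false) ? $data[$index] : $data;
}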
I am using PHPRedis for this.
I need to create a script that copies all of the keys matching the pattern mobile* from one Redis host (host1) to another (host2).
I have got this working by selecting all keys from host1 with the pattern mobile*, then looping over each of these keys, using the GET method to return the data, and then setting the key on host2 using the SET method:
$auKeys = $redis->keys("mobile*");
foreach ($auKeys as $key) {
$data = $redis->get($key);
$redis2->set($key, $data, 6000);
echo $key;
}
The problem is this takes around 5 minutes - I need to get it down to 2-3 minutes. Is there another way to do this?
The simplest route you can take for better SET performance is to pipeline the commands and hit the Redis server once to execute all of them, instead of one round trip per key.
https://github.com/phpredis/phpredis/issues/251
$pipeline = $redis->multi(Redis::PIPELINE);
//put result in our shared list
foreach ($items as $item) {
$pipeline->sAdd($key, $item);
}
$ret = $pipeline->exec();
At the same time, there are also libraries out there if you are seeking a different way to translate commands to the Redis protocol.
Redis bulk import using --pipe
Typically, it's best to avoid KEYS in production code. It's preferable to modify the application that's writing the keys to keep a list of keys in use, where possible, or use the newer SCAN operation.
In this case you revealed that KEYS wasn't taking a long time (it will when you have a very large key space - will the number of keys grow with time?), so the slow performance is due to all the network round trips, one per GET. Pipelines are indeed a great way of grouping up operations to avoid round trips.
In this case I suggest the use of MGET to get all the values in one network op and MSET to update them in one network op.
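A rough sketch of that with phpredis (note that MSET takes no TTL argument, so the expiries from the original loop would have to be pipelined separately):
// Sketch: copy the values in a couple of round trips instead of one per key.
$keys   = $redis->keys('mobile*');   // or better, iterate with SCAN in production
$values = $redis->mGet($keys);       // one round trip to host1 for all values

$redis2->mSet(array_combine($keys, $values)); // one round trip to host2

// MSET has no TTL, so pipeline the expiries if they are needed.
$pipe = $redis2->multi(Redis::PIPELINE);
foreach ($keys as $key) {
    $pipe->expire($key, 6000);
}
$pipe->exec();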
My new PHP application could be sped up with some caching of MySQL results.
I have limited experience with memcached, but I don't think it can do what I require.
As I am working on a multi-user application I would like to be able to delete several stored values at once without removing everything.
So I might store:
account_1.value_a = foo
account_1.value_b = bar
account_2.value_a = dog
account_2.value_b = cat
Is there a caching system that would allow me to delete based on a wildcard (or similar method) such as "delete account_1.*" leaving me with:
account_1.value_a = <unset>
account_1.value_b = <unset>
account_2.value_a = dog
account_2.value_b = cat
Thanks,
Jim
Not really, but you can fake it by using version numbers in your keys.
For example, if you use keys like this:
{entitykey}.{version}.{fieldname}
So now your account_1 object keys would be:
account_1.1.value_a
account_1.1.value_b
When you want to remove account_1 from the cache, just increment the version number for that object. Now your keys will be:
account_1.2.value_a
account_1.2.value_b
You don't even need to delete the original cached values - they will fall out of the cache automatically since you'll no longer be using them.
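A rough sketch of that versioning trick with the pecl memcached extension (key names and server address are illustrative):
// Sketch: namespace an account's keys with a version number so that
// "deleting account_1.*" is just a version bump.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function ns_key(Memcached $mc, $account, $field) {
    $version = $mc->get("$account.version");
    if ($version === false) {
        $version = 1;
        $mc->set("$account.version", $version);
    }
    return "$account.$version.$field";
}

$mc->set(ns_key($mc, 'account_1', 'value_a'), 'foo');
$value = $mc->get(ns_key($mc, 'account_1', 'value_a'));

// "delete account_1.*": bump the version; the old keys simply age out.
$mc->increment('account_1.version');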
This might help: memcache and wildcards
An open source module to get tags for keys in memcache, among other things:
http://github.com/jamm/memory/
Scache (http://scache.nanona.fi) has nested keyspaces, so you could store data on subkeys and expire the parent when needed.
Memcached delete by tag can be done like this:
Searching and deleting 100,000 keys is quite fast, but performance should be monitored in much larger caches.
Before PHP 8.0:
$tag = "account_1";
$cached_keys = $this->memcached->getAllKeys();
foreach($cached_keys as $key){
if(substr($key, 0, strlen($tag)) === $tag){
$this->memcached->delete($key);
}
}
PHP 8.0 and later:
$tag = "account_1";
$cached_keys = $this->memcached->getAllKeys();
foreach($cached_keys as $key){
if (str_starts_with($key, $tag)) {
$this->memcached->delete($key);
}
}
I have a report page that deals with ~700k records from a database table. I can display this on a webpage using paging to break up the results. However, my export to PDF/CSV functions rely on processing the entire data set at once and I'm hitting my 256MB memory limit at around 250k rows.
I don't feel comfortable increasing the memory limit and I haven't got the ability to use MySQL's SELECT ... INTO OUTFILE to just serve a pre-generated CSV. However, I can't really see a way of serving up large data sets with Drupal when the code looks something like this:
$form = array();
$table_headers = array();
$table_rows = array();
$data = db_query("a query to get the whole dataset");
while ($row = db_fetch_object($data)) {
$table_rows[] = $row->some_attribute;
}
$form['report'] = array('#value' => theme('table', $table_headers, $table_rows));
return $form;
Is there a way of getting around what is essentially appending to a giant array of arrays? At the moment I don't see how I can offer any meaningful report pages with Drupal due to this.
Thanks
With such a large dataset, I would use Drupal's Batch API, which allows time-intensive operations to be broken into batches. It is also better for users because it will give them a progress bar with some indication of how long the operation will take.
Start the batch operation by opening a temporary file, then append new records to it on each batch until done. The final page can do the final processing to deliver the data as CSV or convert it to PDF. You'd probably want to add some cleanup afterwards as well.
http://api.drupal.org/api/group/batch/6
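A very rough sketch of what that could look like in Drupal 6; the module name, the table {report_data}, the paths, and the 1000-row slice size are all illustrative, and the 'finished' callback (not shown) would stream the finished file and clean up:
// Sketch: build the CSV in 1000-row slices via the Batch API.
function mymodule_report_export() {
  $batch = array(
    'title' => t('Building report'),
    'operations' => array(array('mymodule_report_export_step', array())),
    'finished' => 'mymodule_report_export_finished',
  );
  batch_set($batch);
  batch_process('mymodule/report/download');
}

function mymodule_report_export_step(&$context) {
  if (!isset($context['sandbox']['progress'])) {
    $context['sandbox']['progress'] = 0;
    $context['sandbox']['max'] = db_result(db_query('SELECT COUNT(*) FROM {report_data}'));
    $context['sandbox']['file'] = file_directory_temp() . '/report.csv';
  }
  $limit = 1000;
  $result = db_query_range('SELECT * FROM {report_data}', $context['sandbox']['progress'], $limit);
  $fh = fopen($context['sandbox']['file'], 'a');
  while ($row = db_fetch_array($result)) {
    fputcsv($fh, $row);
    $context['sandbox']['progress']++;
  }
  fclose($fh);
  $context['finished'] = $context['sandbox']['max']
    ? $context['sandbox']['progress'] / $context['sandbox']['max'] : 1;
}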
If you are generating PDF or CSV you shouldn't use the Drupal native functions. What about writing to the output file inside your while loop? This way, only one row of the result set is in memory at a given time.
At the moment you store everything in the array $table_rows.
Can't you flush at least parts of the report while you're reading it from the database (e.g. every so many lines) in order to free some of the memory? I can't see why it should only be possible to write the CSV all at once.
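For what it's worth, a rough sketch of streaming the CSV row by row instead of building the array first (Drupal 6 style db_* calls; the query is a placeholder):
// Sketch: send CSV straight to the browser, one row at a time.
drupal_set_header('Content-Type: text/csv');
drupal_set_header('Content-Disposition: attachment; filename="report.csv"');

$out = fopen('php://output', 'w');
$result = db_query("a query to get the whole dataset");
while ($row = db_fetch_array($result)) {
  fputcsv($out, $row);   // only the current row is held in PHP memory
}
fclose($out);
exit;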
I don't feel comfortable increasing the memory limit
Increasing the memory limit doesn't mean that every PHP process will use that amount of memory. However, you could exec the CLI version of PHP with a custom memory limit - but that's not the right solution either...
and I haven't got the ability to use MySQL's save into outfile to just serve a pre-generated CSV
Then don't save it all in an array - write each line to the output buffer when you fetch it from the database (IIRC the entire result set is buffered outside the limited PHP memory). Or write it directly to a file, then do a redirect when the file is completed and closed.
C.
You should add paging to that with pager_query, and break the results into 50-100 per page. That should help a lot. You say you want to use paging, but I don't see it in the code.
Check this out: http://api.drupal.org/api/function/pager_query/6
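A minimal sketch of the pager version of the code from the question (Drupal 6):
// Sketch: let Drupal page the query, 50 rows per page.
$result = pager_query("a query to get the whole dataset", 50);
while ($row = db_fetch_object($result)) {
  $table_rows[] = $row->some_attribute;
}
$form['report'] = array(
  '#value' => theme('table', $table_headers, $table_rows) . theme('pager', NULL, 50),
);
return $form;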
Another thing to keep in mind is that in PHP 5 (before 5.3), assigning an array to a new variable or passing it to a function copies the array and does not create a reference. You may be creating many copies of the same data, and if none are unset or go out of scope, they cannot be garbage collected to free up memory. Where possible, using references to perform operations on the original array can save memory:
function doSomething($arg){
foreach($arg AS $var)
// a new copy is created here internally: 3 copies of data exist
$internal[] = doSomethingToValue($var);
return $internal;
// $arg goes out of scope and can be garbage collected: 2 copies exist
}
$var = array();
// a copy is passed to function: 2 copies of data exist
$var2 = doSomething($var);
// $var2 will be a reference to the same object in memory as $internal,
// so only 2 copies still exist
If $var is set to the return value of the function, the old value can be garbage collected, but not until after the assignment, so more memory will still be needed for a brief time:
function doSomething(&$arg){
foreach($arg AS &$var)
// operations are performed on original array data:
// only two copies of an array element exist at once, not the whole array
$var = doSomethingToValue($var);
unset($var); // not needed here, but good practice in large functions
}
$var = array();
// a reference is passed to function: 1 copy of data exists
doSomething($var);
The way I approach such huge reports is to generate them with the PHP CLI/Java/C++/C# (e.g. from crontab), and to use the unbuffered query option MySQL has.
Once the file/report creation is done on the disk, you can give a link to it...
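A rough sketch of that CLI approach with the old mysql extension's unbuffered query (connection details, output path, and query are placeholders):
// Sketch: run from the command line (e.g. via cron) and stream rows to
// a CSV file without holding the whole result set in PHP memory.
$link = mysql_connect('localhost', 'user', 'pass');
mysql_select_db('mydb', $link);

$result = mysql_unbuffered_query('a query to get the whole dataset', $link);
$fh = fopen('/path/to/report.csv', 'w');
while ($row = mysql_fetch_assoc($result)) {
    fputcsv($fh, $row);
}
fclose($fh);
mysql_close($link);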
What is the quickest way (think golf) and the most efficient (think performance) to get a large amount of data from a MySQL database into a session without having to continue doing what I already have:
$sql = "SELECT * FROM users WHERE username='" . mysql_escape_string($_POST['username']) . "' AND password='" . mysql_escape_string(md5($_POST['password'])) . "'";
$result = mysql_query($sql, $link) or die("There was an error while trying to get your information.\n<!--\n" . mysql_error($link) . "\n-->");
if(mysql_num_rows($result) < 1)
{
$_SESSION['username'] = $_POST['username'];
redirect('index.php?p=signup');
}
$_SESSION['id'] = mysql_result($result, '0', 'id');
$_SESSION['fName'] = mysql_result($result, '0', 'fName');
$_SESSION['lName'] = mysql_result($result, '0', 'lName');
...
And before anyone asks, yes I do really need to 'SELECT *'.
Edit: Yes, I am sanitizing the data, so that there can be no SQL injection, that is further up in the code.
I came up with this and it appears to work.
while($row = mysql_fetch_assoc($result))
{
$_SESSION = array_merge_recursive($_SESSION, $row);
}
Most efficient:
$get = mysql_query("SELECT * FROM table_name WHERE field_name=$something") or die(mysql_error());
$_SESSION['data'] = mysql_fetch_assoc($get);
Done.
This is now stored in an array. So, say a field is username, you just do:
echo $_SESSION['data']['username'];
data is the name of the array and username is the array key, which holds the value for that field.
EDIT: fixed some syntax mistakes :P but you get the idea.
OK, this doesn't answer your question, but doesn't your current code leave you open to SQL Injection?
I could be wrong, never worked in PHP, just saw the use of strings in the SQL and alarm bells started ringing!
Edit:
I am not trying to tamper with your post, I was correcting a spelling error, please do not roll back.
I am not sure what you mean by "large amounts of data", but it looks to me like you are only initializing data for one user? If so, I don't see any reason to optimize this unless you have hundreds of columns in your database with several megabytes of data in them.
Or, to put it differently, why do you need to optimize this? Are you having performance problems?
What you are doing now is the straight-forward approach, and I can't really see any reason to do it differently unless you have some specific problems with it.
Wrapping the user data in a user object might help some on the program structure though. Validating your input is probably also a good idea.
#Unkwntech
Looks like you are correct, but after a quick Google search, which led here, it looks like you may want to change to mysql_real_escape_string().
As for the edit, I corrected the spelling of "efficient" as well as removed the "what is the", since that's not really required - the topic says it all.
You can review the edit history (which nicely highlights the actual changes) by clicking the "edited a min ago" text at the bottom of your question.
Try using json for example:
$_SESSION['data'] = json_encode(mysql_fetch_array($result));
Is the implementation of that function faster than what he is already doing?
Does anyone else have trouble entering ] into markdown? I have to paste it in
Yes, it's bugged.
#Anders - there are something like 50-75 columns.
Again, unless this is actually causing performance problems in your application I would not bother with optimizing it. If, however, performance is a problem I would consider only getting some of the data initially and lazy-loading the other columns as they are needed.
If Unkwntech's suggestion does indeed work, I suggest you change your SELECT statement so that it doesn't grab everything, since your password column would be one of those fields.
As for whether or not you need to keep this stuff in the session, I say why not? If you're going to check the DB when the user logs in (I'm assuming this would be done then, no?) anyway, you might as well store fairly non-sensitive information (like name) in the session if you plan on using that information throughout the person's visit.
It's not so much that it's causing performance problems, but that I would like the code to look a bit cleaner.
Then wrap the user data in a class. Modifying $_SESSION directly looks somewhat dirty. If you want to keep the data in a dictionary, you can still do that even if you put it in a separate class.
You could also implement a loop that iterates over all the columns, gets their names, and copies the data into a map with the same key names (see the sketch below). That way your internal variables, named by key in the dictionary, and the database column names would always be the same. (This also has the downside that changing a column name in the database changes the variable name, but that is a quite common and well-accepted trade-off.)
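A rough sketch of that loop (mysql_fetch_assoc already keys the row by column name, so it is essentially a few lines):
// Sketch: copy every column of the row into the session, keyed by column name.
$row = mysql_fetch_assoc($result);
foreach ($row as $column => $value) {
    $_SESSION[$column] = $value;
}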
Edit to the json_encode suggestion above:
Later, you json_decode the $_SESSION['data'] variable (with the second argument set to true) and you get an array with all the data you need.
Clarification:
You can use json_encode and json_decode if you want to reduce the number of lines of code you write. In the example in the question, a line of code was needed to copy each column in the database to the SESSION array. Instead of doing it with 50-75 lines of code, you could do it with 1 by json_encoding the entire database record into a string. This string can be stored in the SESSION variable. Later, when the user visits another page, the SESSION variable is there with the entire JSON string. If you then want to know the first name, you can use the following code:
$fname = json_decode($_SESSION['data'], true)['fname'];
This method won't be faster than a line by line copy, but it will save coding and it will be more resistant to changes in your database or code.