I am working on a program that needs a caching system.
Here is the setup: I have a MySQL database with four tables, 'mac', 'src', 'username' and 'main'. The 'mac', 'src' and 'username' tables are key/value lookup tables whose IDs are foreign keys in 'main'; the program inserts into those three first and then puts their IDs into 'main'.
The data is about 18 million rows for the main table and about 2 million rows for each of the other three.
I don't want to run a SELECT every time a row needs to be inserted into main, so I cache the IDs in an array:
$hash = ['mac'=>[], 'src'=>[], 'username'=>[]];
and store and fetch values like this: $hash['mac']['54:52:00:27:e4:91'];
This approach performs badly once the hash grows beyond about 500k entries.
Is there a better way to do this?
PS: I did the same thing in Node.js using an npm module named hashtable, and performance was about 10k inserts per 4m. I've read that PHP arrays are hash tables, yet here the same job is far slower: just 1k inserts takes at least 5 minutes.
Assuming you're on a Linux server, see: Creating a RAM disk. Once you have a RAM disk, cache each ID as a file named after the sha1() hash of the MAC address. The RAM disk file is, well, RAM; i.e., a cache that persists in memory between requests.
<?php
$mac = '54:52:00:27:e4:91';
$cache = '/path/to/ramdisk/'.sha1($mac);

if (is_file($cache)) {               // Cached already?
    $ID = file_get_contents($cache); // From the cache.
} else {
    // Run SQL query here and get the $ID.
    // Now cache the $ID.
    file_put_contents($cache, $ID);  // Cache it.
}

// Now do your insert here.
To clarify: A RAM disk allows you to use filesystem wrappers in PHP, such as file_get_contents() and file_put_contents() to read/write to RAM.
Other more robust alternatives to consider:
Redis: https://www.tutorialspoint.com/redis/redis_php.htm
Memcached: http://php.net/manual/en/book.memcached.php
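For comparison, the same MAC-to-ID lookup with the phpredis extension might look roughly like this (a sketch; the Redis server address and the mac: key prefix are assumptions, not part of the original setup):
<?php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$mac = '54:52:00:27:e4:91';
$key = 'mac:'.$mac;           // hypothetical key scheme

$ID = $redis->get($key);      // Returns false if the key doesn't exist.
if ($ID === false) {
    // Run the SQL query here and get the $ID, then cache it.
    $redis->set($key, $ID);
}
// Now do your insert into main using $ID.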
You can use PHP Super Cache, which is very simple and faster than Redis, Memcached, etc.
require __DIR__.'/vendor/autoload.php';
use SuperCache\SuperCache as sCache;
//Saving cache value with a key
// sCache::cache('<key>')->set('<value>');
sCache::cache('myKey')->set('Key_value');
//Retrieving cache value with a key
echo sCache::cache('myKey')->get();
https://packagist.org/packages/smart-php/super-cache
Related
I have to extract 800,000 rows from MySQL (that's the dev environment; in production it could be 1 or even 1.5 million rows). The second step is to insert that data, serialized, into a Redis cache.
My problems are:
"Allowed memory size of X bytes exhausted", and the catch is that I can't change that limit.
I've figured out that I can fetch the data from the MySQL DB in chunks, but then the second step is problematic. So far I have been storing the data in Redis as a String type (a serialized PHP array of arrays with the MySQL data), and I don't know how to properly append each chunk of fetched data. (Getting the serialized data, unserializing it, appending the new chunk, and serializing it again is... stupid.)
Do you have any experience with large data operations? I have no idea how to handle it.
Please? :)
Best regards.
In your case I would write the content you generate to a file as Redis SET commands, then execute the file with redis-cli --pipe. The following link describes what your Redis mass insertion data file should look like:
Redis mass insertion
<?php
while ($line = mysqli_fetch_assoc($result)) {
    $data['data'] = $line['somekey'];  // here generate the desired data
    $data['key']  = 'thekey';          // make the data key
    // Build the Redis SET command, one command per line.
    $redisCommand = 'SET '.$data['key'].' '.$data['data']."\n";
    // Append the command to the Redis command file.
    file_put_contents('/tmp/data.txt', $redisCommand, FILE_APPEND);
}
// Exec command to run the file.
exec('cat /tmp/data.txt | redis-cli --pipe');
// Consider removing the data.txt file afterwards:
// unlink('/tmp/data.txt');
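Note that plain-text SET lines break if a value contains spaces or newlines; the mass insertion page linked above recommends writing the raw Redis protocol for --pipe mode. A minimal sketch of that, reusing the same hypothetical $result loop and key as above:
<?php
// Encode one Redis command in the raw protocol (RESP) format.
function redisProtocol(array $args) {
    $cmd = '*'.count($args)."\r\n";
    foreach ($args as $arg) {
        $cmd .= '$'.strlen($arg)."\r\n".$arg."\r\n";
    }
    return $cmd;
}

$fh = fopen('/tmp/data.txt', 'w');
while ($line = mysqli_fetch_assoc($result)) {
    fwrite($fh, redisProtocol(array('SET', 'thekey', $line['somekey'])));
}
fclose($fh);
exec('cat /tmp/data.txt | redis-cli --pipe');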
I've written a small PHP script to benchmark the performance of our various LAMP servers. I'm testing the speed of various factors such as disk I/O, database I/O, etc.
In the script, I'm first creating a random string of 100KB called $payload.
For the disk I/O check, I'm writing $payload to disk 1000 times using file_put_contents(), which completes in a few milliseconds.
Secondly, using the same logic for the SQLite check, I'm inserting 1000 records of the $payload string into an SQLite table. Shouldn't they take roughly the same amount of time? But the SQLite inserts go on forever. Any idea why?
$payload = ''; // initialize, then generate a big string
for($i=0;$i<100000;$i++)
{
    $n=rand(0,57)+65;
    $payload = $payload.chr($n);
}
//write test:
$start = microtime(true);
if ($type=='disk') // Disk I/O -> This takes only a few msecs.
{
for($i=0;$i<1000;$i++) file_put_contents($fname,$payload);
}
else if ($type=='sqlite') //sqlite test -> This keeps running for everrrrrr.....
{
    $db = new SQLite3("benchmark.db");
    $db->exec('create table temp(t text)');
    for($i=0;$i<1000;$i++) {
        $db->exec("insert into temp values('{$payload}')");
    }
}
$wtime=round((microtime(true) - $start)*1000);
When you are not using explicit transactions, SQLite will wrap an automatic transaction around every statement.
To guarantee that a transaction is durable, the database has to flush the data to disk at the end of each transaction.
This implies that it waits for the disk write to complete before continuing.
To make the database check similar to the disk check, execute $db->exec("pragma synchronous = off") after creating the DB.
However, you wouldn't want to use this setting in a real database where you'd care about data loss.
Wrap $db->exec("begin") and $db->exec("commit") around the loop to use a single transaction for all writes.
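A minimal sketch of both suggestions applied to the benchmark loop from the question (same table and $payload as above):
<?php
$db = new SQLite3("benchmark.db");
// Suggestion 1: skip the per-statement flush (don't use this where data loss matters).
$db->exec("pragma synchronous = off");
$db->exec('create table temp(t text)');

// Suggestion 2: one explicit transaction around all 1000 inserts,
// so the data is flushed to disk only once, at commit time.
$db->exec("begin");
for ($i = 0; $i < 1000; $i++) {
    $db->exec("insert into temp values('{$payload}')");
}
$db->exec("commit");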
Let's say I cache data in a PHP file in PHP array like this:
/cache.php
<?php return (object) array(
'key' => 'value',
);
And I include the cache file like this:
<?php
$cache = include 'cache.php';
Now, the question is: will the cache file automatically be cached in memory by APC, the way a typical opcode cache handles all .php files?
If I store the data differently, for example in JSON format (cache.json), will the data not be automatically cached by APC?
Would apc_store be faster/preferable?
Don't confuse APC's data caching abilities with its ability to cache compiled opcode. APC provides two different things:
1. It gives a handy method of caching data structures (objects, arrays, etc.), so that you can store/fetch them with apc_store and apc_fetch.
2. It keeps a compiled version of your scripts so that the next time they run, they run faster.
Let's see an example for (1): Suppose you have a data structure which takes 1 second to calculate:
function calculate_array() {
    sleep(1);
    return array('foo' => 'bar');
}

$data = calculate_array();
You can store its output so that you don't have to call the slow calculate_array() again:
function calculate_array() {
    sleep(1);
    return array('foo' => 'bar');
}

if (!apc_exists('key1')) {
    $data = calculate_array();
    apc_store('key1', $data);
} else {
    $data = apc_fetch('key1');
}
which will be considerably faster, much less than the original 1 second.
Now, for (2) above: having APC will not make your program run faster than the 1 second that calculate_array() needs. However, if your file additionally needed (say) 100 milliseconds to initialize and execute, simply having APC enabled will cut that to (approximately) 20 milliseconds, an 80% reduction in initialization/preparation time. This can make quite a difference in production systems, so simply installing APC can have a noticeable positive impact on your script's performance, even if you never explicitly call any of its functions.
If you are just storing static data (as in your example), it would be preferable to use apc_store.
The reasoning behind this is not so much whether the opcode cache is faster or slower, but the fact you are using include to fetch static data into scope.
Even with an opcode cache, the file will still be checked for consistency on each execution. PHP will not have to parse the contents, but it will have to check whether the file exists and that it hasn't changed since the opcode cache was created. Filesystem checks are expensive, even if it is only a stat of the file.
Therefore, of the two approaches I would use apc_store to remove the filesystem checks completely.
Unlike the other answer, I would use the array-file solution (the first one):
<?php return (object) array(
'key' => 'value',
);
The reason is that with both solutions you are on the safe side, but when you leave the caching to APC itself you don't have to juggle the apc_*() functions. You simply include the file and use it. When you set
apc.stat = 0
you avoid the stat-calls on every include too. This is useful for production, but remember to clear the system-cache on every deployment.
http://php.net/apc.configuration.php#ini.apc.stat
Oh, and not to forget: with the file approach it works even without APC, which is useful for a development setup, where you usually shouldn't use any caching.
For awhile now, I've been storing serialized objects from forked processes in /tmp with file_put_contents.
Once all child processes wrap up, I'm simply using file_get_contents and unserializing the data to rebuild my object for processing.
So my question is: is there a better way of storing my data without writing to /tmp?
Outside of storing the data in a file, the only other native solutions that come to mind are shm (http://www.php.net/manual/en/function.shm-attach.php) or stream socket pairs (http://www.php.net/manual/en/function.stream-socket-pair.php).
Either of these should be doable if the collected data is unimportant after the script has run. The idea behind both is simply to open a communication channel between your parent and child processes. That said, my personal opinion is that unless the filesystem is causing some concrete problem, it is by far the least complicated solution.
SHM
The idea with shm is that instead of storing the serialized objects in a file, you would store them in an shm segment protected for concurrency by a semaphore. Forgive the code, it is rough but should be enough to give you the general idea.
/*** Configurations ***/
$blockSize = 1024; // Size of block in bytes
$shmVarKey = 1; //An integer specifying the var key in the shm segment
/*** In the children processes ***/
//First you need to get a semaphore; this is important to make sure you don't
//have multiple child processes accessing the shm segment at the same time.
//Note: ftok() needs the same existing file in every process (tempnam() would
//create a different file per call), so the script's own path is used here.
$sem = sem_get(ftok(__FILE__, 'a'));
//Then you need your shm segment
$shm = shm_attach(ftok(__FILE__, 'b'), $blockSize);
if (!$sem || !$shm) {
    //error handling goes here
}
//If multiple forks hit this line at roughly the same time, the first one gets the lock;
//everyone else waits until the lock is released before trying again.
sem_acquire($sem);
$data = shm_has_var($shm, $shmVarKey) ? shm_get_var($shm, $shmVarKey) : array(); // start with an empty array if nothing has been stored yet
//Here you could key the data array by probably whatever you are currently using to determine file names.
$data['child specific id'] = 'my data'; // can be an object, array, anything that is php serializable, though resources are wonky
shm_put_var($shm, $shmVarKey, $data); // important to note that php handles the serialization for you
sem_release($sem);
/*** In the parent process ***/
$shm = shm_attach(ftok(__FILE__, 'b'), $blockSize);
$data = shm_get_var($shm, $shmVarKey);
foreach ($data as $key => $value)
{
    //process your data
}
Stream Socket Pair
I personally love using these for inter-process communication. The idea is that prior to forking, you create a stream socket pair. This results in two read/write sockets being created that are connected to each other; one should be used by the parent and one by the child. You would have to create a separate pair for each child, and it changes the parent's model a little in that it has to manage the communication a bit more in real time.
Fortunately the PHP docs for this function has a great example: http://us2.php.net/manual/en/function.stream-socket-pair.php
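For reference, a minimal sketch of that approach with a single child (it assumes the pcntl extension is available; the serialized payload is just an example):
<?php
// One socket pair per child; created before the fork so both processes share it.
$pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);

$pid = pcntl_fork();
if ($pid === 0) {
    // Child: write its result to its end of the pair, then exit.
    fclose($pair[0]);
    fwrite($pair[1], serialize(array('child result' => 42)));
    fclose($pair[1]);
    exit(0);
}

// Parent: read from the other end until the child closes it.
fclose($pair[1]);
$payload = stream_get_contents($pair[0]);
fclose($pair[0]);
pcntl_waitpid($pid, $status);

$result = unserialize($payload);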
You could use a shared memory cache such as memcached which would be faster, but depending on what you're doing and how sensitive/important the data is, a file-based solution may be your best option.
On my website there is a PHP function func1() which gets some info from other resources. It is very costly to run this function.
I want that when Visitor1 comes to my website, func1() is executed and the value $variable1 = func1(); is stored in a text file (or something similar, but not a database).
Then a 5-minute interval starts, and if Visitor2 visits my website during this interval, he gets the value from the text file without func1() being called.
When Visitor3 comes 20 minutes later, the function should be called again and the new value stored for another 5 minutes.
How can I do this? A small working example would be nice.
Store it in a file, and check the file's timestamp with filemtime(). If it's too old, refresh it.
$maxage = 300; // 5 minutes...

// If the file doesn't exist yet, or it exists but is older than the max age...
if (!file_exists("file.txt") || filemtime("file.txt") < (time() - $maxage)) {
    // Write a new value with file_put_contents()
    $value = func1();
    file_put_contents("file.txt", $value);
}
else {
    // Otherwise read the value from the file...
    $value = file_get_contents("file.txt");
}
Note: There are dedicated caching systems out there already, but if you only have this one value to worry about, this is a simple caching method.
What you are trying to accomplish is called caching. Some of the other answers here describe caching at its simplest: to a file. There are many other options for caching, depending on the size of the data, the needs of the application, etc.
Here are some caching storage options:
File
Database/SQLite (yes, you can cache to a database)
MemCached
APC
XCache
There are also many things you can cache. Here are a few:
Plain Text/HTML
Serialized data such as PHP objects
Function Call output
Complete Pages
For a simple yet very configurable way to cache, you can use the Zend_Cache component from the Zend Framework. It can be used on its own, without the whole framework, as described in this tutorial.
I saw somebody say to use sessions. This is not what you want, as sessions are only available to the current user.
Here is an example using Zend_Cache:
include 'library/Zend/Cache.php';

// Unique cache tag
$cache_tag = "myFunction_Output";

// Lifetime set to 300 seconds = 5 minutes
$frontendOptions = array(
    'lifetime' => 300,
    'automatic_serialization' => true
);

$backendOptions = array(
    'cache_dir' => 'tmp/'
);

// Create cache object
$cache = Zend_Cache::factory('Core', 'File', $frontendOptions, $backendOptions);

// Try to get data from cache
if (!($data = $cache->load($cache_tag)))
{
    // Not found in cache, call function and save it
    $data = myExpensiveFunction();
    $cache->save($data, $cache_tag);
}
else
{
    // Found data in cache, check it out
    var_dump($data);
}
In a text file. The oldest way of saving stuff (almost). Or set up a cronjob to run the script with the function every 5 minutes, independently of visits.
Use caching, such as APC!
If the resource is really big, this may not be the best option and a file may then indeed be better.
Look at:
apc_store
apc_fetch
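A minimal sketch for the 5-minute case from the question (the cache key func1_result is arbitrary):
<?php
$value = apc_fetch('func1_result', $success);
if (!$success) {
    $value = func1();                        // The expensive call.
    apc_store('func1_result', $value, 300);  // Keep it for 5 minutes.
}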
Good luck!