I am trying to implement a hashmap (an associative array in PHP) that is available application-wide, i.e. stored in an application context so it is not lost when the program ends. How can I achieve this in PHP?
Thanks,
If you are using Zend's version of PHP, it's easy.
You do not need to serialize your data.
Only plain data can be cached; resources such as file handles cannot.
To store true/false, use 1/0 so you can differentiate a cache failure from a stored result with ===.
Store:
zend_shm_cache_store('cache_namespace::this_cache_name',$any_variable,$expire_in_seconds);
Retrieve:
$any_variable = zend_shm_cache_fetch('cache_namespace::this_cache_name');
if ( $any_variable === false ) {
# cache was expired or did not exist.
}
For long lived data you can use:
zend_disk_cache_store(); zend_disk_cache_fetch();
For those without zend, the corresponding APC versions of the above:
Store:
apc_store('cache_name',$any_variable,$expire_in_seconds);
Retrieve:
$any_variable = apc_fetch('cache_name');
if ( $any_variable === false ) {
# cache was expired or did not exist.
}
Never used any of the other methods mentioned.
If you don't have shared memory available to you, you could serialize/unserialize the data to disk. Of course, shared memory is much faster, and the nice thing about Zend is that it handles concurrency issues for you and allows namespaces:
Store:
file_put_contents('/tmp/some_filename',serialize($any_variable));
Retrieve:
$any_variable = unserialize(file_get_contents('/tmp/some_filename'));
Edit: To handle concurrency issues yourself, I think the easiest way would be to use locking. I can still see the possibility of a race condition in this pseudo code between checking whether the lock exists and getting the lock, but you get the point.
Pseudo code:
while ( lock exists ) {
microsleep;
}
get lock.
check we got lock.
write value.
release lock.
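If you go the file route, here is a rough sketch of that locking idea using flock(); flock() blocks until the lock is granted, so it replaces the busy-wait loop above. The /tmp path is a placeholder assumption.
// Write the associative array to disk under an exclusive lock.
function cache_write(array $map) {
    $fp = fopen('/tmp/app_cache.dat', 'c'); // create if missing, don't truncate yet
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_EX);        // blocks until we hold the exclusive lock
    ftruncate($fp, 0);          // now discard the old contents
    fwrite($fp, serialize($map));
    fflush($fp);
    flock($fp, LOCK_UN);        // release the lock
    fclose($fp);
    return true;
}

// Read it back under a shared lock (readers don't block each other).
function cache_read() {
    $fp = @fopen('/tmp/app_cache.dat', 'r');
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_SH);
    $data = stream_get_contents($fp);
    flock($fp, LOCK_UN);
    fclose($fp);
    return $data === '' ? false : unserialize($data);
}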
You can use APC or similar for this; the data you put there will be available in shared memory.
Bear in mind that this will not persist between server restarts, of course.
I have an API written in Laravel. There is the following code in it:
public function getData($cacheKey)
{
if(Cache::has($cacheKey)) {
return Cache::get($cacheKey);
}
// if cache is empty for the key, get data from external service
$dataFromService = $this->makeRequest($cacheKey);
$dataMapped = array_map([$this->transformer, 'transformData'], $dataFromService);
Cache::put($cacheKey, $dataMapped);
return $dataMapped;
}
In getData(), if the cache contains the necessary key, the data is returned from the cache.
If the cache does not have the key, the data is fetched from the external API, processed, placed into the cache, and then returned.
The problem is: when there are many concurrent requests to the method, the data is corrupted. I guess the data is written to the cache incorrectly because of race conditions.
You seem to be experiencing some sort of critical section problem. But here's the thing: Redis operations are atomic; however, Laravel does its own checks before calling Redis.
The major issue here is that all the concurrent requests will each make a request to the external service and then all of them will write the results to the cache (which is definitely not good). I would suggest implementing a simple mutual exclusion lock in your code.
Replace your current method body with the following:
public function getData($cacheKey)
{
    $mutexKey = "getDataMutex";
    if (!Redis::setnx($mutexKey, true)) {
        //Already running, you can either do a busy wait until the cache key is ready or fail this request and assume that another one will succeed
        //Definitely don't trust what the cache says at this point
    }
    $value = Cache::rememberForever($cacheKey, function () use ($cacheKey) { //This part is just the convenience method, it doesn't change anything
        $dataFromService = $this->makeRequest($cacheKey);
        $dataMapped = array_map([$this->transformer, 'transformData'], $dataFromService);
        return $dataMapped;
    });
    Redis::del($mutexKey);
    return $value;
}
setnx is a native Redis command that sets a value only if it doesn't already exist. This is done atomically, so it can be used to implement a simple locking mechanism, but (as mentioned in the manual) it will not work if you're using a Redis cluster. In that case the Redis manual describes a method to implement distributed locks.
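To keep a crashed worker from holding the mutex forever, the lock can also be given an expiry in the same atomic command (SET with the NX and EX options). A rough sketch, assuming the predis client behind Laravel's Redis facade (with phpredis the options are passed as an array instead); the TTL value is just a guess:
use Illuminate\Support\Facades\Redis;

$mutexKey = 'getDataMutex';
$lockTtl  = 30; // seconds; assumption, tune to how long makeRequest() can take

// SET key value EX <ttl> NX -- atomically "set only if absent, with an expiry"
$gotLock = Redis::set($mutexKey, 1, 'EX', $lockTtl, 'NX');

if ($gotLock) {
    try {
        // ... fetch, transform and cache the data here ...
    } finally {
        Redis::del($mutexKey); // release the lock when done
    }
} else {
    // Another request holds the lock: wait for the cache to fill, or fail fast.
}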
In the end I came to the following solution: I use the retry() function from the Laravel 5.5 helpers to keep reading the cache value, with an interval of 1 second, until it has been written there normally.
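A rough sketch of what that can look like; the attempt count is an assumption:
use Illuminate\Support\Facades\Cache;

// retry(times, callback, sleep in milliseconds) re-runs the callback until it
// returns without throwing; here it waits 1 second between attempts and gives
// up (rethrowing the exception) after 5 tries.
$value = retry(5, function () use ($cacheKey) {
    if (!Cache::has($cacheKey)) {
        throw new \RuntimeException('Cache not populated yet');
    }
    return Cache::get($cacheKey);
}, 1000);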
In my app I'm using server-sent events and have the following situation (pseudo code):
$response = new StreamedResponse();
$response->setCallback(function () {
while(true) {
// 1. $data = fetchData();
// 2. echo "data: $data";
// 3. sleep(x);
}
});
$response->send();
My SSE Response class accepts a callback to gather the data (step 1), which actually performs a database query. Now to my problem: as I am trying to avoid polling the database every X seconds, I want to make use of Doctrine's onFlush event to set a flag that the corresponding entity has actually been changed, which would then be checked within the fetchData callback. Normally, I would do this by setting a flag on the current user session, but as the streaming loop constantly writes data, the session cannot be accessed within the callback. Does anybody have an idea how to resolve this problem?
BTW: I'm using Symfony 3.3 and Doctrine 2.5 - thanks for any help!
I know that this question is from a long time ago, but here's a suggestion:
Use shared memory (the PHP shm_*() functions). That way your flag isn't tied to a specific session.
Be sure to lock and unlock around access to the shared memory (I usually use a semaphore).
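A rough sketch of that idea; the integer keys, the segment size and the variable slot are arbitrary assumptions:
// Set/read a "data changed" flag in System V shared memory, guarded by a semaphore.
define('FLAG_SHM_KEY', 0xF1A6);  // arbitrary shared-memory key
define('FLAG_SEM_KEY', 0xF1A7);  // arbitrary semaphore key
define('FLAG_VAR', 1);           // slot inside the segment

function setChangedFlag($changed)
{
    $sem = sem_get(FLAG_SEM_KEY, 1);
    sem_acquire($sem);                       // lock around shared-memory access
    $shm = shm_attach(FLAG_SHM_KEY, 1024);   // 1 KB is plenty for a flag
    shm_put_var($shm, FLAG_VAR, (bool) $changed);
    shm_detach($shm);
    sem_release($sem);
}

function readChangedFlag()
{
    $sem = sem_get(FLAG_SEM_KEY, 1);
    sem_acquire($sem);
    $shm = shm_attach(FLAG_SHM_KEY, 1024);
    $changed = shm_has_var($shm, FLAG_VAR) ? shm_get_var($shm, FLAG_VAR) : false;
    shm_detach($shm);
    sem_release($sem);
    return $changed;
}
The onFlush listener would call setChangedFlag(true), and the fetchData callback would check readChangedFlag() and clear the flag after pushing an event.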
I have this function that tries to read some values from the cache. If a value does not exist, it should call an alternative source API and save the new value into the cache. However, the server is very overloaded, and almost every time a value does not exist, more than one request is created (a lot of API calls) and each of them stores the new value into the cache. What I want is to be able to call the API many times, but to have only one process/request store the value in the cache:
function fetch_cache($key, $alternativeSource) {
    $redis = new Redis();
    $redis->pconnect(ENV_REDIS_HOST);

    $value = $redis->get($key);
    if( $value === NULL ) {
        $value = file_get_contents($alternativeSource);

        // here goes the part that I need help with
        $semaphore = sem_get(6000, 1); // does this need to be called each time this function is called?
        if( $semaphore === FALSE ) {
            // This means I have failed to create the semaphore?
        }
        if( sem_acquire($semaphore, true) ) {
            // we have acquired the semaphore, so:
            $redis->set($key, $value);
            sem_release($semaphore); // releasing lock
        }
        // This must be called because I have called sem_get()?
        sem_remove($semaphore);
    }
    return $value;
}
Is this proper use of semaphore in PHP5?
Short answer
You don't need to create and remove semaphores within the fetch_cache function. Put sem_get() into an initialization method (such as __construct).
You should remove semaphores with sem_remove(), but in a cleanup method (such as __destruct). Or, you might want to keep them even longer - depends on the logic of your application.
Use sem_acquire() to acquire locks, and sem_release() to release them.
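Put together, a minimal sketch of that structure; the class name is just for illustration, and note that phpredis's get() returns FALSE (not NULL) on a cache miss:
class ApiCache
{
    private $redis;
    private $semaphore;

    public function __construct()
    {
        $this->redis = new Redis();
        $this->redis->pconnect(ENV_REDIS_HOST);
        // Create/obtain the semaphore set once, during initialization.
        $this->semaphore = sem_get(6000, 1);
    }

    public function fetch_cache($key, $alternativeSource)
    {
        $value = $this->redis->get($key);
        if ($value === FALSE) {                        // phpredis returns FALSE on a miss, not NULL
            $value = file_get_contents($alternativeSource);
            if (sem_acquire($this->semaphore, true)) { // non-blocking: only one writer gets through
                $this->redis->set($key, $value);
                sem_release($this->semaphore);
            }
        }
        return $value;
    }

    public function __destruct()
    {
        // Remove the set only when no other process should keep using it
        // (see the description of sem_remove() below).
        // sem_remove($this->semaphore);
    }
}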
Description
sem_get()
Creates a set of three semaphores.
The underlying C function semget() is not atomic. There is a possibility of a race condition when two processes try to call semget() at the same time. Therefore, semget() should be called in some initialization process. The PHP extension overcomes this issue by means of three semaphores:
Semaphore 0 a.k.a. SYSVSEM_SEM
Is initialized to sem_get()'s $max_acquire and decremented as processes acquire it.
The first process that calls sem_get() fetches the value of the SYSVSEM_USAGE semaphore (see below). For the first process it equals 1, because the extension sets it to 1 with the atomic semop() function right after semget(). And if this really is the first process, the extension assigns the $max_acquire value to the SYSVSEM_SEM semaphore.
Semaphore 1 a.k.a. SYSVSEM_USAGE
The number of processes using the semaphore.
Semaphore 2 a.k.a. SYSVSEM_SETVAL
Plays the role of a lock for the internal SETVAL and GETVAL operations (see man 2 semctl). For example, it is set to 1 while the extension sets SYSVSEM_SEM to $max_acquire, then is reset back to zero.
Finally, sem_get wraps a structure (containing the semaphore set ID, key and other information) into a PHP resource and returns it.
So you should call it in some initialization process, when you're only preparing to work with semaphores.
sem_acquire()
This is where $max_acquire comes into play.
SYSVSEM_SEM's value (let's call it semval) is initially equal to $max_acquire. semop() blocks until semval becomes greater than or equal to 1. Then 1 is subtracted from semval.
If $max_acquire = 1, then semval becomes zero after the first call, and the next calls to sem_acquire() will block until semval is restored by sem_release() call.
Call it when you need to acquire the next "lock" from the available set ($max_acquire).
sem_release()
Does pretty much the same as sem_acquire(), except it increments SYSVSEM_SEM's value.
Call it when you no longer need the "lock" acquired previously with sem_acquire().
sem_remove()
Immediately removes the semaphore set, awakening all processes blocked in semop on the set (from the IPC_RMID section, SEMCTL(2) man page).
So this is effectively the same as removing a semaphore with ipcrm command.
The file permissions should be 0666 instead of 6000 for what you're trying to do.
I'm trying to insert a large amount of data (30,000+ lines) into a MySQL database using Doctrine2 and the Symfony2 fixture bundle.
I looked into the right way to do it. I saw lots of questions about memory leaks and Doctrine, but no satisfying answer for me. The Doctrine clear() function often comes up.
So, I did various shapes of this:
while ($data = getData()) {
    $iteration++;
    $obj = new EntityObject();
    $obj->setName('henry');
    // Fill object...
    $manager->persist($obj);
    if ($iteration % 500 == 0) {
        $manager->flush();
        $manager->clear();
        // Also tried some sort of:
        // $manager->clear($obj);
        // $manager->detach($obj);
        // gc_collect_cycles();
    }
}
PHP memory still goes wild, right after the flush() (I'm sure of that). In fact, every time the entities are flushed, memory goes up by a certain amount depending on the batch size and the entities, until it reaches the deadly Allowed memory size exhausted error. With a very, very tiny entity it works, but memory consumption increases too much: several MB whereas it should be KB.
clear(), detach() or calling GC doesn’t seem to have an effect at all. It only clears some KB.
Is my approach flawed? Did I miss something, somewhere? Is it a bug?
More info:
Without flush(), memory barely moves;
Lowering the batch size does not change the outcome;
The data comes from a CSV that needs to be sanitized;
EDIT (partial solution):
#qooplmao brought a solution that significantly decreases memory consumption: disable the Doctrine SQL logger with $manager->getConnection()->getConfiguration()->setSQLLogger(null);
However, it is still abnormally high and increasing.
I resolved my problem using this resource, as #Axalix suggested.
This is how I modified the code:
// IMPORTANT - Disable the Doctrine SQL Logger
$manager->getConnection()->getConfiguration()->setSQLLogger(null);
// SUGGESTION - make getData a generator (using yield) to save more memory.
while ($data = getData()) {
$iteration++;
$obj = new EntityObject();
$obj->setName('henry');
// Fill object...
$manager->persist($obj);
// IMPORTANT - Temporarily store the entities (of course, the array must be defined outside of the loop first)
$tempObjets[] = $obj;
if ($iteration % 500 == 0) {
$manager->flush();
// IMPORTANT - clean entities
foreach($tempObjets as $tempObject) {
$manager->detach($tempObject);
}
$tempObjets = null;
gc_enable();
gc_collect_cycles();
}
}
// Do not forget the last flush
$manager->flush();
And, last but not least, as I use this script with Symfony data fixtures, adding the --no-debug parameter in the command is also very important. Then memory consumption is stable.
I found out that Doctrine logs all SQL queries during execution. I recommend disabling this with the code below; it can really save memory:
use Doctrine\ORM\EntityManagerInterface;
public function __construct(EntityManagerInterface $entity_manager)
{
$em_connection = $entity_manager->getConnection();
$em_connection->getConfiguration()->setSQLLogger(null);
}
My suggestion is to drop the Doctrine approach for bulk inserts. I really like Doctrine, but I just hate this kind of stuff for bulk inserts.
MySQL has a great feature called LOAD DATA. I would rather use it, even if I have to sanitize my CSV first and do the LOAD afterwards.
If you need to change the values, I would read the CSV into an array with $csvData = array_map("str_getcsv", file($csv));, change whatever you need in the array, and write it back out line by line. After that, use the new .csv for the LOAD with MySQL.
The reasons why I wouldn't use Doctrine for this are described at the top.
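A rough sketch of that flow with PDO; the table name, column list, file paths and credentials are placeholder assumptions, and LOAD DATA LOCAL INFILE requires local_infile to be enabled:
// Sketch: sanitize the CSV in PHP, then bulk-load the result with LOAD DATA.
$csvData = array_map('str_getcsv', file('import.csv'));

$clean = fopen('/tmp/import_clean.csv', 'w');
foreach ($csvData as $row) {
    // ... adjust $row values here ...
    fputcsv($clean, $row);
}
fclose($clean);

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', array(
    PDO::MYSQL_ATTR_LOCAL_INFILE => true, // needed for LOCAL INFILE
));
$pdo->exec(
    "LOAD DATA LOCAL INFILE '/tmp/import_clean.csv'
     INTO TABLE my_table
     FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
     LINES TERMINATED BY '\\n'
     (name, quantity)"
);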
I'm having problems with a batch insertion of objects into a database using symfony 1.4 and doctrine 1.2.
My model has a certain kind of object called "Sector", each of which has several objects of type "Cupo" (usually ranging from 50 up to 200000). These objects are pretty small; just a short identifier string and one or two integers. Whenever a group of Sectors are created by the user, I need to automatically add all these instances of "Cupo" to the database. In case anything goes wrong, I'm using a doctrine transaction to roll back everything. The problem is that I can only create around 2000 instances before php runs out of memory. It currently has a 128MB limit, which should be more than enough for handling objects that use less than 100 bytes. I've tried increasing the memory limit up to 512MB, but php still crashes and that doesn't solve the problem. Am I doing the batch insertion correctly or is there a better way?
Here's the error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /Users/yo/Sites/grifoo/lib/vendor/symfony/lib/log/sfVarLogger.class.php on line 170
And here's the code:
public function save($conn = null) {
    $conn = $conn ? $conn : Doctrine_Manager::connection();
    $conn->beginTransaction();
    try {
        $evento = $this->object;
        foreach ($evento->getSectores() as $s) {
            for ($j = 0; $j < $s->getCapacity(); $j++) {
                $cupo = new Cupo();
                $cupo->setActivo($s->getActivo());
                $cupo->setEventoId($s->getEventoId());
                $cupo->setNombre($j);
                $cupo->setSector($s);
                $cupo->save();
            }
        }
        $conn->commit();
        return;
    }
    catch (Exception $e) {
        $conn->rollback();
        throw $e;
    }
}
Once again, this code works fine for less than 1000 objects, but anything bigger than 1500 fails. Thanks for the help.
Tried doing
$cupo->save();
$cupo->free();
$cupo = null;
(But substituting my code) And I'm still getting memory overflows. Any other ideas, SO?
Update:
I created a new environment in my databases.yml, that looks like:
all:
  doctrine:
    class: sfDoctrineDatabase
    param:
      dsn: 'mysql:host=localhost;dbname=.......'
      username: .....
      password: .....
      profiler: false
The profiler: false entry disables Doctrine's query logging, which normally keeps a copy of every query you make. It didn't stop the memory leakage, but I was able to get about twice as far through my data import as I could without it.
Update 2
I added
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
before running my queries, and changed
$cupo = null;
to
unset($cupo);
And now my script has been churning away happily. I'm pretty sure it will finish without running out of RAM this time.
Update 3
Yup. That's the winning combo.
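For reference, a minimal sketch of that winning combination (the auto-free attribute plus free()/unset()) applied to the loop from the question:
// Enable automatic freeing of internal query objects once, before the loop.
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true);

foreach ($evento->getSectores() as $s) {
    for ($j = 0; $j < $s->getCapacity(); $j++) {
        $cupo = new Cupo();
        $cupo->setActivo($s->getActivo());
        $cupo->setEventoId($s->getEventoId());
        $cupo->setNombre($j);
        $cupo->setSector($s);
        $cupo->save();
        $cupo->free();  // release the record's internal references
        unset($cupo);   // drop our own reference so it can be garbage collected
    }
}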
I have just written a "daemonized" script with symfony 1.4, and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
For a symfony task, I also faced this issue and did the following things. It worked for me.
Disable debug mode. Add the following before the DB connection is initialized:
sfConfig::set('sf_debug', false);
Set the auto-free query objects attribute for the DB connection:
$connection->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
Free all objects after use
$object_name->free()
Unset all arrays after use: unset($array_name)
Check all Doctrine queries used in the task. Free all queries after use: $q->free()
(This is good practice whenever you use queries.)
That's all. Hope it may help someone.
Doctrine leaks and there's not much you can do about it. Make sure you use $q->free() whenever applicable to minimize the effect.
Doctrine is not meant for maintenance scripts. The only way to work around this problem is to break your script into parts, each of which performs part of the task. One way to do that is to add a start parameter to your script and, after a certain number of objects have been processed, have the script redirect to itself with a higher start value. This works well for me, although it makes writing maintenance scripts more cumbersome.
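A rough sketch of that pattern, adapted to a standalone CLI script that re-invokes itself rather than an HTTP redirect; the chunk size, the query and the hand-off mechanism are all assumptions:
// Process one chunk per PHP process, then hand off to a fresh process
// with a higher start offset so memory never accumulates across chunks.
$start     = isset($argv[1]) ? (int) $argv[1] : 0;
$chunkSize = 1000; // assumption: tune to what fits comfortably in memory

$records = Doctrine_Query::create()
    ->from('Cupo c')
    ->offset($start)
    ->limit($chunkSize)
    ->execute();

foreach ($records as $record) {
    // ... process the record ...
    $record->free();
}

if (count($records) === $chunkSize) {
    // More records likely remain; re-invoke this script for the next chunk.
    passthru(sprintf('php %s %d', escapeshellarg(__FILE__), $start + $chunkSize));
}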
Try to unset($cupo); after every save. This should help. Another thing is to split the script and do some batch processing.
Try to break the circular reference, which usually causes memory leaks, with
$cupo->save();
$cupo->free(); // this call
as described in the Doctrine manual.
For me, I just initialized the task like this:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();
$config = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', true);
sfContext::createInstance($config);
(WITH PROD CONFIG)
and used free() after each save() on Doctrine's objects.
The memory stays stable at around 25 MB
(memory_get_usage = 26.884071350098 MB)
with PHP 5.3 on Debian Squeeze.
Periodically close and re-open the connection - not sure why, but it seems PDO is retaining references.
What works for me is calling the free method like this:
$cupo->save();
$cupo->free(true); // free also the related components
unset($cupo);