Using cURL Handle as Array Key - php

I'm using the curl_multi functions to request multiple URLs and process them as they complete. When a connection completes, all I really have is the cURL handle (and associated data) from curl_multi_info_read().
The URLs come from a job queue, and once one is processed I need to remove its job from the queue. I don't want to rely on the URL to identify the job (there shouldn't be duplicate URLs, but what if there are?).
The solution I've worked up so far is to use the cURL handle as an array key pointing to the job. From what I can tell, when treated as a string the handle is something like:
"Resource id #1"
That seems reasonably unique to me. The basic code is:
$ch = curl_init($job->getUrl());
$handles[$ch] = $job;
//then later
$done = curl_multi_info_read($master);
$handles[$done['handle']]->delete();
curl_multi_remove_handle($master, $done['handle']);
Is the cURL handle safe to use in this way?
Or is there a better way to map the cURL handles to the job that created them?

Store private data inside the cURL easy handle, e.g. some job ID:
curl_setopt($ch, CURLOPT_PRIVATE, $job->getId());
// then later
$id = curl_getinfo($done['handle'], CURLINFO_PRIVATE);
This "private data" feature is not (yet) documented in the PHP manual. It has been available since PHP 5.2.4. It lets you store and retrieve a string of your choice inside the cURL handle. Use it for a key that uniquely identifies the job.
Edit: Feature is now documented in the PHP manual (search for CURLOPT_PRIVATE within the page).
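A minimal sketch of how this could fit the job queue from the question. The `$jobs` array and its IDs are made up for illustration, and no request is actually sent; in the real loop the handle would come from `$done['handle']` after curl_multi_info_read().

```php
<?php
// Hypothetical job queue: job ID => URL (both invented for this sketch).
$jobs = [
    'job-17' => 'http://example.com/a',
    'job-18' => 'http://example.com/b',
];

$handles = [];
foreach ($jobs as $jobId => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Attach the job ID to the handle itself.
    curl_setopt($ch, CURLOPT_PRIVATE, $jobId);
    $handles[] = $ch;
}

// Later, given only a handle, read the job ID back from it.
$recovered = [];
foreach ($handles as $ch) {
    $recovered[] = curl_getinfo($ch, CURLINFO_PRIVATE);
}
```

Because the ID travels with the handle, no separate handle-to-job lookup table is needed.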

It will probably work thanks to an implicit type cast, but it doesn't feel right to me at all. I think it's begging for trouble somewhere down the line: future versions that treat resources differently, different platforms...
I personally wouldn't do it; I'd use numeric indexes instead.

I have to agree with Pekka... it will probably work, but it smells bad. I'd use straight-up integers as Pekka suggests, or wrap the handles in a simple class and then use spl_object_hash, or have the constructor generate a uniqid when it's set up.
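One way the numeric-index idea could look, as a sketch only. The helper name `handleKey` and the job IDs are hypothetical. It relies on the fact that cURL handles are resources in older PHP versions (where an integer cast yields the resource ID) and CurlHandle objects from PHP 8 on (where spl_object_id() applies).

```php
<?php
// Returns an integer key for a cURL handle.
// PHP 8+: handles are CurlHandle objects; older versions: resources.
function handleKey($ch): int
{
    return is_object($ch) ? spl_object_id($ch) : (int) $ch;
}

$jobsByHandle = [];
$a = curl_init('http://example.com/a');
$b = curl_init('http://example.com/b');
$jobsByHandle[handleKey($a)] = 'job-17'; // hypothetical job IDs
$jobsByHandle[handleKey($b)] = 'job-18';

// Later, with only the handle available:
$jobForB = $jobsByHandle[handleKey($b)];
```

The keys are only unique while the handles are alive, so the map entry should be removed before a handle is released.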

Related

array_push overwritting array data [duplicate]

Is there a way in PHP to use "out of session" variables, which would not be loaded/unloaded at every connection, like in a Java server?
Please excuse the lack of precision; I can't figure out how to phrase it properly.
The main idea would be to have something like this:
<?php
...
// $variablesAlreadyLoaded is kind of "static" and shared between all PHP threads
// No need to initialize/load/instantiate it.
$myVar = $variablesAlreadyLoaded['aConstantValueForEveryone'];
...
?>
I have already done things like this using shmop and other weird things, but I would like to know if there is a "clean" way to do this in "pure PHP", without using caching systems (I'm thinking of APC, Redis...) or a database.
EDIT 1:
Since people (thanks to them for spending time on this) keep answering with sessions, I'm adding a constraint I forgot to mention: no sessions please.
EDIT 2:
It seems the only native PHP methods for this are shared memory (shmop) and named pipes. I would prefer a managed way to access shared objects, without having to care about memory management (shared memory block size) or system issues (pipes).
So I searched the net for a PHP module/library that provides functions/methods to do that, and found nothing.
EDIT 3:
After some research into the approach suggested by @KFO, it appears that putenv / setenv are not designed to handle objects (and I would like to avoid serialization). So it solves the problem for short values such as strings or numbers, but not for larger or more complex objects.
Using the "env way" AND another method for bigger objects would be inconsistent and would hurt the complexity and maintainability of the code.
EDIT 4:
Found this: DBus (GREE Lab DBus), but I don't have the tools to test it at work. Has anybody tested it yet?
I'm open to every suggestion.
Thanks
EDIT 5 ("ANSWER"):
Since DBus is not exactly what I'm looking for (it requires installing a third-party module, with no evidence of use in "serious" applications), I'm now using Memcache, which has already proven its reliability (following @PeterM's comment, see below).
// First page
session_id('same_session_id_for_all');
session_start();
$_SESSION['aConstantValueForEveryone'] = 'My Content';
// Second page
session_id('same_session_id_for_all');
session_start();
echo $_SESSION['aConstantValueForEveryone'];
This works out of the box in PHP. Initializing the session with the same session ID for all visitors (instead of a random, user-unique string) results in one session that is shared by all users.
Is it really necessary to use a session to achieve the goal, or wouldn't it be better to use constants?
There is no pure PHP way of sharing information across different threads in PHP, except for an "external" file/database/server variable/session file solution.
Some commenters pointed out that session data goes through serialize/unserialize, which might break data in transport. There is a solution: in PHP the serialization used for sessions can be configured as needed through serialize_handler. See https://www.php.net/manual/session.configuration.php#ini.session.serialize-handler
It might also be interesting to look at the magic methods __sleep() and __wakeup(), which define how an object behaves on a serialize or unserialize request: https://www.php.net/manual/language.oop5.magic.php#object.sleep
Since PHP 5.1 there is also a predefined Serializable interface available: https://www.php.net/manual/class.serializable.php
You can declare a variable in your .htaccess, for example SetEnv APPLICATION_ENVIRONMENT production, and access it in your application with getenv('APPLICATION_ENVIRONMENT').
Another solution is to wrap your variable in a "persistent data" class that will automatically restore its data content every time the php script is run.
Your class needs to do the following:
store the content of the variable into a file in __destruct()
load the content of the variable from the file in __construct()
I prefer storing the file in JSON format so the content can be easily examined for debugging, but that is optional.
Be aware that some webservers will change the current working directory in the destructor, so you need to work with an absolute path.
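The two steps above could be sketched roughly as follows. The class name `PersistentData`, the public `$data` array, and the temp-file location are assumptions made for illustration; the sketch does no locking across requests, so concurrent scripts can still overwrite each other's changes.

```php
<?php
final class PersistentData
{
    private string $file;
    public array $data = [];

    // Load the previous content (if any) from an absolute path.
    public function __construct(string $absolutePath)
    {
        $this->file = $absolutePath;
        if (is_file($this->file)) {
            $decoded = json_decode((string) file_get_contents($this->file), true);
            if (is_array($decoded)) {
                $this->data = $decoded;
            }
        }
    }

    // Write the current content back as human-readable JSON.
    public function __destruct()
    {
        file_put_contents($this->file, json_encode($this->data, JSON_PRETTY_PRINT), LOCK_EX);
    }
}

// Hypothetical storage location for this demo.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'persistent_demo_' . getmypid() . '.json';

$first = new PersistentData($path);
$first->data['aConstantValueForEveryone'] = 'My Content';
unset($first); // last reference gone, so __destruct() writes the file

$second = new PersistentData($path); // simulates the next script run
$loaded = $second->data['aConstantValueForEveryone'];
```

Passing the absolute path into the constructor sidesteps the working-directory issue mentioned above.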
I think you can use $_SESSION['aConstantValueForEveryone'], which you can read on every page on the same domain.
Consider referring to its manual.

Achieving multithreading in PHP

I am writing a kind of test system in PHP that tests my database records. I have a separate PHP file for every test case. One (master) file is given the test number and the input parameters for that test in the form of a URL string. That file determines the test number and calls the appropriate test case. Now I have a bunch of URL strings to pass to that (master) file, and I want every test case to start working independently once it has received its parameters.
PHP is a single threaded entity, no multithreading currently exists for it. However, there are a few things you can do to achieve similar (but not identical) results for use cases I have come across when people normally ask me about multithreading. Again, there is no multithreading in PHP, but some of the below may help you further in creating something with characteristics that may match your requirement.
libevent: you could use this to create an event loop for PHP which would make blocking less of an issue. See http://www.php.net/manual/en/ref.libevent.php
curl_multi: Another useful library that can fire off get/post to other services.
Process Control: Not used this myself, but may be of value if process control is one aspect of your issue. http://uk.php.net/pcntl
Gearman: Now this I've used and it's pretty good. It allows you to create workers and spin off processes into a queue. You may also want to look at rabbit-php or ZeroMQ.
PHP is not multithreaded, it's singlethreaded. You cannot start new threads within PHP. Your best bet would be a file_get_contents (or cURL) to another PHP script to "mimic" threads. True multithreading isn't available in PHP.
You could also have a look at John's post at http://phplens.com/phpeverywhere/?q=node/view/254.
What you can do is use cURL to send the requests back to the server. The request will be handled and the results will be returned.
An example would be:
$c = curl_init("http://servername/".$script_name.$params);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($c);
curl_close($c);
Although this is not considered multithreading, it can be used to achieve your goal.

Get unique worker/thread/process/request ID in PHP

In multi-threaded environments (like most web platforms) I often include some sort of thread ID to the logs of my apps. This enables me to tell exactly what log entry came from which request/thread, when there are multiple requests at once which are simultaneously writing to the same log.
In .NET/C#, this can be done by the formatters of log4net, which by default include the current thread's ManagedThreadId (a number) or Name (a given name). These properties uniquely identify a thread (see for example: How to log correct context with Threadpool threads using log4net?).
In PHP, I have not found anything similar (I asked Google, PHP docs and SO). Does it exist?
Up until recently, I used apache_getenv("UNIQUE_ID"), and it worked perfectly with a crc32 or another hash function.
Nowadays I'm just using the following, in order to remove dependency on Apache and this mod.
$uniqueid = sprintf("%08x", abs(crc32($_SERVER['REMOTE_ADDR'] . $_SERVER['REQUEST_TIME'] . $_SERVER['REMOTE_PORT'])));
It's unique enough to understand which logs belong to which request. If you need more precision, you can use other hash functions.
Hope this helps.
zend_thread_id():
int zend_thread_id ( void )
This function returns a unique identifier for the current thread.
Although:
This function is only available if PHP has been built with ZTS (Zend Thread Safety) support and debug mode (--enable-debug).
You could also try to call mysql_thread_id() when you use that API for your database access (or mysqli::$thread_id when using mysqli).
PHP does not seem to have a function for this available, but your web server might be able to pass the identifier via environment variables. There is for example an Apache module called "mod_unique_id"[1] which generates a unique identifier for each request and stores it as an environment variable. If the variable is present, it should be visible via $_SERVER['UNIQUE_ID'] [2]
A "pure PHP" solution could be to write a script that generates a suitable random identifier, stores it via define("unique_id", val), and then use the auto_prepend_file [3] option in php.ini to include it in every script that executes. This way the unique ID is created when the request starts processing and is available for the rest of the request.
[1] http://httpd.apache.org/docs/current/mod/mod_unique_id.html
[2] http://forums.devshed.com/php-development-5/server-unique-id-questions-163269.html
[3] http://www.php.net/manual/en/ini.core.php#ini.auto-prepend-file
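A sketch of what such a prepended script might contain. The constant name `REQUEST_UNIQUE_ID`, the file name, and the `logLine` helper are hypothetical; the script prefers the mod_unique_id value when the server provides one and falls back to random bytes otherwise.

```php
<?php
// request_id.php (hypothetical name), enabled in php.ini with:
//   auto_prepend_file = /path/to/request_id.php
if (!defined('REQUEST_UNIQUE_ID')) {
    // Use Apache's mod_unique_id value if present, else generate 16 hex chars.
    define('REQUEST_UNIQUE_ID', $_SERVER['UNIQUE_ID'] ?? bin2hex(random_bytes(8)));
}

// Example helper: prefix every log message with the request ID.
function logLine(string $message): string
{
    return sprintf('[%s] %s', REQUEST_UNIQUE_ID, $message);
}
```

Any script running in the same request can then call `logLine()` or read the constant directly.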
I've seen getmypid() used for this purpose, but it seems to behave differently on different systems. In some cases the ID is unique to each request, but on others it's shared.
So, you're probably better off going with one of the other answers to ensure portability.
Assigning an ID in order to identify logged data from serving a request probably is as simple as creating a UUID version 4 (random) and writing it to every line of the log.
There even is software helping with that: ramsey/uuid, php-middleware/request-id
Adding it to every line of logging is easy when using log4php by putting the UUID to the LoggerMDC data and using an appropriate LogFormatter. With PSR-3 loggers, it might be a bit more complicated, YMMV.
A randomly created UUID will be suitable to identify one single request, and by using that UUID in the HTTP headers of sub requests and in the response, it will even be possible to trace one request across multiple systems and platforms inside the server farm. However, putting it as a header is not the task of any of the packages I mentioned.
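For illustration, a version 4 UUID can also be produced without a library. This is a sketch, not a replacement for the packages mentioned above; the function name `uuidV4` and the `X-Request-Id` header name are assumptions.

```php
<?php
// Builds a random (version 4) UUID from 16 random bytes.
function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    // 32 hex chars split into 8 groups of 4, joined as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$requestId = uuidV4();

// Hypothetical header to pass along on sub-requests and in the response.
$subRequestHeaders = ['X-Request-Id: ' . $requestId];
```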

PHP invoking another script but through http (isolating them)

Let's see if I can make myself clear. I have an old set of scripts that run well on PHP4, and it's better not to touch them. I have to integrate new functionality implemented in PHP5; I just need to invoke a script in the new app from the old one.
To avoid touching the old stuff, I'm thinking of invoking the new script "kind of remotely"; I only need to pass the $_REQUEST[] data. I cannot include it, as that would require migrating to another PHP version (and would cause some name clashes). I don't need any output from the new script.
What would be the cleanest way to "call" that script with parameters: fopen("http://theserver.com/thescript.php"....) and then sending all the headers needed to pass the parameters? Or is there something more direct?
Thanks!
If you need to pass POST data, you can use cURL; otherwise, you can just do file_get_contents('http://example.com/yourscript.php?param1=x&param2=y&param3=...'); and the HTTP wrapper will do the request for you (simplest way).
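A sketch of both variants, written for a current PHP version (the syntax and http_build_query() are not available on PHP 4, so the old caller would need an adapted version). The helper names are hypothetical, and the requests themselves are only shown in comments, not sent.

```php
<?php
// Builds the GET URL for the new script from an array of parameters.
function buildCallUrl(string $scriptUrl, array $params): string
{
    return $scriptUrl . '?' . http_build_query($params, '', '&');
}

// Prepares (but does not execute) a cURL POST carrying the same parameters.
function preparePost(string $scriptUrl, array $params)
{
    $ch = curl_init($scriptUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($params, '', '&'));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    return $ch;
}

$url = buildCallUrl('http://example.com/yourscript.php', ['param1' => 'x', 'param2' => 'a b&c']);
// $url is "http://example.com/yourscript.php?param1=x&param2=a+b%26c"

// To actually send the requests:
//   file_get_contents($url);
//   curl_exec(preparePost('http://example.com/yourscript.php', $_REQUEST));
```

Letting http_build_query() do the encoding avoids broken URLs when values contain spaces or ampersands.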
You're going to give yourself nightmares with this.
But if you really need to do it, you're not going to be able to rely on fopen. I would recommend using cURL, as Piskvor suggests.
But please, make sure you're validating and escaping any data you're pushing across correctly, or you're in for a world of hurt - the fact that you're making a cURL request to the other part of the system means that in theory, anyone else can do exactly the same thing.
This is most definitely not a long term solution, I would advise you rewrite the old parts as a priority.
After considering the suggestions in the previous answers, and thinking about safety, I had an idea: if both scripts are on the same server, the "called" script should see the same IP address as the caller, so if the IPs differ the invoked script should not run. Is that a good idea?

Should my framework allow access to $_GET and $_POST at the same time?

I know you can use both $_GET and $_POST at the same time, but is this a required "feature"? I am writing a framework, where you can access input through:
$value = $this->input->get('name','');
$value = $this->input->post('name','');
$value = $this->input->cookies('name','');
I'm just thinking here, is there a need for having GET and POST at the same time? Couldn't I just do:
$value = $this->input('name','default value if not set');
To obtain GET/POST data according to which HTTP request was made? Cookies will be only accessible through ->cookies(), but should I use ->get() and ->post() instead of doing something like ->input() ?
Thanks for your input!
It's conceivable that in a REST architecture I'd add a product like so:
POST /products?location=Ottawa HTTP/1.0
name=Book
And the product would automatically be associated with the location in the query params.
In a nutshell: there are semantically valid reasons for allowing both, but they can always be transformed into one or the other. That being said, do you want to enforce that usage on your users?
Yes, but you might want to make sure that when using this code you check that the request method is POST if you are going to change anything as a consequence of the request, rather than treating GET and POST as the same thing.
This is because generally GET requests should not have any side effects, all they should do is 'get' stuff.
Edit
This seems less relevant since you have clarified your question, but I will leave it here anyway
Yes!
I think you must allow access to both $_GET and $_POST at the same time. And I don't think you can just merge them together either. (You can offer the option, like PHP does with the ill-conceived $_REQUEST.) You could get a request like:
POST /validator?type=strict HTTP/1.1
type=html/text
body=<h1>Hello World</h1>
Note that the variable name type is used twice, but in different scopes! (Once in the URI defining the resource that should handle the POST, and then in the posted entity itself.) In PHP this looks like:
$_GET => ('type' => 'strict')
$_POST => ('type' => 'html/text', 'body' => '<h1>Hello World</h1>')
PHP's way of just parsing the URI and putting its parameters into $_GET is somewhat confusing. A URI is used with most (all?) of the HTTP methods, like POST, GET, PUT, DELETE etc. (Not just GET, as PHP would have you believe.) Maybe you could be revolutionary and use some of your own lingo:
$a = $this->uri('name');//param passed in the URI (same as PHP's $_GET)
$b = $this->entity('body');//var passed in an entity (same as PHP's $_POST)
$c = $this->method(); //The HTTP method in question ('GET', 'POST' etc.)
And maybe even some utility functions:
if($this->isGET()){
...
}elseif($this->isPOST()){
...
}
I know, wild and crazy :)
Good luck and have fun!
cheers!
You could use just the input method, but with flags in case the user wants input from a specific source:
$this->input('abc', '');
$this->input('abc', '', self::I_POST);
$this->input('abc', '', self::I_GET);
$this->input('abc', '', self::I_COOKIE);
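A rough sketch of such a flag-based method. The class layout, the `I_ANY` default, and the rule "no flag means the bag matching the request method" are assumptions for illustration; the request data is injected through the constructor so the class does not read the superglobals itself.

```php
<?php
final class Input
{
    public const I_ANY = 0;
    public const I_GET = 1;
    public const I_POST = 2;
    public const I_COOKIE = 3;

    private array $get;
    private array $post;
    private array $cookies;
    private string $method;

    public function __construct(array $get, array $post, array $cookies, string $method)
    {
        $this->get = $get;
        $this->post = $post;
        $this->cookies = $cookies;
        $this->method = strtoupper($method);
    }

    // Returns the named value from the chosen source, or $default if absent.
    public function input(string $name, $default = '', int $source = self::I_ANY)
    {
        switch ($source) {
            case self::I_GET:
                $bag = $this->get;
                break;
            case self::I_POST:
                $bag = $this->post;
                break;
            case self::I_COOKIE:
                $bag = $this->cookies;
                break;
            default:
                // No flag given: use the bag that matches the request method.
                $bag = $this->method === 'POST' ? $this->post : $this->get;
        }
        return array_key_exists($name, $bag) ? $bag[$name] : $default;
    }
}

// In a real request: new Input($_GET, $_POST, $_COOKIE, $_SERVER['REQUEST_METHOD'])
$in = new Input(['type' => 'strict'], ['type' => 'html/text'], [], 'POST');
$fromBody = $in->input('type');                   // 'html/text'
$fromUri  = $in->input('type', '', Input::I_GET); // 'strict'
```

This keeps the short form for the common case while still letting a POST handler read a query-string parameter of the same name.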
It's generally considered better to use $_GET and $_POST rather than $_REQUEST because it costs you nothing much and it closes off some small set of manipulations of the web site. I'd make the specific-source retrievals at least available in your framework.
I would suggest keeping them separate, since they are used for separate purposes. GET is generally used for display purposes, while POST is used for admin purposes, adding/editing items, confirming choices, etc.
There may also be a slight security problem: someone could link to a page using GET parameters and force execution of something like deleting data - e.g. example.com/index.php?deleteid=123 (Actually this can be done with POST from an external HTML form but is much less common. Anyone can post a link on a forum, blog, anywhere.)
I would recommend keeping both POST and GET vars, since you can't predict how they are going to be used.
Most importantly, be sure to validate the values in $_POST and $_GET against security exploits such as XSS and SQL injection before populating your objects.
I would say it highly depends on the situation. If you simply want to accept some parameters that will change how you display an HTML page (a typical GET variable) it's probably ok to accept both.
If you are going to work with forms, changing data and restricted access; you should look into the domain of CSRF and how this security issue might affect you.
In general, if you can be explicit about either, it's wise to do so.
