Library interposition with dlsym - PHP

I'm writing an interposition library to track the usage of some library functions in libc, such as open(), close(), connect(), etc. It generally works well with most applications. However, when I try it with PHP, in particular with PHP's MySQL module, none of the libc calls made inside this module are being tracked (so no connect(), no socket(), etc.). 'strace' told me that the system calls socket(), connect(), etc. did take place. Running 'file' on the module and on libmysqlclient.so.16.0.0 says that they are all dynamically linked, so it shouldn't be a problem caused by static linkage. What might be the problem?
I'm using Fedora 11 64-bit version.
Thank you.
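Below is a minimal sketch of the kind of wrapper the question describes. It assumes Linux with glibc. The wrapper defines its own socket(), counts and logs each call, then forwards to the real libc implementation, which it locates with dlsym(RTLD_NEXT, ...). The counter socket_calls and the log format are illustrative names, not taken from the asker's library.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* needed for RTLD_NEXT in <dlfcn.h> */
#endif
#include <dlfcn.h>
#include <stdio.h>
#include <sys/socket.h>

/* Number of socket() calls seen by the wrapper (illustrative only). */
int socket_calls = 0;

/* Signature of the real libc function we forward to. */
typedef int (*socket_fn)(int, int, int);

/* This definition shadows libc's socket() for every caller whose lookup
 * reaches this object first (e.g. when it is loaded via LD_PRELOAD). */
int socket(int domain, int type, int protocol)
{
    /* Find the next "socket" after this object in the lookup order,
     * normally libc's. Cached after the first successful lookup; the
     * unsynchronized cache is a simplification. */
    static socket_fn real_socket = NULL;
    if (real_socket == NULL) {
        real_socket = (socket_fn)dlsym(RTLD_NEXT, "socket");
        if (real_socket == NULL)
            return -1; /* cannot forward; a real tracer would report this */
    }
    socket_calls++;
    fprintf(stderr, "[interpose] socket(%d, %d, %d)\n", domain, type, protocol);
    return real_socket(domain, type, protocol);
}
```

To preload it, build it as a shared object, for example gcc -shared -fPIC -o libtrace.so trace.c -ldl. The file names are hypothetical, and -ldl is unnecessary on glibc 2.34 and later. Then run LD_PRELOAD=./libtrace.so php script.php. Wrappers for connect() and close() follow the same pattern. open() is variadic, so its optional mode argument has to be forwarded as well.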

It seems that it was not caused by static linkage. In fact, PHP is dynamically linked to other libraries. The problem lies in the way PHP loads extensions.
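PHP loads extensions by calling dlopen(). The RTLD_LAZY flag it passes only means that each function symbol is resolved when the reference is first executed; by itself, that does not bypass the interposition specified by LD_PRELOAD. Depending on the platform and version, PHP may also pass RTLD_DEEPBIND. That flag places the loaded library's own lookup scope ahead of the global one, so the extension and the libraries it pulls in can bind to libc's socket(), connect(), etc. directly instead of to the preloaded wrappers.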
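The two dlopen() flags discussed above can be sketched in C. The sketch assumes Linux with glibc. libm.so.6 and cos() stand in for an extension and a function it needs, and load_unary() is a hypothetical helper. According to the dlopen(3) manual, RTLD_LAZY only defers binding. RTLD_DEEPBIND, by contrast, places the library's own lookup scope ahead of the global scope, which is where LD_PRELOAD objects live.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* RTLD_DEEPBIND is a GNU extension */
#endif
#include <dlfcn.h>
#include <stdio.h>

typedef double (*unary_fn)(double);

/* Load `lib` the way a plugin host might and return the address of `sym`.
 * RTLD_LAZY: function calls inside the library are bound on first use.
 * RTLD_DEEPBIND (optional): the library's own scope is searched before
 * the global scope, so preloaded interposers can be skipped. */
unary_fn load_unary(const char *lib, const char *sym, int deepbind)
{
    int flags = RTLD_LAZY | RTLD_GLOBAL;
#ifdef RTLD_DEEPBIND
    if (deepbind)
        flags |= RTLD_DEEPBIND;
#else
    (void)deepbind; /* flag not available on this platform */
#endif
    void *handle = dlopen(lib, flags);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    dlerror(); /* clear any stale error before calling dlsym() */
    unary_fn fn = (unary_fn)dlsym(handle, sym);
    if (fn == NULL)
        fprintf(stderr, "dlsym: %s\n", dlerror());
    /* The handle is deliberately left open so that fn stays valid. */
    return fn;
}
```

Calling load_unary("libm.so.6", "cos", 1) mirrors the deep-binding case, and passing 0 mirrors plain lazy loading. The returned function behaves the same either way. Only the lookup order used for the library's own references differs.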

The library may be invoking system calls directly rather than through the libc wrappers. In that case you'd need to use strace (or ptrace() in your own program) to track this usage.

I agree with the answer above that these libraries may be bypassing the calls to open(), write(), etc. in libc. In other words, those libraries may be issuing the system calls directly in assembly instead of using the libc interface. Although it is not all that common to see applications making system calls directly, it is not unheard of.
If that's the case, that's why you would not see any interception in your library interposition experiment. You then have two options: the quick one is strace; the more complex one is to build a kernel module that intercepts these calls at the kernel level and reports them to whatever framework you are building.
Have fun.
ErnestoB
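The two answers above distinguish calls that go through libc from system calls issued directly. That difference can be sketched in C, assuming Linux with glibc. The helper names are hypothetical, and getpid() is used only because it is a harmless system call.

```c
#include <sys/syscall.h> /* SYS_getpid */
#include <sys/types.h>
#include <unistd.h>      /* getpid(), syscall() */

/* Goes through the libc wrapper: an LD_PRELOAD library that defines
 * getpid() would see this call. */
pid_t pid_via_libc(void)
{
    return getpid();
}

/* Issues the system call by number. The symbol getpid() is never looked
 * up, so an interposed getpid() wrapper is bypassed and only strace,
 * ptrace() or kernel-side tracing observe it. (syscall() is itself a
 * libc entry point; code using inline assembly skips libc entirely.) */
pid_t pid_via_raw_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions return the same process ID. Only the first one passes through the symbol that an interposition library can replace.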

Related

Does PHP's SyncMutex class named mutexes work across different processes on the same server?

BACKGROUND
I am using Apache and PHP to build a web application and I need to synchronize access to a region of shared memory. Since different instances of PHP have different process IDs, I am wondering if PHP's SyncMutex class with named mutexes can be used for this. Doing a Google search, I see quite a bit of information about using files as mutexes, but not much about the SyncMutex class. Even the manual has little information beyond the definition of the class and a few examples of how to use it.
QUESTION
Will a SyncMutex class named mutex in one process be visible in a different process?
RESEARCH
https://helperbyte.com/questions/470561/how-to-prevent-simultaneous-running-of-a-php-script
PHP mutual exclusion (mutex)
PHP rewrite an included file - is this a valid script?
Most reliable & safe method of preventing race conditions in PHP
FINALLY
The data involved is VERY transient and can change on a moment's notice (think volatile keyword in C). The data becomes useless after 30 seconds or so, and is purged periodically. For performance reasons, I'm storing it in the server's shared memory where the access is much faster than writing to a file or a database. Is this a valid use case or am I barking up the wrong tree? Should I use something else? Like a semaphore?
ADDITIONAL INFORMATION (EDIT 8/25/2021)
Upon further research, I have discovered that the pthreads module for PHP was deprecated by its owner several months ago and is no longer maintained. PHP now uses something called parallel to facilitate multithreading. Because of the design of the setup that I'm using, parallel is not compatible with what I am trying to do. So it looks like I will have to use MySQL to handle this after all, at least for now.
My idea of using a memory server is still viable, but the server needs to be written in a language other than PHP due to the change in PHP's multithreading architecture. That will be done using C++, but not right now.
Thank you to everyone who responded.
According to the user-supplied example on the documentation page for SyncMutex::unlock, a named SyncMutex lock created in one process is visible in another.
Additionally, the SyncSharedMemory class description says so explicitly:
Shared memory lets two separate processes communicate without the need for complex pipes or sockets. [...] Synchronization objects (e.g. SyncMutex) are still required to protect most uses of shared memory.
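The property described above is that a lock looked up by name is shared by separate processes. It can be illustrated outside PHP. SyncMutex itself is a PHP class. The C sketch below is a swapped-in POSIX analogue that uses named semaphores (sem_open), not SyncMutex's implementation. It assumes Linux with glibc and a mounted /dev/shm. The helper names and the semaphore name used in the test are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>       /* O_CREAT */
#include <semaphore.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Open (creating if needed) a named binary semaphore used as a lock.
 * Every process that opens the same name gets the same object. */
sem_t *named_lock_open(const char *name)
{
    sem_t *s = sem_open(name, O_CREAT, 0600, 1); /* initial value 1 = unlocked */
    if (s == SEM_FAILED) {
        perror("sem_open");
        return NULL;
    }
    return s;
}

/* Fork a separate process that opens the lock by name and tries to take
 * it without blocking. Returns 1 if the child found it held, 0 if the
 * child could acquire it, -1 on error. */
int lock_held_in_other_process(const char *name)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        sem_t *s = sem_open(name, 0);
        if (s == SEM_FAILED)
            _exit(2);
        if (sem_trywait(s) == 0) {
            sem_post(s); /* semaphores are not released on exit; give it back */
            _exit(0);    /* the lock was free */
        }
        _exit(errno == EAGAIN ? 1 : 2); /* EAGAIN: held by someone else */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0: return 0;
    case 1: return 1;
    default: return -1;
    }
}
```

Independent processes, such as separate Apache workers, would simply call named_lock_open() with the same name and then sem_wait()/sem_post() around the shared-memory access.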

What does 'userland' mean in the PHP manual?

I'm going through the PHP manual and found the word 'userland' a couple of times. What does that usually mean? I found it on this page; I think it refers to the source code itself, but I'm not sure.
From the PHP manual:
While executing in a debug environment, configured with --enable-debug, the leak function used in the next example is actually implemented by the engine and is available to call in userland.
Since this question does not have a correct answer yet (despite an answer being selected), I'll go ahead and answer this.
The PHP core development team makes three main distinctions when referring to PHP:
PHP core. This refers to the Zend engine that powers PHP. It does things like tokenize userland code, handle memory management, process built-in keywords (if-else, while, isset, etc), and more. That last bit is why built-ins are many times faster than function calls. What PHP core generally does not do is implement functions like substr(), fopen(), etc., which is left to...
PHP extensions. This refers to the majority of the PHP source code but also PECL extensions and other PHP extensions written in C (and sometimes C++). All of the core functions and classes that are always available with PHP are actually implemented in extensions with the biggest extension being 'ext/standard'.
PHP userland. This refers to code that users of PHP generally write that leverage various PHP extensions and the core.
When you see the phrase "pure PHP userland", usually in reference to a PHP userland library that someone wrote, it generally means the library has no dependencies on anything outside of PHP built-ins: no extensions that may not be compiled in or available on a host, and no external software outside the PHP ecosystem.
The use of any of these phrases may be an indicator that the person tends to lurk on the PHP internals mailing list. Most PHP developers are userland devs and have little to no knowledge of the inner workings of PHP itself.
It's not a PHP term, but a general computing one:
The term userland (or user space) refers to all code which runs outside the operating system's kernel.
https://en.wikipedia.org/wiki/User_space

Advanced PHP settings - selectively including built-in PHP functions

There are a hell of a lot of built-in PHP functions. I noticed that after almost two and a half years of working as a software engineer, I use only a small fraction of them. But all of them are defined and can be used with a default PHP installation.
I read somewhere on SO that PHP provides all these built-in things, whereas doing similar things in languages like Java needs a lot of coding. Is that correct? I don't have much experience with other languages.
Also, am I correct to assume that a large portion of these functions are not used by any of the other built-in functions (internal dependencies)? For example, pdf_fit_table() and gzopen() are needed only for PDF- and gzip-related work, respectively.
If so, then as advanced programmers, does PHP provide any option to us to selectively load them, based on the specific project requirements or more dynamically, based on a specific module? e.g. load PDF related functions only if I have PDF related tasks. If possible, at what level can it be done? If at the PHP installation level, then I think it is not possible in case of shared hosting. Is a better solution to this possible?
I am just speaking from a common-sense point of view: we include files containing functions on an as-needed basis.
Is it going to give a performance boost?
I am not much aware of the core libraries etc. of PHP. So, please shed some light.
Updates:
Thanks for the answers
#pygorex1 - The HipHop approach is to optimize PHP overall. So, putting it in very simple terms, if I am correct: if a script took 1 second to run before, with HipHop it might take 0.7 seconds. But in both cases, the presence of those extra, unnecessary functions adds its own overhead (say 0.1 seconds in the first case and 0.07 seconds with HipHop). If so, HipHop targets something else and does not answer my question. However, your other two points say that everything has to be done at compile time. That probably means that if I compile with an extension, the group of functions under it will be loaded every time. Is there then no further way to remove them? Some kind of override?
#Tyler - I agree that it might be difficult to do what I am asking for but the reason is not what you are saying. It cannot be so difficult to find out the dependencies. Just applying common sense, I can say that functions like is_numeric(), is_array(), array_walk(), func_get_args() etc. are very basic ones and are probably called by many but there are easily distinguishable groups like the socket functions group containing e.g. socket_connect() which need not be included if not explicitly needed. The problem probably is that it needs to be specified while compiling, like pygorex1 has answered.
Concerning any potential performance boost - you're probably not going to notice it unless you're serving a ton of dynamic PHP pages. This road has been traveled before - take a look at HipHop, Facebook's tool to optimize PHP into C++. Utilizing byte code caches like APC and eAccelerator AND/OR rewriting your PHP code to cache intelligently with memcached will improve PHP performance far more than enabling/disabling certain PHP functions.
That having been said, there are two main ways to pare down the number of functions that PHP has available:
PHP compile-time options
Available when compiling PHP from source. One of the functions noted in the question, gzopen(), is part of the zlib extension, which has to be enabled at compile time. There are quite a few built-in compile-time options.
PHP modules
These are loaded dynamically by PHP and are controlled by the php.ini config file under extensions - they are .dll files on Windows or .so files on Linux. A snippet from my development php.ini:
...
extension=php_bz2.dll
;extension=php_curl.dll
;extension=php_dba.dll
;extension=php_dblib.dll
extension=php_mbstring.dll
extension=php_exif.dll
extension=php_fileinfo.dll
extension=php_gd2.dll
...
There is dl() to load a PHP extension at runtime.
Example to load an extension dynamically:
if (!extension_loaded('sqlite')) {
$prefix = (PHP_SHLIB_SUFFIX === 'dll') ? 'php_' : '';
dl($prefix . 'sqlite.' . PHP_SHLIB_SUFFIX);
}
This is taken from http://php.net/manual/en/function.dl.php
The PHP function namespace debacle is, well, exactly that.
No, there's no way to selectively load them at runtime. Just because you don't call something doesn't mean that something you call doesn't call it.
Don't bother compiling out built-in functions. Learn about shared libraries and the Linux caching system. Those files (and functions) are basically always loaded and cached, so they have very little impact on an application. As pygorex1 said, it's better to use a good caching mechanism than to cripple the PHP distribution on purpose.
#powtac: using dl() to dynamically load some libraries might actually slow down your app (it depends on how many dl() calls you make; it might be better to have them always loaded in memory than to load them on request).
#Tyler Eaves: you can actually disable some functions from being called (the disable_functions directive in php.ini). Nothing prevents them from being loaded, though.
Also, HipHop, as far as I know, actually compiles PHP code down to C/C++ code and then compiles that. This has the BIG advantage of skipping the virtual machine, the PHP-specific opcodes, and a lot of the overhead of a scripting language, but it has the big disadvantage that it's no longer a scripting language.

Convert a class to an extension

I have a PHP class I want to convert to a PHP extension. I checked some tutorials (tuxradar's writing extensions, php.net's extending php, and zend's extension writing) and it's a bit complicated.
I found the article "How to write PHP extensions" (ed note: site is defunct) and I wanted to know if it is possible to use this to make it grab a PHP class from a certain path (say /home/website1/public_html/api/class.php), execute it and return the class instance.
This way it will be usable in other websites that are hosted on the same server – each can simply call the function and it will obtain its own instance.
Is that possible?
As I understand it now, the question is: the user has a PHP class that they would like to share with multiple people, but they do not want to share the source code.
There are many solutions to this. They generally involve turning the PHP code into some kind of bytecode and using a PHP extension to run the bytecode. I've never used any of these solutions, but I'm aware of the following:
phc is an open source compiler for PHP
Zend Guard
HipHop for PHP - I'm unsure about this, but Facebook recently released it so it might be worth a look.
I'm sure there are others. Just Google for PHP Compiler, or PHP Accelerator.
In one sentence: I don't believe so; I think it's a lot more work than that.
No, there is no tool that can do that.
Anyway, what you want can be easily accomplished with auto_prepend_file. Just make that ini directive point to a PHP file that has the class definition, and it will then be available to all the applications.
If you don't want the users to be able to read the source, you can use one of the several Zend extensions that let you precompile the file and use it in that form.
You can expose underlying C library functions to PHP space by writing PHP extensions. However, I think in your case you don't need to write one.
I am aware that this is an old question (from 2012); however, the answer has changed and there is now a tool that can do this. Jim Thunderbird's PHP-to-C Extension toolset can take anything from a simple class in one file up to a complicated multi-file, multi-level namespaced framework and convert it to a C extension that can then be installed on your PHP server.
In many use cases this is not needed, as ordinary PHP code works just as well, but in some cases significant performance improvements can be achieved. The information page shows that an ordinary class (deliberately designed to take a long time) took 16.802139997482 seconds as plain vanilla PHP, and 3.9628620147705 seconds as a PHP extension built using the tool.
As an added advantage, the tool also lets you combine PHP code (to be converted to C) and native C code within the same extension, which can produce even greater performance improvements. The same example used above took only 0.14397192001343 seconds when much of the intensive code was moved to a bubble sort written in C and simply called from within the PHP code.
As a side note, for the end developers using the code, using the extension is functionally very similar to including the files manually in the PHP file being developed. The difference is that the files don't have to be explicitly included, because that is handled by the PHP extension mechanism.
(Disclaimer: I am not affiliated with this developer, but I am glad to have come across it, as it has so far worked for converting some of my intensive classes into PHP extensions without my needing to know C.)

PHP - Extension vs. Library vs. Class - when and why

I'm trying to accomplish a task, and it turns out that the code I need is packaged as a PHP extension. According to what I've been told, that means I have to have root access to install it (I'm on shared hosting, so that's a bit of a problem).
I'll solve this problem later, but for now I'm trying to understand the difference between an extension, a library, and a class. Is it more of a packaging thing that could be overridden and repackaged a different way, or is there a valid architectural reasoning behind it?
Also when releasing your own code, what makes you decide to release as library vs. class vs. extension? or do you go with whichever sounds better?
Thanks in advance.
P.S. If you must know which extension I'm talking about, it's Libpuzzle, but that's really beside the point, my question is more general.
An extension is a piece of code written in C that is loaded into the PHP core when PHP starts. Normally you have some more native functions available after including an extension, for example zip functionality.
A class is an abstract piece of PHP code that solves common tasks, for example sending emails. You can find some common classes at pear.php.net.
A library is a collection of PHP classes that solve more generic tasks, for example building HTML forms AND sending emails. The Zend Framework is a framework that consists of many, many PHP classes.
Normally extension features can be programmed in PHP. For example the PEAR::Compat class. Often you will find the functionality you need as a PHP class available. I'm sure the stackoverflow readers will supply you with ideas where to find a specific PHP class.
Extensions are low-level. Usually written in C/C++ and compiled into native-code shared libraries, they interact with the Zend Engine directly. This has pros and cons. The main advantages are speed and more control. The main disadvantages are that they are harder to install and require compilation (which requires a compiler and the PHP headers). It's not true that they require root access, though; you only need the ability to use a custom php.ini (or the dl() function, but I see they deprecated it for some reason).
Libraries/classes are high-level and interpreted. If you don't know if you need to write extension, then you probably don't. About what classes are - read about OOP. A library is a reusable collection of code (most commonly in form of functions/classes).
Some libraries (including libpuzzle) also include a command-line tool. So if you're unable to use the PHP library due to your shared hosting environment, maybe you can compile the command-line tool. Then you can run it from PHP using something like exec. It will be slower and require more memory than a library, but it might get the job done. Of course, many hosts also have restrictions on commands like exec, so this might not work either.
