Extract data from different formatted data dumps - php

For my current project I have to parse data "dumps" (EEPROM HEX dumps from a microcontroller) (coming from a file or a SQL database) that have a different format depending on the version of the software in the controller. (The software version is also in the dump.)
Extracting the version works, and I have a somewhat working parser for one specific version (though I find the current code quite messy). There is some overlap between the different versions.
The output of the code is a JSON array that is fed to angular that formats the data into a table so a user can start playing with the data. (the JSON is generated with json_encode, works great)
What I'm looking for is a nice (OOP, if possible) solution that can extract the data from the different file versions without a lot of copy-pasted code. The only solution I can think of right now is a class for the 1st version, then a copy-paste (with a few edits) into a new class for the 2nd version (and so on).
A generic answer on how to do this is fine for me, however I'm trying to do this in PHP.

Start by making a base class and a child class for the first version (the following code may not have perfect syntax):
class DataDump {
// some kind of storage
protected $contents;
}
class FirstVersion extends DataDump {
}
Then as you are building the FirstVersion class, try to pay close attention to what you think might be common in all the versions and move that code down to the DataDump parent class.
class DataDump {
// some kind of storage
protected $contents;
public function loadFile($filename) {
// ....code....
// $this->contents = ...
}
public function processHeader() {
}
}
class FirstVersion extends DataDump {
public function loadFirstFile() {
$this->loadFile(...);
$this->processHeader();
// ...some code specific to this version...
}
}
Then you can slowly build up the parent class with the things that are common; there are almost always parts of the code that are common (even if only in small segments). As you build other child classes, you may find other logic that can be shared, so push it down to the parent. Or there may be a base method that some child classes override while others just use it as is.
Sometimes, when you have a big mess to deal with, it is hard to see all the shared code upfront and how to best organize it in a sane way, and this process helps tremendously when you are stuck. Just start writing and you will begin to see the patterns among all the versions, even if just helper methods.
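As a rough sketch of where this process can end up: the parent holds the shared byte-reading helpers, and each version class only says where its fields live. The dump layout here (version in byte 0, a two-byte serial, a temperature byte) and the names byteAt, wordAt, fields and loadDump are invented for illustration; the real EEPROM format will differ.

```php
<?php
// Hypothetical layout: byte 0 holds the software version; all other
// offsets below are made up for the example.
abstract class DataDump
{
    protected $bytes;

    public function __construct($hex)
    {
        // Shared by every version: turn the hex string into an array of ints.
        $this->bytes = array_map('hexdec', str_split($hex, 2));
    }

    // Shared helper: one byte at an offset.
    protected function byteAt($offset)
    {
        return $this->bytes[$offset];
    }

    // Shared helper: a 16-bit big-endian word at an offset.
    protected function wordAt($offset)
    {
        return ($this->bytes[$offset] << 8) | $this->bytes[$offset + 1];
    }

    // Each version only describes its own fields.
    abstract protected function fields();

    public function toArray()
    {
        return array('version' => $this->byteAt(0)) + $this->fields();
    }
}

class FirstVersion extends DataDump
{
    protected function fields()
    {
        return array('serial' => $this->wordAt(1));
    }
}

class SecondVersion extends FirstVersion
{
    protected function fields()
    {
        // Reuse version 1's fields and add the one that is new in version 2.
        return parent::fields() + array('temperature' => $this->byteAt(3));
    }
}

function loadDump($hex)
{
    // The version byte decides which class parses the rest.
    $classes = array(1 => 'FirstVersion', 2 => 'SecondVersion');
    $class = $classes[hexdec(substr($hex, 0, 2))];
    return new $class($hex);
}

echo json_encode(loadDump('0201F42A')->toArray()), "\n";
```

The result of toArray() can go straight into json_encode, so the Angular side does not need to know which version produced it.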

Related

PHP Debugging Classes - Print everything

I am trying to figure out a big PHP Library and it isn't that well documented. I would like to know if there is a way to print out everything about the class. For example I am using the get_class_methods() function to print out the methods of that class and it prints out an array of just that class. I would like to see all the methods within the objects within that class as well. It would be nice to see the variables and everything else. This way I can print out everything and then use the browser's search to find stuff that I need. Is this possible, or is there a method out there that already does this? I'm not that well versed in PHP so if you can give me a function that would be awesome.
PHP has a built in library called Reflection that allows you to analyze classes and objects in intricate detail.
You can get all the methods in a class like so:
<?php
$class = new ReflectionClass('Apple');
$methods = $class->getMethods();
var_dump($methods);
For class properties (member variables):
<?php
$class = new ReflectionClass('Apple');
$properties = $class->getProperties();
var_dump($properties);
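If var_dump is too noisy to search through, the same Reflection calls can be turned into a plain list. This is only a sketch: Fruit and Apple are stand-in classes, and describeClass is a made-up helper name. It prints each property and method together with the class that declares it, so inherited members show up too.

```php
<?php
// Stand-in classes; replace 'Apple' with a class from the library.
class Fruit
{
    protected $weight = 0;
    public function weigh() { return $this->weight; }
}

class Apple extends Fruit
{
    public $colour = 'red';
    private $seeds = 5;
    public function peel() {}
}

// Hypothetical helper: one line per property and method.
function describeClass($className)
{
    $class = new ReflectionClass($className);
    $lines = array();
    foreach ($class->getProperties() as $property) {
        $lines[] = 'property ' . $property->getDeclaringClass()->getName()
            . '::$' . $property->getName();
    }
    foreach ($class->getMethods() as $method) {
        $lines[] = 'method ' . $method->getDeclaringClass()->getName()
            . '::' . $method->getName() . '()';
    }
    return $lines;
}

echo implode("\n", describeClass('Apple')), "\n";
```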
A good IDE will help you navigate through the code, start debugging sessions and inspect properties values in execution time.
I fell in love with PHPStorm some years ago, and it's still my favourite IDE today. It even has a Vim plugin which emulates vim =)
There is a Structure view on that IDE, which shows the, well :), code structure of a file. This means every property and method of the opened class file. There is a Project view which is like a directory browser too.
Second recommendation will be to install ack (http://www.beyondgrep.com). PHPStorm has a really efficient searching mechanism, but sometimes you just want to search the entire subdirectories of a project for a regular expression. It's a neat tool also.
My two cents. :)

Creating Composer Package

I'm trying to create a Composer package, and I understand the basic workflow: creating composer.json, autoloading, and creating classes under the src directory.
One small thing I don't understand is that almost all the other packages I'm reading have interfaces and a class implementing them. I don't understand the need for interfaces in this context. I have never used an interface, and I'm not sure I understand its general use case. It would be nice if someone could help me understand it.
My other Composer question is: how do I test / run a Composer project while I'm creating it?
Besides this, the projects I'm referring to have a command directory inside src; I don't understand its significance or use case either. I guess it has something to do with the Symfony PHP console command.
There is also a bin directory in the source; how is that useful?
Sorry if I'm being naive here, but I'm just trying to understand which components go where and why. I couldn't find a Composer tutorial online that goes past creating composer.json.
You are asking a lot of questions at once, but I'll try to at least address interfaces, since I believe that's the most important one.
Interfaces are mostly used with Dependency Injection. They define methods without caring how those methods are implemented. A class may depend on an interface instead of an actual (concrete) class, which makes it easy to swap components. Below is an example of how one might use interfaces.
interface PostsInterface {
public function getPosts();
}
class JsonPostFetcher implements PostsInterface {
public function getPosts() {
// Load posts from JSON files here
}
}
class MySqlPostFetcher implements PostsInterface {
public function getPosts() {
// Load posts from a MySQL database
}
}
class Blog {
public function __construct(PostsInterface $fetcher) {
// Load posts from either JSON or a database
// depending on which fetcher is provided
$posts = $fetcher->getPosts();
}
}
Using this method, anyone can now write their own code to provide posts from an external API (ApiPostFetcher), an SQLite database (SqlitePostFetcher), serialized PHP files (SerializedPostFetcher), etc. One could even write a DummyPostFetcher that simply returns a predetermined array of posts for testing purposes. You can then use any implementation of PostsInterface in your blog, as in the following example.
$fetcher = new JsonPostFetcher(); // You can provide different fetchers here.
$blog = new Blog($fetcher);
If you're unfamiliar with dependency injection, I highly recommend learning it, since it will be especially useful in writing modular code.
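To make the testing case concrete, here is a minimal sketch of the dummy fetcher mentioned above. The countPosts method and the post titles are made up for the example; the point is only that Blog never learns where the posts came from.

```php
<?php
interface PostsInterface
{
    public function getPosts();
}

// Test double: returns a fixed list instead of touching files or a database.
class DummyPostFetcher implements PostsInterface
{
    public function getPosts()
    {
        return array('First post', 'Second post');
    }
}

class Blog
{
    private $posts;

    public function __construct(PostsInterface $fetcher)
    {
        $this->posts = $fetcher->getPosts();
    }

    // Hypothetical helper, added only so the example has something to check.
    public function countPosts()
    {
        return count($this->posts);
    }
}

$blog = new Blog(new DummyPostFetcher());
echo $blog->countPosts(), "\n";
```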

new to oop in PHP a plea for clarifying concepts

I love the PHP language. I've got SOME coding experience, but I am fairly new to PHP. Although I have been learning a lot, I feel I am now stuck / held back by not getting the hang of OOP concepts, even though I have been browsing through multiple tutorials.
This is not so much a question about the code itself but rather about the logic behind it.
Consider this tutorial I worked from
class person {
var $name;
function set_name($new_name) {
$this->name = $new_name;
}
function get_name() {
return $this->name;
}
}
$stefan = new person();
$jimmy = new person;
$stefan->set_name("Stefan Mischook");
$jimmy->set_name("Nick Waddles");
echo "Stefan's full name: " . $stefan->get_name();
echo "Nick's full name: " . $jimmy->get_name();
I understand what is going on above and I understand the concept, but I can't see the benefit of it. I just feel I could have created the above in a much simpler way by simply doing
function person($name){
return $name;
}
echo person("Tim Jones");
Why the getter and the setter function, why not just one function in class person?
Why go through all that code above when my function above is so much shorter?
Where is the benefit here?
I'm basically just looking for someone to give me a bit of clarification on the whole OOP concept, which I can't seem to get from the many repetitive tutorials I have been reading.
A good case for using classes is when you may want to expand or rewrite functionality later on.
A few years back I wrote an application in PHP which was used to create issues within Jira. At this point, I was using SOAP to perform the actual issue creation and updates using the published API.
Later on, within the same application, I needed to use features within Jira which were associated with the Atlassian Greenhopper/Agile extension. This extension used a REST API and it became apparent that Atlassian was moving all their APIs across to using REST.
As I had used a class for the calls to Jira, all the actual grunt work in the background was abstracted. I knew what data went in and what data I would get back.
In the end, I wrote a new class to use REST (via cURL calls), and re-wrote the class which accessed Jira. I didn't have to go through the whole application and check each call to a Jira function as the data in and data out was the same.
In reality, the classes I wrote all descended from the REST class:
REST -> JIRA_BASE -> JIRA
The JIRA_BASE class contained methods which were common across all Jira projects (get project names, get user IDs etc). The JIRA class itself contained a couple of functions (createJiraIssue and updateJiraIssue) which were particular to each Jira project.
The other advantage of using classes and objects is that you can put place-holders in for functions. In the REST class, trying to use a DELETE method (rather than GET, POST or PUT) for a REST call would error straight away (I hadn't written it as I didn't need it). However, I have re-used the same class in another application where I did need to use the DELETE method, so that is now written.
It also became apparent that the new application needed a change in one aspect of functionality. This was implemented without re-writing any of the calling code.
Getters and setters are used to ensure that data is accessed in a controlled manner. If you just use a variable within a script, any part of that script could alter the data. If that data is stored within a class and set to private, then only a call to the getter or setter can alter or retrieve that data.
Additionally, getters and setters can alter the way the data is actually stored but still present it in a usable format. For example, when performing a call to
$obj -> setName('DaveyBoy');
that data could be reversed, escaped, encrypted, stored in session variables, sent to a database and rot13'ed (in no particular order). But, a call to
$name = $obj -> getName();
would still store 'DaveyBoy' in $name without any intervening steps.
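A small sketch of that idea, assuming a made-up Person class: the property is private, the setter stores the name in a transformed form (rot13 is used purely as a stand-in for "some internal format"), and the getter undoes it. Calling code only ever sees the original value.

```php
<?php
class Person
{
    // Private: only the methods below can read or change the stored value.
    private $storedName;

    public function setName($name)
    {
        // How the value is stored is an internal detail of the class.
        $this->storedName = str_rot13($name);
    }

    public function getName()
    {
        // Reverse the internal transformation before handing the value back.
        return str_rot13($this->storedName);
    }
}

$obj = new Person();
$obj->setName('DaveyBoy');
$name = $obj->getName();
echo $name, "\n";
```

If the storage format changes later (say, to a database column), only these two methods change.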
I've rambled about classes and why I use them but, hopefully, this helps to explain a bit.
One advantage of OOP is that anything acting upon a class or object does not need to know how that class or object works under the hood. Much more complicated things can be accomplished in the background with the bulk of your application uninvolved, which makes your code much more readable.
Consider the following (partially pseudocode) web app example:
$users = array();
$users[] = new User('joe', ADMIN, ACTIVE);
$users[] = new User('jane', ADMIN, ACTIVE);
$users[] = new User('bill', USER, INACTIVE);
class User {
public $Name;
public $Security;
public $Active;
public function __construct($name, $security = USER, $active = INACTIVE) {
$this->Name = $name;
$this->Security = $security;
$this->Active = $active;
}
public function ToggleActive() {
//Not actual working code ahead
$this->Active = ($this->Active) ? INACTIVE : ACTIVE;
$sql->query('UPDATE users SET active=$this->Active WHERE name=$this->Name');
}
public function SetSecurity($security) {
//More non-functional examples
$this->Security = $security;
$sql->query('UPDATE users SET security=$this->Security WHERE name=$this->Name');
}
}
?>
<html>
<form>
<!-- this is, of course, the wrong markup, but the concept is there-->
foreach($users as $user) {
<name>$user->Name</name>
<security>$user->Security <button $user->SetSecurity(ADMIN)>Set Admin</button> <button $user->SetSecurity(USER)>Set User</button>
<active>$user->Active <button $user->ToggleActive>Toggle Active</button>
}
<!-- you could even have the class itself output the form html with something like $user->DrawEntryHTML();-->
</form>
</html>
Obviously, there is a lot more that goes in to the web app interface of such an operation (AJAX, function handlers, etc.), but the basics are there, and only the user object itself needs to know how to perform the operation. The rest of your app can simply say Hey, user. You're active now.
OOP gives you an abstract but meaningful way of accomplishing what you want your application components to do. In most applications these days, when a user interacts, or a task happens, a number of things need to happen in order to store, display, and modify its elements. You also gain the advantage of only needing to change a small bit of code in your class in case of a change, update, or feature addition, rather than chasing all over the rest of your code for everything that relates to your users (in this case).
You'll find that a well-written OOP application has a very short program loop (or index.php), and the bulk of the work happens within its class objects.
Consider what happens when you want to change the behaviour of the code - suppose it's been running for a while and you discover that you have lots of duplicate records where people have used variants in the whitespace and capitals. You might then want to amend the code to....
function set_name($new_name) {
$new_name=trim(strtoupper($new_name));
$new_name=str_replace('  ',' ', $new_name);
$new_name=str_replace('  ',' ', $new_name);
$name_parts=explode(' ', $new_name);
$this->surname=array_pop($name_parts);
$this->forenames=implode(' ', $name_parts);
}
function get_name()
{
return $this->forenames . ' ' . $this->surname;
}
But you don't need to change any of the code which interacts with the object.
Now think about a class which describes organizations rather than individuals - they have names - but not forenames and surnames. If you have a class with the same interface (get_name, set_name) then you can throw a mixed bundle of person and organizations at your Christmas card printer application without having to amend the app to cope with the different data types.
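Here is a sketch of that mixed bundle. The Named interface, the Organization class and the cardLabels function are names invented for this example; the Person class splits names roughly as in the snippet above, but uses preg_split to collapse whitespace.

```php
<?php
interface Named
{
    public function set_name($new_name);
    public function get_name();
}

class Person implements Named
{
    private $forenames;
    private $surname;

    public function set_name($new_name)
    {
        // Split on any run of whitespace; the last word becomes the surname.
        $name_parts = preg_split('/\s+/', trim($new_name));
        $this->surname = array_pop($name_parts);
        $this->forenames = implode(' ', $name_parts);
    }

    public function get_name()
    {
        return trim($this->forenames . ' ' . $this->surname);
    }
}

class Organization implements Named
{
    private $name;

    public function set_name($new_name)
    {
        $this->name = trim($new_name);
    }

    public function get_name()
    {
        return $this->name;
    }
}

// The "card printer" only relies on get_name(), not on the concrete class.
function cardLabels(array $recipients)
{
    $labels = array();
    foreach ($recipients as $recipient) {
        $labels[] = 'Merry Christmas, ' . $recipient->get_name();
    }
    return $labels;
}

$person = new Person();
$person->set_name('  Stefan   Mischook ');
$org = new Organization();
$org->set_name('Acme Ltd');
print_r(cardLabels(array($person, $org)));
```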
(tutorial examples are kept very simple for a reason, unfortunately a lot of the benefits of OO only become apparent when dealing with complex problems - stick with it and you'll get there)
Notice that my answer does not focus on the major strengths of OOP, as you have already read about those and, since they currently don't apply to your situation, they were meaningless to you. I'm going to focus on what OOP can do for you right now.
The example you give is actually a good one.
Say that you also wanted to attach height, weight, birthday and profession to a person.
Not using objects you could create an array of arrays (to store multiple persons with multiple attributes) but 2 hours into the coding you would try to access the persons job as:
echo $persons[0]['job'];
And it would fail, as that field is actually named 'profession'. With objects, not only will your IDE know the getters, it will also help you use them.
The major strength of OOP is only really seen when you work in teams or expose code to be used by others however the example I gave should suffice to understand why even as a single developer there are benefits.
That being said, for over simplistic actions it can be overkill.

Including child class requires parent class included first

I have asked a similar question to this one already but I think it was badly worded and confusing so hopefully I can make it a bit clearer.
I am programming in a native Linux file system.
I have a class of HelpTopic:
class HelpTopic extends Help{}
And a class of Help:
class Help{}
Now I go to include HelpTopic:
include('HelpTopic.php');
And even though I do not instantiate HelpTopic with new HelpTopic() PHP (in a Linux file system) still reads the class signature and tries to load Help with HelpTopic.
I do not get this behaviour from a cifs file system shared from a Windows System.
My best guess is that there is some oddity with Linux that causes PHP to react this way but not sure what.
Does anyone have any ideas or solutions to this problem?
EDIT:
I have added my loading function to show what I am doing:
public static function import($cName, $cPath = null){
if(substr($cName, -2) == "/*"){
$d_name = ROOT.'/'.substr($cName, 0, -2);
$d_files = getDirectoryFileList($d_name, array("\.php")); // Currently only accepts .php
foreach($d_files as $file){
glue::import(substr($file, 0, strrpos($file, '.')), substr($cName, 0, -2).'/'.$file);
}
}else{
if(!$cPath) $cPath = self::$_classMapper[$cName];
if(!isset(self::$_classLoaded[$cName])){
self::$_classLoaded[$cName] = true;
if($cPath[0] == "/" || preg_match("/^application/i", $cPath) > 0 || preg_match("/^glue/i", $cPath) > 0){
return include ROOT.'/'.$cPath;
}else{
return include $cPath;
}
}
return true;
}
}
I call this by doing glue::import('application/models/*'); and it goes through including all the models in my app. The thing is, PHP on a Linux-based file system (not on cifs) is trying to load the parents of my classes without instantiation.
This is a pretty basic function that exists in most frameworks (in fact most of this code is based on Yii's version), so I am confused why others have not run into this problem.
And even though I do not instantiate HelpTopic with new HelpTopic() PHP still reads the class signature and tries to load Help with HelpTopic.
Correct.
In order to know how to properly define a class, PHP needs to resolve any parent classes (all the way up) and any interfaces. This is done when the class is defined, not when the class is used.
You should probably review the PHP documentation on inheritance, which includes a note explaining this behavior:
Unless autoloading is used, then classes must be defined before they are used. If a class extends another, then the parent class must be declared before the child class structure. This rule applies to classes that inherit other classes and interfaces.
There are two ways to resolve this problem.
First, add a require_once at the top of the file that defines the child class that includes the file defining the parent class. This is the most simple and straight-forward way, unless you have an autoloader.
The second way is to define an autoloader. This is also covered in the documentation.
The ... thing ... you're using there is not an autoloader. In fact, it's a horrible abomination that you should purge from your codebase. It's a performance sap and you should not be using it. It also happens to be the thing at fault.
We don't have the definition of getDirectoryFileList() here, so I'll assume it uses either glob() or a DirectoryIterator. This is the source of your problem. You're getting the file list in an undefined order. Or, rather, in whatever order the underlying filesystem wants to give to you. On one machine, the filesystem is probably giving you Help.php before HelpTopic.php, while on the other machine, HelpTopic.php is seen first.
At first glance, you might think this is fixable with a simple sort, but it's not. What happens if you create a Zebra class, and then later need to create an AlbinoZebra that inherits from it? No amount of directory sorting is going to satisfy both the "load ASCIIbetical" and the "I need the Zebra to be first" requirements.
Let's also touch on the performance aspect of the problem. On every single request, you're opening a directory and reading the list of files. That's one hell of a lot of stat calls. This is slow. Very slow. Then, one by one, regardless of whether or not you'll need them, you're including the files. This means that PHP has to compile and interpret every single one of them. If you aren't using a bytecode cache, this is going to utterly destroy performance if the number of files there ever grows to a non-trivial number.
A properly constructed autoloader will entirely mitigate this problem. Autoloaders run on demand, meaning that they'll never attempt to include a file before it's actually needed. Good-performing autoloaders will know where the class file lives based on the name alone. In modern PHP, it's accepted practice to name your classes such that they'll be found easily by an autoloader, using either namespaces or underscores -- or both -- to map directory separators. (Meaning namespace \Models; class Help or class Models_Help would live in Models/Help.php)
Unfortunately most examples won't be useful here, as I don't know what kind of weird things your custom framework does. Take a peek at the Zend Framework autoloader, which uses prefix registration to point class prefixes (Model_) at directories.
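Still, a generic sketch of the naming convention described above may help. It assumes class files live under the directory of the current script, and classToPath is a made-up helper name; both would need adapting to the real framework layout.

```php
<?php
// Map a class name to a relative file path: namespace separators and
// underscores both become directory separators.
function classToPath($className)
{
    $className = ltrim($className, '\\');
    return str_replace(array('\\', '_'), '/', $className) . '.php';
}

// Runs only when PHP meets a class it does not know yet, so a parent
// class is loaded on demand while its child is being defined.
spl_autoload_register(function ($className) {
    // Assumption: class files sit below this script's directory.
    $file = __DIR__ . '/' . classToPath($className);
    if (is_file($file)) {
        require $file;
    }
});

echo classToPath('Models\\Help'), "\n";
echo classToPath('Models_HelpTopic'), "\n";
```

With something like this registered, including HelpTopic.php first is no longer a problem: PHP asks the autoloader for Help at the moment it reaches the extends clause.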

Strategy for developing namespaced and non-namespaced versions of same PHP code

I'm maintaining a library written for PHP 5.2 and I'd like to create a PHP 5.3-namespaced version of it. However, I'd also like to keep the non-namespaced version up to date until PHP 5.3 becomes so old that even Debian stable ships it ;)
I've got rather clean code, about 80 classes following Project_Directory_Filename naming scheme (I'd change them to \Project\Directory\Filename of course) and only few functions and constants (also prefixed with project name).
Question is: what's the best way to develop namespaced and non-namespaced versions in parallel?
Should I just create fork in repository and keep merging changes between branches? Are there cases where backslash-sprinkled code becomes hard to merge?
Should I write script that converts 5.2 version to 5.3 or vice-versa? Should I use PHP tokenizer? sed? C preprocessor?
Is there a better way to use namespaces where available and keep backwards compatibility with older PHP?
Update: Decided against use of namespaces after all.
I don't think preprocessing the 5.3 code is a great idea. If your code is functionally identical in both PHP 5.2 and 5.3 with the exception of using namespaces instead of underscore-separated prefixes, why use namespaces at all? In that case it sounds to me like you want to use namespaces for the sake of using namespaces.
I do think you'll find that as you migrate to namespaces, you will start to 'think a bit differently' about organizing your code.
For this reason, I strongly agree with your first solution. Create a fork and do backports of features and bugfixes.
Good luck!
This is a followup to my previous answer:
The namespace simulation code got quite stable. I already can get symfony2 to work (some problems still, but basically). Though there is still some stuff missing like variable namespace resolution for all cases apart from new $class.
Now I wrote a script which will iterate recursively through a directory and process all files: http://github.com/nikic/prephp/blob/master/prephp/namespacePortR.php
Usage Instructions
Requirements for your code to work
Your classnames mustn't contain the _ character. If they do, classnames could get ambiguous while converting.
Your code mustn't redeclare any global functions or constants within a namespace. Thus it is ensured that all your code may be resolved at compile-time.
Basically these are the only restrictions to your code. Though I should note that in a default configuration the namespacePortR will not resolve things like $className = 'Some\\NS\\Class'; new $className, because it would require inserting additional code. It's better that this is patched up later (either manually or using an automated patching system.)
Configuration
As we have made the assumption that no global function or constant is redeclared in a namespace you must set the assumeGlobal class constant in the namespace listener. In the same file set the SEPARATOR constant to _.
In the namespacePortR change the configuration block to satisfy your needs.
PS: The script may be provided a ?skip=int option. This tells it to skip the first int files. You should not need it, if you have set the override mode to intelligent.
Here's what I've found:
Doing this with regular expressions is a nightmare. You can get most of it done with just a few simple expressions, but then edge cases are a killer. I've ended up with horrible, fragile mess that barely works with one codebase.
It's doable with built-in tokenizer and simple recursive descent parser that handles only simplified subset of the language.
I've ended up with rather ugly design (parser and transformer in one – mostly just changing or re-emitting tokens), because it seemed too much work to build useful syntax tree with whitespace maintained (I wanted resulting code to be human-readable).
I wanted to try phc for this, but couldn't convince its configure that I have built required version of Boost library.
I haven't tried ANTLR for this yet, but it's probably the best tool for that kind of tasks.
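To give an idea of the tokenizer approach, here is a much smaller sketch than the parser described above (it is not that parser). It only rewrites usages of prefixed class names and leaves strings and comments alone; class declarations, which would need a namespace statement and a short name, are deliberately not handled. The function name underscoreToNamespace is made up.

```php
<?php
// Rewrite identifiers such as Project_Directory_Filename to
// \Project\Directory\Filename. Only T_STRING tokens that start with the
// given prefix are touched, so string literals and comments stay as they are.
function underscoreToNamespace($source, $prefix)
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $out .= $token; // single-character token such as ";" or "("
            continue;
        }
        list($id, $text) = $token;
        if ($id === T_STRING && strpos($text, $prefix . '_') === 0) {
            $text = '\\' . str_replace('_', '\\', $text);
        }
        $out .= $text; // whitespace tokens are re-emitted unchanged
    }
    return $out;
}

$code = '<?php $x = new Project_Directory_Filename(); echo "Project_Directory_Filename";';
echo underscoreToNamespace($code, 'Project'), "\n";
```

Because every token is re-emitted with its original text, whitespace and formatting survive, which is what keeps the result human-readable.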
I am working on a project that emulates PHP 5.3 on PHP 5.2: prephp. It includes namespace support (not yet complete though.)
Now, out of the experience of writing this there is one ambiguity problem in namespace resolution: Unqualified function calls and constant lookups have a fallback to the global namespace. So you could convert your code automatically only if you either fully qualified or qualified all your function calls/constant lookups or if you didn't redefine any function or constant in a namespace with the same name as a PHP built in function.
If you strictly adhered to this practice (whichever of them you choose) it would be fairly easy to convert your code. It would be a subset of the code for emulating namespaces in prephp. If you need help with the implementation, feel free to ask me, I would be interested ;)
PS: The namespace emulation code of prephp isn't complete yet and may be buggy. But it may give you some insights.
Here's the best answer I think you're going to be able to find:
Step 1: Create a directory called 5.3 for every directory w/ php5.3 code in it and stick all 5.3-specific code in it.
Step 2: Take a class you want to put in a namespace and do this in 5.3/WebPage/Consolidator.inc.php:
namespace WebPage;
require_once 'WebPageConsolidator.inc.php';
class Consolidator extends \WebpageConsolidator
{
public function __construct()
{
echo "PHP 5.3 constructor.\n";
parent::__construct();
}
}
Step 3: Use a strategy function to use the new PHP 5.3 code. Place in non-PHP5.3 findclass.inc.php:
// Copyright 2010-08-10 Theodore R. Smith <phpexperts.pro>
// License: BSD License
function findProperClass($className)
{
$namespaces = array('WebPage');
$namespaceChar = '';
if (PHP_VERSION_ID >= 50300)
{
// Search with Namespaces
foreach ($namespaces as $namespace)
{
// Build the candidate in its own variable so $className is not overwritten.
$candidate = "$namespace\\$className";
if (class_exists($candidate))
{
return $candidate;
}
}
$namespaceChar = "\\";
}
// It wasn't found in the namespaces (or we're using 5.2), let's search global namespace:
foreach ($namespaces as $namespace)
{
$candidate = "$namespaceChar$namespace$className";
if (class_exists($candidate))
{
return $candidate;
}
}
throw new RuntimeException("Could not find a suitable class named $className.");
}
Step 4: Rewrite your code to look like this:
<?php
require 'findclass.inc.php';
$includePrefix = '';
if (PHP_VERSION_ID >= 50300)
{
$includePrefix = '5.3/';
}
require_once $includePrefix . 'WebPageConsolidator.inc.php';
$className = findProperClass('Consolidator');
$consolidator = new $className;
// PHP 5.2 output: PHP 5.2 constructor.
// PHP 5.3 output: PHP 5.3 constructor. PHP 5.2 constructor.
That will work for you. It is a kludge performance-wise, but just a little, and it can be done away with when you decide to stop supporting 5.2.
What I did, with a large codebase that used the underscore naming convention (among others) and a whole lot of require_once in lieu of an autoloader, was to define an autoloader and, after renaming classes to play nicely with namespaces, add class_alias lines to their files that alias each class's old name.
I then started removing require_once statements where execution was not dependent on inclusion order, since the autoloader would pick stuff up, and namespace stuff as I went along fixing bugs and so on.
It's worked quite well so far.
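A minimal sketch of what such a file might look like, using the Project\Directory\Filename name from the question; the describe method is made up so there is something to call. Note that the namespace statement has to be the first statement in the file.

```php
<?php
namespace Project\Directory;

// The class now lives under its namespaced name...
class Filename
{
    public function describe()
    {
        return __CLASS__;
    }
}

// ...and the old underscore name keeps working for code that has not
// been migrated yet.
class_alias('Project\\Directory\\Filename', 'Project_Directory_Filename');

$legacy = new \Project_Directory_Filename();
echo $legacy->describe(), "\n";
```

The alias is the same class under a second name, so instanceof checks and type hints written against either name keep passing.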
Well, I don't know if it is the "best" way, but in theory, you could use a script to take your migrated 5.3 code and backport it to 5.2 (potentially even using PHP).
In your namespaced files you would want to convert something like:
namespace Project\Directory\Filename;
class MyClass {
public $attribute;
public function typedFunction(MyClass $child) {
if ($child instanceof MyClass) {
print 'Is MyClass';
}
}
}
To something like:
class Project_Directory_Filename_MyClass {
public $attribute;
public function typedFunction(Project_Directory_Filename_MyClass $child) {
if ($child instanceof Project_Directory_Filename_MyClass) {
print 'Is MyClass';
}
}
}
And in your namespace code you would need to convert from:
$myobject = new Project\Directory\Filename\MyClass();
To:
$myobject = new Project_Directory_Filename_MyClass();
While all your includes and requires would stay the same, I think you would almost need to keep some sort of Cache of all your classes and namespace to do the complex conversion around the 'instanceof' and typed parameters if you use them. That is the trickiest thing I can see.
I haven't tested this myself, but you may take a look at this php 5.2 -> php 5.3 conversion script.
It is not the same as 5.3 -> 5.2, but maybe you will find some useful stuff there.
Our DMS Software Reengineering Toolkit can likely implement your solution pretty well. It is designed to carry out reliable source code transformations, by using AST to AST transforms coded in surface-syntax terms.
It has a PHP Front End which is a full, precise PHP parser, AST builder, and AST to PHP-code regenerator. DMS provides for AST prettyprinting, or fidelity printing ("preserve column numbers where possible").
This combination has been used to implement a variety of trustworthy PHP source code manipulation tools for PHP 4 and 5.
EDIT (in response to a somewhat disbelieving comment):
For the OP's solution, the following DMS transformation rule should do most of the work:
rule replace_underscored_identifier_with_namespace_path(namespace_path:N)
:namespace_path->namespace_path
"\N" -> "\complex_namespace_path\(\N\)"
if N=="NCLASS_OR_NAMESPACE_IDENTIFIER" && has_underscores(N);
This rule finds all "simple" identifiers that are used where namespace paths are allowed, and replaces those simple identifiers with the corresponding namespace path constructed by tearing the string for the identifier apart into constituent elements separated by underscores. One has to code some procedural help in DMS's implementation language, PARLANSE, to check that the identifier contains underscores ("has_underscores"), and to implement the tear-apart logic by building the corresponding namespace path subtree ("complex_namespace_path").
The rule works by abstractly identifying trees that correspond to language nonterminals (in this case, "namespace_path"), and replacing simple ones by more complex trees that represent the full namespace path. The rule is written as text, but the rule itself is parsed by DMS to construct the trees it needs to match PHP trees.
DMS rule application logic can trivially apply this rule everywhere throughout the AST produced by the PHP parser.
This answer may seem overly simple in the face of all the complicated stuff that makes up the PHP language, but all that other complexity is hidden in the PHP language definition used by DMS; that definition is some 10,000 lines of lexical and grammar definitions, but is already tested and working. All the DMS machinery, and these 10K lines, are indications of why simple regexes can't do the job reliably. (It is surprising how much machinery it takes to get this right; I've been working on DMS since 1995).
If you want to see all the machinery that makes up how DMS defines/manipulates a language, you can see a nice simple example.
