Why use DOMDocument instead of PHP with HTML? - php

I still can't really wrap my head around the built in DOMDocument class.
Why should I use that instead of just doing it similar to the following?
I would like to know the benefits.
$URI = $_SERVER['REQUEST_URI'];
$navArr = Config::get('navigation');
$navigation = '<ul id="nav">' . "\n";
foreach($navArr as $name => $path) {
$navigation .= ' <li' . ((in_array($URI, $path)) ? ' class="active"' : '') . '>' . $name . '</li>' . "\n";
}
$navigation .= '</ul>' . "\n\n";
return $navigation;

Here's the same example using DOMDocument:
$doc = new DOMDocument;
$list = $doc->appendChild($doc->createElement('ul'));
$list->setAttribute('id', 'nav');
foreach ($navArr as $name => $path) {
$listItem = $list->appendChild($doc->createElement('li'));
if (in_array($URI, $path)) {
$listItem->setAttribute('class', 'active');
}
$link = $listItem->appendChild($doc->createElement('a'));
$link->setAttribute('href', $path[1]);
$link->appendChild($doc->createTextNode($name));
}
return $doc->saveHTML();
It's more verbose, but not too much, and possibly clearer what's happening at each step.
One benefit is character escaping: createTextNode and setAttribute ensure that the special HTML characters (quotes, ampersands and angle brackets) are escaped properly.
In the end, though, for a larger application, you'd probably want to use an actual templating language like Twig for generating HTML, as the templates are more readable and extensible.
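To make the escaping point concrete, here is a minimal sketch. The helper name buildNavItem and the sample label are made up for illustration; only the DOMDocument calls come from the answer above. A label that happens to contain markup ends up as harmless text instead of being injected into the page:

```php
<?php
// Hypothetical helper: builds one <li><a>...</a></li> with DOMDocument.
function buildNavItem(string $label, string $href): string
{
    $doc = new DOMDocument();
    $item = $doc->appendChild($doc->createElement('li'));
    $link = $item->appendChild($doc->createElement('a'));
    $link->setAttribute('href', $href);
    // createTextNode treats the label as text, never as markup.
    $link->appendChild($doc->createTextNode($label));
    return $doc->saveHTML();
}

$label = 'Tips & <b>Tricks</b>';

// String concatenation passes the markup through untouched ...
$concatenated = '<li><a href="/tips">' . $label . '</a></li>';

// ... while the DOM version serializes it as escaped text.
$viaDom = buildNavItem($label, '/tips');

echo $concatenated, "\n", $viaDom;
```

With plain concatenation you would have to remember htmlspecialchars() on every value yourself; with the DOM approach the serializer does it.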

Related

PHP Formatter for single-line code blocks

I am trying to configure the Eclipse PHP formatter to keep the opening and closing php tags on 1 line if there is only 1 line of code between them (while keeping the default new line formatting if there are more lines of code).
Example:
<td class="main"><?php
echo drm_draw_input_field('fax') . ' ' . (drm_not_null(ENTRY_FAX_NUMBER_TEXT) ? '<span class="inputRequirement">' . ENTRY_FAX_NUMBER_TEXT . '</span>' : '');
?></td>
Should be formatted to:
<td class="main"><?php echo drm_draw_input_field('fax') . ' ' . (drm_not_null(ENTRY_FAX_NUMBER_TEXT) ? '<span class="inputRequirement">' . ENTRY_FAX_NUMBER_TEXT . '</span>': ''); ?></td>
Is there any way to achieve this with Eclipse? Or another suggestion/formatter?
EDIT:
It seems Eclipse does not have such a formatting option, as explained in the comments below. Any existing alternatives that can do this?
As I already mentioned in a comment on the question: as far as I know, you can't do this in Eclipse, because its formatter only has options for the code itself, i.e. the text inside the PHP tags <?php ...code text... ?>.
But you can achieve it with the following PHP script.
Very important before you start: back up the PHP project whose path you pass to the dirToArray() function, because the script overwrites files in place.
// Recursive function to read a directory and its sub-directories.
// It is built on PHP's scandir - http://php.net/manual/en/function.scandir.php
function dirToArray($dir) {
$result = array();
$cdir = scandir($dir);
foreach ($cdir as $key => $value){
if (!in_array($value,array(".",".."))){
if (is_dir($dir . DIRECTORY_SEPARATOR . $value)){
$result = array_merge(
$result,
dirToArray($dir . DIRECTORY_SEPARATOR . $value)
);
}else{
$result[] = $dir . DIRECTORY_SEPARATOR . $value;
}
}
}
return $result;
}
// Scanning project files
$files = dirToArray("/home/project"); // or C:/project... for windows
// Reading files and collapsing PHP blocks that span 3 or fewer lines
// (i.e. a single line of code between the tags) onto one line
foreach ($files as $file){
// Reading file content
$content = file_get_contents($file);
// RegExp will return 2 arrays
// first will contain all php code with php tags
// second one will contain only php code
// UPDATED based on Michael's provided regexp in this answer comments
preg_match_all( '/<\?php\s*\r?\n?(.*?)\r?\n?\s*\?>/i', $content, $blocks );
$codeWithTags = $blocks[0];
$code = $blocks[1];
// Loop over matches and formatting code
foreach ($codeWithTags as $k => $block){
$content = str_replace($block, '<?php '.trim($code[$k]).' ?>', $content );
}
// Overwriting file content with formatted one
file_put_contents($file, $content);
}
NOTE: This is just a simple example, and of course the script can be improved.
// The result will be that this
text text text<?php
echo "11111";
?>
text text text<?php
echo "22222"; ?>
text text text<?php echo "33333";
?>
<?php
echo "44444";
echo "44444";
?>
// will be formatted to this
text text text<?php echo "11111"; ?>
text text text<?php echo "22222"; ?>
text text text<?php echo "33333"; ?>
<?php
echo "44444";
echo "44444";
?>
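If you want to try the replacement on a string before letting it loose on real files, the core of the script can be pulled out into a function. This is only a sketch: the function name collapseSingleLinePhpBlocks is made up, and the regular expression is the one from the script above. Because . does not match newlines here, a block with several lines of code never matches and is left alone.

```php
<?php
// Hypothetical helper wrapping the regexp from the script above.
// Collapses "<?php", one line of code and "?>" onto a single line.
function collapseSingleLinePhpBlocks(string $content): string
{
    $result = preg_replace_callback(
        '/<\?php\s*\r?\n?(.*?)\r?\n?\s*\?>/i',
        static function (array $match): string {
            // $match[1] is the code between the tags, without line breaks.
            return '<?php ' . trim($match[1]) . ' ?>';
        },
        $content
    );

    // preg_replace_callback returns null on a regexp error; fall back to the input.
    return $result ?? $content;
}

$sample = "text<?php\necho \"1\";\n?>\n<?php\necho \"4\";\necho \"5\";\n?>\n";
echo collapseSingleLinePhpBlocks($sample);
```

Using a callback instead of str_replace also means each block is rewritten where it was found, rather than by searching the whole file for its text again.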
In Sublime Text, the manual command for joining lines is Ctrl + J.
To do it automatically, might I suggest you take a look at this:
https://github.com/heroheman/Singleline

PHP File Manager

I recently started working on a PHP File Manager for my server, as I figured it'd be extremely convenient to use, as well as allowing me to brush up on my PHP skills. Anyway, I have a few questions that I hope can be answered...
When I list my directories, there are always a couple of "Dots". For example: ., .., Folder_1, Folder_2, etc... How would I go about removing those "Dots" from my directory list?
When I list my directories, my current method has no problem listing folders with underscores, or ones that have no space in the name. However, it cannot handle folders with spaces in their names. Is there a way to get my File Manager to recognize and handle spaces in the names properly?
Here is my current code...
<?php
global $dir_path;
if (isset($_GET["directory"])) {
$dir_path = $_GET["directory"];
//echo $dir_path;
}
else {
$dir_path = $_SERVER["DOCUMENT_ROOT"]."/";
}
$directories = scandir($dir_path);
foreach($directories as $entry) {
if(is_dir($dir_path . "/" . $entry )) {
echo "<li>" . $entry . "</li>";
}
else {}
}
?>
Much thanks for any help,
Brandon
P.S. Are the "Dots" related to my server's ext4 file-system? It's not really significantly pertinent to my problems, I'm just a tad curious.
If you just want a simple version :
foreach($directories as $entry) {
if (is_dir($dir_path . "/" . $entry) && !in_array($entry, array('.','..'))) {
echo "<li>" . $entry . "</li>";
}
else {}
}
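This skips . and .., which stand for the current directory and its parent. They are ordinary directory entries that scandir returns on practically every filesystem, so they are not specific to ext4. The problem with spaces sounds odd: is it the link that is not working, or is it scandir? If it is the links, replace the blanks with %20, e.g.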
$href="?directory=" . $dir_path . "" . str_replace(' ','%20',$entry) . "/";
echo '<li><a href=' . $href . '>' . $entry . '</a></li>';
More likely, I think, it is the lack of quotes "" around the href value, e.g.
echo '<li><a href="' . $href . '">' . $entry . '</a></li>';
instead. When you leave out the quotes, a link with blanks, say "test 123", will be interpreted as href=test by the browser, because nothing encloses the whole link. It should be href="test 123".
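Putting both points together, a sketch could look like the following. The function names listSubdirectories and renderDirectoryLinks are made up for illustration, and the ?directory= parameter is taken from the code above. rawurlencode() takes care of spaces (and other special characters) in the link, and htmlspecialchars() keeps odd folder names from breaking the HTML:

```php
<?php
// Hypothetical helper: names of the sub-directories of $dirPath,
// without the "." and ".." entries.
function listSubdirectories(string $dirPath): array
{
    $entries = scandir($dirPath);
    if ($entries === false) {
        return [];
    }

    $names = [];
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        if (is_dir($dirPath . DIRECTORY_SEPARATOR . $entry)) {
            $names[] = $entry;
        }
    }
    return $names;
}

// Hypothetical helper: one <li> with a quoted, URL-encoded link per directory.
function renderDirectoryLinks(string $dirPath, array $names): string
{
    $html = '';
    foreach ($names as $name) {
        $target = rtrim($dirPath, '/') . '/' . $name . '/';
        $href = '?directory=' . rawurlencode($target);
        $html .= '<li><a href="' . htmlspecialchars($href) . '">'
            . htmlspecialchars($name) . "</a></li>\n";
    }
    return $html;
}
```

Usage would be something like echo renderDirectoryLinks($dir_path, listSubdirectories($dir_path));. PHP decodes the value again when it fills $_GET["directory"]. Note that this does nothing about visitors passing arbitrary paths in that parameter; you would still want to restrict it to the directories you mean to expose.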

How to write a bot that does not consume much RAM?

I have a web bot that consumes a lot of memory. After a while, memory usage hits 50% and the process gets killed. I have no idea why memory usage keeps increasing like that. I did not include "para.php" below; it is a library for parallel curl requests. I would like to learn more about web crawlers; I searched a lot, but could not find any helpful documentation or methods that I can use.
This is the library from which I obtained para.php.
My code:
require_once "para.php";
class crawling{
public $montent;
public function crawl_page($url){
$m = new Mongo();
$muun = $m->howto->en->findOne(array("_id" => $url));
if (isset($muun)) {
return;
}
$m->howto->en->save(array("_id" => $url));
echo $url;
echo "\n";
$para = new ParallelCurl(10);
$para->startRequest($url, array($this,'on_request_done'));
$para->finishAllRequests();
preg_match_all("(<a href=\"(.*)\")siU", $this->montent, $matk);
foreach($matk[1] as $longu){
$href = $longu;
if (0 !== strpos($href, 'http')) {
$path = '/' . ltrim($href, '/');
if (extension_loaded('http')) {
$href = http_build_url($url, array('path' => $path));
} else {
$parts = parse_url($url);
$href = $parts['scheme'] . '://';
if (isset($parts['user']) && isset($parts['pass'])) {
$href .= $parts['user'] . ':' . $parts['pass'] . '@';
}
$href .= $parts['host'];
if (isset($parts['port'])) {
$href .= ':' . $parts['port'];
}
$href .= $path;
}
}
$this->crawl_page($href);
}
}
public function on_request_done($content) {
$this->montent = $content;
}
}
$moj = new crawling;
$moj->crawl_page("http://www.example.com/");
You call this crawl_page function on one URL.
Its content is fetched ($this->montent) and checked for links ($matk).
While these are not yet destroyed, you go recursive, starting a new call to crawl_page. $this->montent will be overwritten with the new content (that's OK). A bit further down, $matk (a new variable) is populated with the links for the new $this->montent. At this point, there are two $matk arrays in memory: the one with all links for the document you started processing first, and the one with all links for the document that was first linked to in your original document. Every further level of recursion adds another one, and none of them is freed until the nested calls return.
I'd suggest finding all links and saving them to a database instead of going recursive immediately. Then work through the queue of links in the database one by one, with each new document adding its own links to the queue.
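A rough sketch of that idea follows. To keep it self-contained, a few things are assumptions rather than part of your code: the function name crawlIteratively is made up, the $fetch callable (URL in, HTML out) stands in for the ParallelCurl request, and an in-memory SplQueue plus a $seen array stand in for the database queue. The point is only the shape of the loop: one page and one list of links are alive at a time.

```php
<?php
// Iterative crawl: a queue of pending URLs replaces the recursive call.
// $fetch is a hypothetical callable that returns the HTML for a URL.
function crawlIteratively(string $startUrl, callable $fetch, int $maxPages = 100): array
{
    $queue = new SplQueue();
    $queue->enqueue($startUrl);
    $seen = [$startUrl => true];   // URLs already queued (the "_id" check)
    $visited = [];                 // URLs actually fetched, in order

    while (!$queue->isEmpty() && count($visited) < $maxPages) {
        $url = $queue->dequeue();
        $visited[] = $url;

        $html = $fetch($url);
        preg_match_all('/<a\s+href="([^"]*)"/i', $html, $matches);
        unset($html); // the page content is no longer needed

        foreach ($matches[1] as $link) {
            // Sketch only: relative links are skipped instead of resolved.
            if (strpos($link, 'http') !== 0) {
                continue;
            }
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue->enqueue($link);
            }
        }
        // $matches is overwritten on the next iteration, so link lists
        // do not pile up the way they do with nested calls.
    }

    return $visited;
}
```

For a long-running bot you would replace the SplQueue and $seen with your Mongo collection, so the queue survives restarts and does not itself grow in memory.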

MediaWiki + Graphviz + Image maps + Pagelinks

Background: Working with MediaWiki 1.19.1, Graphviz 2.28.0, Extension:GraphViz 0.9 on WAMP stack (Server 2008, Apache 2.4.2, MySQL 5.5.27, PHP 5.4.5). Everything is working great and as expected for the basic functionality of rendering a clickable image from a Graphviz diagram using the GraphViz extension in MediaWiki.
Problem: The links in the image map are not added to the MediaWiki pagelinks table. I get why they aren't added but it becomes an issue if there is no way to follow the links back with the 'What links here' functionality.
Desired solution: During the processing of the diagram in the GraphViz extension, I would like to use the generated .map file to then create a list of wikilinks to add on the page to get picked up by MediaWiki and added to the pagelinks table.
Details:
This GraphViz extension code:
<graphviz border='frame' format='png'>
digraph example1 {
// define nodes
nodeHello [
label="I say Hello",
URL="Hello"
]
nodeWorld [
label="You say World!",
URL="World"
]
// link nodes
nodeHello -> nodeWorld;
}
</graphviz>
Generates this image:
And this image map code in a corresponding .map file on the server:
<map id="example1" name="example1">
<area shape="poly" id="node1" href="Hello" title="I say Hello" alt="" coords="164,29,161,22,151,15,137,10,118,7,97,5,77,7,58,10,43,15,34,22,31,29,34,37,43,43,58,49,77,52,97,53,118,52,137,49,151,43,161,37"/>
<area shape="poly" id="node2" href="World" title="You say World!" alt="" coords="190,125,186,118,172,111,152,106,126,103,97,101,69,103,43,106,22,111,9,118,5,125,9,133,22,139,43,145,69,148,97,149,126,148,152,145,172,139,186,133"/>
</map>
From that image map file, I would like to be able to extract the href and title to build wikilinks like so:
[[Hello|I say Hello]]
[[World|You say World!]]
I'm guessing that since that .map file is essentially XML that I could just use XPATH to query the file, but that is just a guess. PHP is not my strongest area and I don't know the best approach to going about the XML/XPATH option or if that is even the best approach to pull that info from the file.
Once I got that collection/array of wikilinks from the .map file, I'm sure I can hack up the GraphViz.php extension file to add it to the contents of the page to get it added to the pagelinks table.
Progress: I had a bit of a Rubber Duck Problem Solving moment right as I submitted the question. I realized that since I had well-formed data in the image map, XPath was probably the way to go. It was fairly trivial to pull the data I needed, especially since I found that the map file contents were still stored in a local string variable.
$links = '';
$xml = new SimpleXMLElement( $map );
foreach($xml->area as $item) {
$links .= "[[" . $item->attributes()->href . "|" . $item->attributes()->title . "]]";
}
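For anyone who wants to try this outside of the extension, here is a self-contained sketch using an actual XPath query. The function name mapToWikilinks is made up, and the sample map is a shortened copy of the one above (coords trimmed):

```php
<?php
// Hypothetical helper: turn the <area> elements of an image map
// into wikilinks of the form [[href|title]].
function mapToWikilinks(string $map): array
{
    $xml = new SimpleXMLElement($map);
    $links = [];
    foreach ($xml->xpath('//area[@href]') as $area) {
        // Attributes are cast to strings by the concatenation.
        $links[] = '[[' . $area['href'] . '|' . $area['title'] . ']]';
    }
    return $links;
}

$sampleMap = <<<'XML'
<map id="example1" name="example1">
<area shape="poly" id="node1" href="Hello" title="I say Hello" alt="" coords="164,29,161,22"/>
<area shape="poly" id="node2" href="World" title="You say World!" alt="" coords="190,125,186,118"/>
</map>
XML;

echo implode("\n", mapToWikilinks($sampleMap)), "\n";
```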
Final Solution: See my accepted answer below.
Thanks for taking a look. I appreciate any assistance or direction you can offer.
I finally worked through all of the issues and now have a fairly decent solution to render the graph nicely, provide a list of links, and register the links with wiki. My solution doesn't fully support all of the capabilities of the current GraphViz extension as it is written as there is functionality we do not need and I do not want to support. Here are the assumptions / limitations of this solution:
Does not support MscGen: We only have a need for Graphviz.
Does not support imageAtrributes: We wanted to control the format and presentation and it seemed like there were inconsistencies in the imageAttributes implementation that would then cause further support issues.
Does not support wikilinks: While it would be nice to provide consistent link usage through wiki and the Graphviz extension, the reality is that Graphviz is a completely different markup environment. While the current extension 'supports' wikilinks, the implementation is a little weak and leaves areas for confusion. Example: Wikilinks support giving the link an optional description, but Graphviz already uses the node label for the description. So you end up ignoring the wikilink description and telling users 'Yes, we support wikilinks, but don't use the description part'. Since we aren't really using wikilinks correctly, we just implement regular links and try to avoid the confusion entirely.
Here is what the output looks like:
Here are the changes that were made
Comment out this line:
// We don't want to support wikilinks so don't replace them
//$timelinesrc = rewriteWikiUrls( $timelinesrc ); // if we use wiki-links we transform them to real urls
Replace this block of code:
// clean up map-name
$map = preg_replace( '#<ma(.*)>#', ' ', $map );
$map = str_replace( '</map>', '', $map );
if ( $renderer == 'mscgen' ) {
$mapbefore = $map;
$map = preg_replace( '/(\w+)\s([_:%#/\w]+)\s(\d+,\d+)\s(\d+,\d+)/',
'<area shape="$1" href="$2" title="$2" alt="$2" coords="$3,$4" />',
$map );
}
/* Procduce html
*/
if ( $wgGraphVizSettings->imageFormatting )
{
$txt = imageAtrributes( $args, $storagename, $map, $outputType, $wgUploadPath ); // if we want borders/position/...
} else {
$txt = '<map name="' . $storagename . '">' . $map . '</map>' .
'<img src="' . $wgUploadPath . '/graphviz/' . $storagename . '.' . $outputType . '"' .
' usemap="#' . $storagename . '" />';
}
With this code:
$intHtml = '';
$extHtml = '';
$badHtml = '';
// Wrap the map/area info with top level nodes and load into xml object
$xmlObj = new SimpleXMLElement( $map );
// What does map look like before we start working with it?
wfDebugLog( 'graphviz', 'map before: ' . $map . "\n" );
// loop through each of the <area> nodes
foreach($xmlObj->area as $areaNode) {
wfDebugLog( 'graphviz', "areaNode: " . $areaNode->asXML() . "\n" );
// Get the data from the XML attributes
$hrefValue = (string)$areaNode->attributes()->href;
$textValue = (string)$areaNode->attributes()->title;
wfDebugLog( 'graphviz', '$hrefValue before: ' . $hrefValue . "\n" );
wfDebugLog( 'graphviz', '$textValue before: ' . $textValue . "\n" );
// For the text fields, multiple spaces in the Graphviz source (label)
// turn into a regular space followed by encoded representations of
// non-breaking spaces (e.g. "&#160;&#160;") in the .map file, which then
// turn into raw non-breaking-space characters in the local variables.
// The following two options appear to convert/decode the characters
// appropriately. Leaving the lines commented out for now, as we have
// not seen a graph in the wild with multiple spaces in the label -
// just happened to stumble on the scenario.
// See http://www.php.net/manual/en/simplexmlelement.asxml.php
// and http://stackoverflow.com/questions/2050723/how-can-i-preg-replace-special-character-like-pret-a-porter
//$textValue = iconv("UTF-8", "ASCII//TRANSLIT", $textValue);
//$textValue = html_entity_decode($textValue, ENT_NOQUOTES, 'UTF-8');
// Now we need to deal with the whitespace characters like tabs and newlines
// and also deal with them correctly to replace multiple occurrences.
// Unfortunately, the \n and \t values in the variable aren't actually
// tab or newline characters but literal characters '\' + 't' or '\' + 'n'.
// So the normally recommended regex '/\s+/u' to replace the whitespace
// characters does not work.
// See http://stackoverflow.com/questions/6579636/preg-replace-n-in-string
$hrefValue = preg_replace("/( |\\\\n|\\\\t)+/", ' ', $hrefValue);
$textValue = preg_replace("/( |\\\\n|\\\\t)+/", ' ', $textValue);
// check to see if the url matches any of the
// allowed protocols for external links
if ( preg_match( '/^(?:' . wfUrlProtocols() . ')/', $hrefValue ) ) {
// external link
$parser->mOutput->addExternalLink( $hrefValue );
$extHtml .= Linker::makeExternalLink( $hrefValue, $textValue ) . ', ';
}
else {
$first = substr( $hrefValue, 0, 1 );
if ( $first == '\\' || $first == '[' || $first == '/' ) {
// potential UNC path, wikilink, absolute or relative path
$hrefValue = '#InvalidLink';
$badHtml .= Linker::makeExternalLink( $hrefValue, $textValue ) . ', ';
$textValue = 'Invalid link. Check Graphviz source.';
}
else {
$title = Title::newFromText( $hrefValue );
if ( is_null( $title ) ) {
// invalid link
$hrefValue = '#InvalidLink';
$badHtml .= Linker::makeExternalLink( $hrefValue, $textValue ) . ', ';
$textValue = 'Invalid link. Check Graphviz source.';
}
else {
// internal link
$parser->mOutput->addLink( $title );
$intHtml .= Linker::link( $title, $textValue ) . ', ';
$hrefValue = $title->getFullURL();
}
}
}
$areaNode->attributes()->href = $hrefValue;
$areaNode->attributes()->title = $textValue;
}
$map = $xmlObj->asXML();
// The contents of $map, which is now XML, gets embedded
// in the HTML sent to the browser so we need to strip
// the XML version tag and we also strip the <map> because
// it will get replaced with a new one with the correct name.
$map = str_replace( '<?xml version="1.0"?>', '', $map );
$map = preg_replace( '#<ma(.*)>#', ' ', $map );
$map = str_replace( '</map>', '', $map );
// Let's see what it looks like now that we are done with it.
wfDebugLog( 'graphviz', 'map after: ' . $map . "\n" );
$txt = '' .
'<table style="background-color:#f9f9f9;border:1px solid #ddd;">' .
'<tr>' .
'<td style="border:1px solid #ddd;text-align:center;">' .
'<map name="' . $storagename . '">' . $map . '</map>' .
'<img src="' . $wgUploadPath . '/graphviz/' . $storagename . '.' . $outputType . '"' . ' usemap="#' . $storagename . '" />' .
'</td>' .
'</tr>' .
'<tr>' .
'<td style="font:10px verdana;">' .
'This Graphviz diagram links to the following pages:' .
'<br /><strong>Internal</strong>: ' . ( $intHtml != '' ? rtrim( $intHtml, ' ,' ) : '<em>none</em>' ) .
'<br /><strong>External</strong>: ' . ( $extHtml != '' ? rtrim( $extHtml, ' ,' ) : '<em>none</em>' ) .
( $badHtml != '' ? '<br /><strong>Invalid</strong>: ' . rtrim($badHtml, ' ,') .
'<br /><em>Tip: Do not use wikilinks ([]), UNC paths (\\) or relative links (/) when creating links in Graphviz diagrams.</em>' : '' ) .
'</td>' .
'</tr>' .
'</table>';
Possible enhancements:
It would be nice if the list of links below the graph were sorted and de-duped.
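One way this could be done, as a sketch only: instead of appending to $intHtml / $extHtml directly, collect the link HTML in an array keyed by the link text, then sort and join at the end. The function name formatLinkList and the idea of keying by link text are assumptions for illustration, not part of the extension.

```php
<?php
// Hypothetical helper: $linksByText maps link text => link HTML.
// Assigning to the same key twice de-dupes; ksort orders by link text.
function formatLinkList(array $linksByText): string
{
    if ($linksByText === []) {
        return '<em>none</em>';
    }
    ksort($linksByText, SORT_STRING | SORT_FLAG_CASE);
    return implode(', ', $linksByText);
}

// In the loop, something like $intLinks[$textValue] = Linker::link( $title, $textValue );
// would replace the string concatenation. Stand-in values for a quick try:
$intLinks = [];
$intLinks['World'] = '<a href="/wiki/World">World</a>';
$intLinks['hello'] = '<a href="/wiki/Hello">hello</a>';
$intLinks['World'] = '<a href="/wiki/World">World</a>'; // duplicate, overwritten

echo formatLinkList($intLinks), "\n";
```

Keying by text means two different targets with the same label collapse into one entry; keying by the href value instead would avoid that at the cost of sorting by target rather than by label.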

PHP - Do I need any UTF-8 encoding/decoding?

Ok, I am writing comments to a UTF-8 file that I read within the function below to remove the text in between these comments. My question is, do I need anything different in here to do this successfully for UTF-8 files? Or will the following code below work? Basically, I am wondering if I need utf8_decode and/or utf8_encode functions, or perhaps iconv function?
// This holds the current file we are working on.
$lang_file = 'files/DreamTemplates.russian-utf8.php';
// Can't read from the file if it doesn't exist now can we?
if (!file_exists($lang_file))
continue;
// This helps to remove the language strings for the template, since the comment is unique
$template_begin_comment = '// ' . ' Template - ' . $lang_file . ' BEGIN...';
$template_end_comment = '// ' . ' Template - ' . $lang_file . ' END!';
$fp = fopen($lang_file, 'rb');
$content = fread($fp, filesize($lang_file));
fclose($fp);
// Searching within the string, extracting only what we need.
$start = strpos($content, $template_begin_comment);
$end = strpos($content, $template_end_comment);
// We can't do this unless both are found.
if ($start !== false && $end !== false)
{
$begin = substr($content, 0, $start);
$finish = substr($content, $end + strlen($template_end_comment));
$new_content = $begin . $finish;
// Write it into the file.
$fo = fopen($lang_file, 'wb');
fwrite($fo, $new_content);
fclose($fo);
}
Thanks for your help on this concerning UTF-8 encoding and decoding on strings, even if they are commented strings.
When I write the php comments into the UTF-8 file I am not using any conversion. Should I be?? The string definitions between the php comments is already encoded in UTF-8 however and seems to work fine within the file. Any help appreciated here.
No, you don't need to do any conversions.
Also, your extraction code is reliable in the sense that it won't mangle multibyte characters, although you might want to make sure the end position occurs after the start position.
To do this I would use preg_replace instead:
$content = file_get_contents($lang_file);
$template_begin_comment = '// ' . ' Template - ' . $lang_file . ' BEGIN...';
$template_end_comment = '// ' . ' Template - ' . $lang_file . ' END!';
// find from begin comment to end comment
// replace with emptiness
// keep track of how many replacements have been made
$new_content = preg_replace('/' .
preg_quote($template_begin_comment, '/') .
'.*?' .
preg_quote($template_end_comment, '/') . '/s',
'',
$content,
-1,
$replace_count
);
if ($replace_count) {
// if replacements have been made, write the file back again
file_put_contents($lang_file, $new_content);
}
Because the strings you match on contain only ASCII, this approach is safe: in UTF-8, ASCII bytes never occur inside a multibyte character, and everything outside the match is copied verbatim.
Disclaimer
Above code is not tested, if there's anything wrong just let me know.
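If you would rather keep the strpos/substr approach from the question, the "end after start" check can be added by giving the second strpos an offset. A minimal sketch, with a made-up function name (removeMarkedBlock) and made-up markers in the usage example:

```php
<?php
// Hypothetical helper: remove everything from $begin through $end.
// All offsets are byte offsets, which is fine here because the markers
// are plain ASCII and the rest of the content is copied byte for byte.
function removeMarkedBlock(string $content, string $begin, string $end): string
{
    $start = strpos($content, $begin);
    if ($start === false) {
        return $content;
    }

    // Only look for the end marker after the begin marker.
    $stop = strpos($content, $end, $start + strlen($begin));
    if ($stop === false) {
        return $content;
    }

    return substr($content, 0, $start) . substr($content, $stop + strlen($end));
}

$content = "Привет\n// BEGIN\n\$txt['a'] = 'мир';\n// END\nПока\n";
echo removeMarkedBlock($content, '// BEGIN', '// END');
```

The Russian text before and after the block comes out untouched, with no utf8_encode, utf8_decode or iconv involved.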
