XML-Output of the character ellipsis from filename - php

I would like to print the ellipsis character "…" in XML. It works if the character is hardcoded, but not if it comes from readdir(). Why?
Code:
<?php
header('Content-Type: text/xml; charset=utf-8');
$maxnesting = 2;
echo "<root>";
initXMLDir("//somefolder");
function initXMLDir($target, $level = 0){
    global $maxnesting;
    $ignore = array("cgi-bin", ".", "..");
    if(is_dir($target) && $level < $maxnesting){
        if($dir = opendir($target)){
            while (($file = readdir($dir)) !== false){
                if(!in_array($file, $ignore)){
                    if(is_dir("$target/$file")){
                        echo "<object><name>".$file."</name>";
                        initXMLDir("$target/$file", ($level+1));
                        echo "</object>";
                    }
                    else{
                        echo "<object>".$file."</object>";
                    }
                }
            }
            closedir($dir); // close only a handle that was actually opened
        }
    }
}
echo "</root>";
?>
If I hardcode the character like this, and remove it from the filename, it works:
echo "<object>…".$file."…</object>";
The error it prints is:
An invalid character was found in text content.
Edit-Workaround:
My workaround for this problem combines this function, which I found here:
function xml_character_encode($string, $trans='') {
$trans=(is_array($trans)) ? $trans : get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
foreach ($trans as $k=>$v) $trans[$k]= "&#".ord($k).";";
return strtr($string, $trans);
}
and with
iconv(mb_detect_encoding($file, "auto"), 'UTF-8', $file);
That solved my problem. Basically, I first encode all characters that cause problems with iconv() as numeric entities, so that I can safely call iconv() afterwards.
Use it like this:
$file= xml_character_encode($file);
$file= iconv(mb_detect_encoding($file, "auto"), 'UTF-8', $file);
I tried to replace the ellipsis manually, because it seems to be the only special character that does not display properly with utf8_encode() and htmlspecialchars() (the only two functions I would need if the ellipsis displayed properly), but somehow that could not be done with strtr().
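One possible explanation for the behaviour described above: if the filesystem returns names in Windows-1252, the ellipsis is the single byte 0x85, which utf8_encode() (ISO-8859-1 only) maps to a control character rather than to "…". The following is a minimal sketch under that assumption, not the original poster's method. The helper name xml_safe_name and the default source encoding are hypothetical.

```php
<?php
// Hypothetical helper (not from the original post): convert a file name to
// UTF-8 if needed, then escape it for use as XML text content.
// ASSUMPTION: names that are not valid UTF-8 are Windows-1252 encoded.
function xml_safe_name(string $name, string $sourceEncoding = 'Windows-1252'): string
{
    if (!mb_check_encoding($name, 'UTF-8')) {
        $converted = iconv($sourceEncoding, 'UTF-8', $name);
        if ($converted === false) {
            throw new RuntimeException('Could not convert file name to UTF-8');
        }
        $name = $converted;
    }
    // ENT_XML1 escapes &, <, > and quotes for XML output
    return htmlspecialchars($name, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// "\x85" is the ellipsis byte in Windows-1252
echo '<object>' . xml_safe_name("notes\x85 & more.txt") . '</object>', PHP_EOL;
```

Names that are already valid UTF-8 are only escaped, so the helper could be applied to every $file before it is echoed.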

Related

PHP preg_replace - text will not be recognized

I have a problem with preg_replace. The $insert_marker text is not recognized, and the cause is the "$" characters. If I remove the two $ characters, it works. So what is the problem?
function insert_into_file($file_path, $insert_marker, $text, $after = true) {
$contents = file_get_contents($file_path);
$new_contents = preg_replace($insert_marker, ($after) ? '$0' . $text : $text . '$0', $contents);
return file_put_contents($file_path, $new_contents);
}
$file_path = ".htaccess";
$insert_marker = "/##-- $Id: _.htaccess 10934 2017-08-31 12:11:28Z serpent_driver $/";
$text = "\n##added text";
$num_bytes = insert_into_file($file_path, $insert_marker, $text, true);
if ($num_bytes === false) {
echo "Could not insert into file $file_path.";
} else {
echo "Insert successful!";
}
$ is a special character in a regex (it means end of line), so you have to escape it as \$. Because the pattern is written in a double-quoted PHP string, where both $ and \ are special as well, you have to triple-escape it so that the string passed to preg_replace actually contains \$:
$insert_marker = "/##-- \\\$Id: _.htaccess 10934 2017-08-31 12:11:28Z serpent_driver \\\$/";
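It's a little hard to understand your question, but I figured out that you mean the $Id part of $insert_marker is causing issues. You need to escape each $ with a backslash so that the regex engine sees \$. Note that in a double-quoted string "\$" only produces a plain $, so either write \\\$ or use a single-quoted string: $insert_marker = '/##-- \$Id: _.htaccess 10934 2017-08-31 12:11:28Z serpent_driver \$/';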
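As an alternative to escaping by hand, the marker can be kept as plain text and passed through preg_quote(). This is only a sketch: the sample $contents stands in for the real .htaccess file and is made up for illustration.

```php
<?php
// The literal marker text; single quotes keep PHP from interpolating $Id.
$marker = '##-- $Id: _.htaccess 10934 2017-08-31 12:11:28Z serpent_driver $';
// preg_quote() escapes regex metacharacters ($, ., - ...) and the "/" delimiter.
$pattern = '/' . preg_quote($marker, '/') . '/';

// Sample contents standing in for the .htaccess file (hypothetical data).
$contents = "# header\n" . $marker . "\nRewriteEngine On\n";
$new_contents = preg_replace($pattern, '$0' . "\n##added text", $contents);
echo $new_contents;
```

With this approach insert_into_file() could take the raw marker text and build the pattern itself, so callers never have to think about regex escaping.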

PHP issue with diacritics

I have this code to read files from a folder:
<?php
$directory = "Dokumenty/rozne";
$a = array_diff(scandir($directory), array('..', '.'));
$i = 1;
foreach($a as $key => $name){
$link = "http://mana.fara.sk/Dokumenty/rozne/" . $name;
echo "<p>$i: <a href='$link' >$name</a></p><br>";
$i++;
}
?>
but on the web page the diacritics are displayed incorrectly. Here is an example:
Pamiatkovy���� vyskum.docx
Can you help me solve this problem? In the head I have <meta charset="UTF-8"> and the html lang attribute is lang="sk-SK".
THX
That's probably because scandir returns a non-UTF-8 string. You should either rename your files using the right encoding, or convert the string's encoding to UTF-8. Windows typically uses a legacy code page such as Windows-1252 (or Windows-1250 for Central European languages such as Slovak).
So, you can try with:
$name = iconv('Windows-1252', 'UTF-8', $name);
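The source encoding passed to iconv() matters, because the same byte can mean different characters in different code pages. The sketch below uses made-up bytes (not taken from the question) to show the difference between Windows-1252 and Windows-1250. Which one applies to the actual file names is an assumption that has to be checked on the server.

```php
<?php
// Illustrative bytes only (not from the question): 0xE8 and 0xFD inside a
// legacy-encoded name fragment.
$raw = "\xE8aj s\xFDr";

// The same bytes decode to different characters depending on the code page.
$as1252 = iconv('Windows-1252', 'UTF-8', $raw); // 0xE8 is read as "è"
$as1250 = iconv('Windows-1250', 'UTF-8', $raw); // 0xE8 is read as "č"

echo $as1252, PHP_EOL;
echo $as1250, PHP_EOL;
```

If some characters still look wrong after converting from Windows-1252, trying Windows-1250 is a reasonable next step for Slovak file names.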

How to get preg_replace() to delete text between two tags?

I'm trying to write a PHP function that deletes the code between two tags from all .js files in a folder and all its subfolders. So far everything works except preg_replace(). This is my code:
<?php
deleteRealtimeTester('test');
function deleteRealtimeTester($folder_path)
{
foreach (glob($folder_path . '/*.js') as $file)
{
$string = file_get_contents($file);
$string = preg_replace('#//RealtimeTesterStart(.*?)//RealtimeTesterEnd#', 'test2', $string);
$file_open = fopen($file, 'wb');
fwrite($file_open, $string);
fclose($file_open);
}
$subfolders = array_filter(glob($folder_path . '/*'), 'is_dir');
if (sizeof($subfolders) > 0)
{
for ($i = 0; $i < sizeof($subfolders); $i++)
{
echo $subfolders[$i];
deleteRealtimeTester($subfolders[$i]);
}
}
else
{
return;
}
}
?>
As mentioned, I want to delete everything inside these tags and the tags themselves:
//RealtimeTesterStart
//RealtimeTesterEnd
It is important that the tags contain the forward slashes, and that if a file contains several of these tag pairs, only the code from //RealtimeTesterStart to //RealtimeTesterEnd is deleted, not the code from //RealtimeTesterEnd to the next //RealtimeTesterStart.
I hope that someone can help me.
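You could also change your regex to use the [\s\S] character class, which matches any character, including line breaks. Use a lazy quantifier (*?) so that each match stops at the first //RealtimeTesterEnd instead of running on to the last one in the file.
So have the following
preg_replace('#\/\/RealtimeTesterStart[\s\S]*?\/\/RealtimeTesterEnd#', '', $string);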
This would remove the contents of //RealtimeTesterStart to //RealtimeTesterEnd and the tags themselves.
I'm assuming that //RealtimeTesterStart, //RealtimeTesterEnd and the code in between are on different lines? In PCRE, . does NOT match newlines by default. You need to use the s modifier (and you don't need the () unless you need the captured text for the replacement):
#//RealtimeTesterStart.*?//RealtimeTesterEnd#s
Also, look at GLOB_ONLYDIR for glob instead of array_filter. Also, also, maybe file_put_contents instead of fopen etc.
Maybe something like:
foreach (glob($folder_path . '/*.js') as $file) {
    $string = file_get_contents($file);
    $string = preg_replace('#//RealtimeTesterStart.*?//RealtimeTesterEnd#s', 'test2', $string);
    file_put_contents($file, $string);
}
foreach(glob($folder_path . '/*', GLOB_ONLYDIR) as $subfolder) {
    deleteRealtimeTester($subfolder);
}
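To check the pattern without touching any files, it can be run against a small in-memory string. The sample source below is invented for illustration. It contains two tagged blocks, so it also shows that the lazy .*? removes each block separately and keeps the code between them.

```php
<?php
// Made-up JavaScript source with two tagged blocks.
$source = "a();\n"
    . "//RealtimeTesterStart\ndebug1();\n//RealtimeTesterEnd\n"
    . "b();\n"
    . "//RealtimeTesterStart\ndebug2();\n//RealtimeTesterEnd\n"
    . "c();\n";

// s modifier: "." also matches newlines; ".*?" stops at the nearest end tag.
$cleaned = preg_replace('#//RealtimeTesterStart.*?//RealtimeTesterEnd#s', '', $source);

echo $cleaned;
```

The calls a(), b() and c() survive, while both debug blocks and their tags are gone. The line break that followed each end tag remains, so empty lines are left behind.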

PHP strpos() finds just single characters, not a whole string

I have a strange problem...
I would like to search in a logfile.
$lines = file($file);
$sampleName = "T3173sGas";
foreach ($lines as &$line) {
if (strpos($line, $sampleName) !== false) {
echo "yes";
}
}
This code is not working, although $sampleName is definitely in the log file. The search only works for single characters, for example "T" or "3", but not for "T3".
Do you have an idea why it's not working? Is the encoding of the logfile wrong?
Thanks a lot for your help!
If you can only find single characters, I would assume that your logfile is in some multi-byte character encoding like UTF-16. Since you already suspect something similar, the next step is to consult the documentation or specification of the logfile regarding its character encoding.
You can then use encoding-aware string functions; the extension is called mbstring (http://php.net/mbstring).
$encoding = ... ; // encoding of logfile
if (mb_strpos($line, $sampleName, 0, $encoding) !== false) {
echo "yes";
}
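Another way to approach this is to convert the whole file to UTF-8 once and then use the ordinary string functions, which also keeps the needle and the haystack in the same encoding. The sketch below assumes the log is UTF-16LE, which is only a guess that must be confirmed for the real file. The sample log content is made up.

```php
<?php
// ASSUMPTION: the log is UTF-16LE; the real encoding must be confirmed first.
$encoding = 'UTF-16LE';
$sampleName = "T3173sGas";

// Stand-in for file_get_contents($file): a small log encoded as UTF-16LE.
$raw = mb_convert_encoding("start\nsample T3173sGas ok\nend\n", $encoding, 'UTF-8');

// Byte-wise search on the raw data fails: in UTF-16LE every ASCII character
// is followed by a NUL byte, so "T3" never appears as two adjacent bytes.
var_dump(strpos($raw, $sampleName) !== false);

// Convert the whole file to UTF-8 first, then split into lines and search.
$utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);
foreach (explode("\n", $utf8) as $line) {
    if (strpos($line, $sampleName) !== false) {
        echo "yes", PHP_EOL;
    }
}
```

Converting before splitting matters here, because file() splits on single newline bytes and would cut UTF-16 code units in half.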
This may work; it searches the whole file for the entire string:
<?php
$filename = 'test.php';
$file = file_get_contents($filename);
$sampleName = "T3173sGas";
if(strlen(strstr($file,$sampleName))>0)
{
echo "yes";
}
?>

Byte Order Mark causing session errors

I have a PHP app with hundreds of files. The problem is that one or several files apparently have a BOM in them, so including them causes an error when creating the session... Is there a way to reconfigure PHP or the server, or how can I get rid of the BOM? Or at least identify the source? I would prefer a PHP solution if available.
The real solution of course is to fix your editor settings (and those of the other team members as well) so that files are not stored with a UTF byte order mark. Read on here: https://stackoverflow.com/a/2558793/43959
You could use this function to "transparently" remove the BOM before including another PHP file.
Note: I really recommend that you fix your editor(s) / files instead of doing nasty things with eval(), which I demonstrate here.
This is just a proof of concept:
bom_test.php:
<?php
function bom_safe_include($file) {
    $fd = fopen($file, "r");
    // read 3 bytes to detect BOM. file read pointer is now behind BOM
    $possible_bom = fread($fd, 3);
    // if the file has no BOM, reset pointer to beginning of file (0)
    if ($possible_bom !== "\xEF\xBB\xBF") {
        fseek($fd, 0);
    }
    $content = stream_get_contents($fd);
    fclose($fd);
    // execute (partial) script (without BOM) using eval
    eval ("?>$content");
    // export variables to the global scope without overwriting existing globals
    // (PHP 8.1+ no longer allows writing to $GLOBALS as a whole, e.g. with +=)
    foreach (get_defined_vars() as $name => $value) {
        if (!array_key_exists($name, $GLOBALS)) {
            $GLOBALS[$name] = $value;
        }
    }
}
// include a file
bom_safe_include("test_include.php");
// test function and variable from include
test_function($test);
test_include.php, with BOM at beginning
test
<?php
$test = "Hello World!";
function test_function ($text) {
echo $text, PHP_EOL;
}
OUTPUT:
kaii@test$ php bom_test.php
test
Hello World!
I was able to identify the files that carried a BOM with this script; maybe it helps someone else with the same problem in the future. It works without eval().
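function fopen_utf8 ($filename) {
    $file = @fopen($filename, "r");
    if ($file === false) {
        return false; // unreadable file: treat as "no BOM"
    }
    $bom = fread($file, 3);
    fclose($file);
    // true if the file starts with the UTF-8 byte order mark
    return $bom === "\xEF\xBB\xBF";
}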
function file_array($path, $exclude = ".|..|libraries", $recursive = true) {
$path = rtrim($path, "/") . "/";
$folder_handle = opendir($path);
$exclude_array = explode("|", $exclude);
$result = array();
while(false !== ($filename = readdir($folder_handle))) {
if(!in_array(strtolower($filename), $exclude_array)) {
if(is_dir($path . $filename . "/")) {
// Need to include full "path" or it's an infinite loop
if($recursive) $result[] = file_array($path . $filename . "/", $exclude, true);
} else {
if ( fopen_utf8($path . $filename) )
{
//$result[] = $filename;
echo ($path . $filename . "<br>");
}
}
}
}
return $result;
}
$files = file_array(".");
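Once the affected files are identified, the BOM could also be stripped in place from PHP. The following is a minimal sketch. The helper name strip_bom is hypothetical, and it rewrites the file, so it should be tried on a copy or under version control first.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
// Returns true if a BOM was found and removed, false otherwise.
function strip_bom(string $path): bool
{
    $content = file_get_contents($path);
    if ($content === false || strncmp($content, "\xEF\xBB\xBF", 3) !== 0) {
        return false;
    }
    return file_put_contents($path, substr($content, 3)) !== false;
}

// Usage on a temporary file so the sketch is safe to run.
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBF<?php echo 'hi';");
var_dump(strip_bom($tmp));
var_dump(file_get_contents($tmp));
unlink($tmp);
```

In file_array() the echo could be replaced by a call to strip_bom($path . $filename) to clean every detected file in one pass.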
vim $(find . -name \*.php)
once inside vim:
:argdo :set nobomb | :w
