Can I safely use explode() on a multi-byte string, specifically UTF8? Or do I need to use mb_split()?
If mb_split(), then why?
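A multi-byte string is still just a string, and explode() would happily split it on whatever delimiter you provide. Because explode() compares raw bytes and UTF-8 is self-synchronizing, a valid UTF-8 delimiter cannot match in the middle of another character, so the two functions should behave identically for plain (non-regex) delimiters. If you are concerned about a particular situation, consider using this test script: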
<?php
$test = array(
"ὕβρις",
"путин бандит",
"Дерипаска бандит",
"Трамп наша сука"
);
$delimiter = "д";
foreach($test as $t) {
$explode = explode($delimiter, $t);
echo "explode: " . implode("\t", $explode) . "\n";
$split = mb_split($delimiter, $t);
echo "split : " . implode("\t", $split) . "\n\n";
if ($explode != $split) {
throw new Exception($t . " splits differently!");
}
}
echo "script complete\n";
It's worth pointing out that explode() and mb_split() take similar parameters (a delimiter or pattern, the string, and an optional limit), and neither takes an encoding argument. Note, however, that mb_split() treats its first argument as a regular expression and uses the encoding set by mb_regex_encoding(). You should also realize that how your strings are encoded in PHP depends on where and how you obtain your delimiter and the string to be exploded/split. Your strings might come from a text or CSV file, a form submission in a browser, an API call via JavaScript, or you may define those strings right in your PHP script as I have here.
I might be wrong, but I believe that both functions work by looking for instances of the delimiter in the string and splitting it at those points.
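To make the difference concrete, here is a minimal sketch, assuming the mbstring extension is loaded and the script is saved as UTF-8. It shows both functions agreeing on a Cyrillic delimiter and diverging on a delimiter that is a regex metacharacter. The variable names are only for illustration.

```php
<?php
// mb_split() uses the multibyte regex encoding; set it explicitly to be safe.
mb_regex_encoding('UTF-8');

// explode() compares raw bytes. UTF-8 is self-synchronizing, so a valid
// UTF-8 delimiter cannot match in the middle of another character.
$byExplode = explode("д", "один два");

// For a delimiter without regex metacharacters, mb_split() agrees.
$byMbSplit = mb_split("д", "один два");

// "." is a literal dot for explode() but "any character" for mb_split().
$literalDot = explode(".", "a.b");
$regexDot   = mb_split(".", "a.b");

// Escaping the metacharacter restores the literal meaning.
$escapedDot = mb_split("\\.", "a.b");
```

So explode() is the simpler choice for a fixed delimiter; mb_split() is mainly useful when you actually want a multibyte-aware regular expression.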
Related
I'm localizing a website that I've built. I'm doing this by having a .lang file read and each line (syntax: key=string) is placed in a variable depending on the chosen language.
This array is then used to place the strings in the correct places.
The problem I'm having is that certain strings need hyperlinks in the middle of them; for example, somewhere I've put my name, which links to my contact page. A lot of the website's readouts also need to appear inside the strings.
To solve this I've defined a variable that holds the html + Forecaster + html,
and the localization file contains the $Forecaster variable in the string.
The problem with this as I promptly discovered is that it stubbornly refuses to parse the inline variables in the strings from the file.
Instead it prints the string and variable name as it looks in the file.
And I have yet to find a way to make it parse the variables.
For example "Heating up took $str_time" would be printed on the page exactly like that, instead of inputting the previously defined value of $str_time.
I currently use fopen() and fgets() to open and read the lines. I then explode them to separate the key and the string and then place these into the array.
Is there a way to make it parse the variables, or alternatively is there another way of reading the lines that allows for parsing the inline variables?
The code that gets the line and converts it to the array looks like this:
(It obviously loops through the lines)
#list($key, $string) = explode('=', $line);
$key = strtok($line, '=');
$string = strtok('=');
$local[$key] = $string;
$counter++;
echo $local[$key] . "<br>";
The counter is unused and the echo is for testing.
A line from the .lang file looks like this:
fuel.results.heatup.timeused=Heating up took $str_time
I would call the array where I want the string like this:
$local['fuel.results.heatup.timeused']
As you can see I've tried both explode and strtok but it hasn't made a difference.
Personally I'd write your text file in JSON format to make it easier to pull data out.
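As a rough sketch of the JSON idea, assuming the language file is rewritten as a flat JSON object and that placeholders use a {name} syntax (both are my assumptions, not something from the question), the lookup and substitution could look like this:

```php
<?php
// Hypothetical contents of a JSON language file, inlined here for brevity.
$json = '{"fuel.results.heatup.timeused": "Heating up took {str_time}"}';

// Decode into an associative array keyed by the translation key.
$local = json_decode($json, true);

// strtr() replaces every listed placeholder in one pass, without eval().
$text = strtr(
    $local['fuel.results.heatup.timeused'],
    ['{str_time}' => '12 minutes']
);
```

Because the placeholders are plain text, nothing in the language file is ever executed as PHP.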
Here is a solution directly from the php manual: http://nz2.php.net/manual/en/function.eval.php
$string = 'cup';
$name = 'coffee';
$str = 'This is a $string with my $name in it.';
echo $str. "\n";
eval("\$str = \"$str\";");
echo $str. "\n";
It is worth noting that eval() can be very dangerous when used in the wrong way, so make sure your code is very secure. E.g. if someone altered your txt file to contain real PHP code, they could execute it directly on the server.
Another approach would require you to know all your variable names and could then do something like:
$str = 'Heating up took $str_time';
echo 'str=' . str_replace('$str_time', $str_time, $str);
Or do this via an array:
$str = 'Heating up took $str_time as well as $other_value';
$vars = Array('str_time', 'other_value');
foreach($vars as $varName) {
$str = str_replace('$' . $varName, $$varName, $str);
}
echo 'str=' . $str;
If you do not know all the variable names, you can use this example, which works without eval(). Avoiding eval() is advisable.
$str = 'fuel.results.heatup.timeused=Heating up took $str_time';
$str_time = 'value';
if(preg_match('/\$([a-z0-9_]+)/i', $str, $v)) {
$vname = $v[1];
$str = str_replace('$'.$vname, $$vname, $str);
}
echo $str; // fuel.results.heatup.timeused=Heating up took value
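Note that preg_match() only finds the first variable in the line. A possible variation, sketched here under the assumption that you pass the allowed values in as an array (render_template is a hypothetical helper name), replaces every $name placeholder and leaves unknown ones untouched:

```php
<?php
// Replace each $name in $template with $values['name'] when that key exists.
function render_template(string $template, array $values): string
{
    return preg_replace_callback(
        '/\$([A-Za-z_][A-Za-z0-9_]*)/',
        function (array $m) use ($values): string {
            // Unknown names are kept as-is instead of raising a notice.
            return array_key_exists($m[1], $values)
                ? (string) $values[$m[1]]
                : $m[0];
        },
        $template
    );
}

$line = 'Heating up took $str_time as well as $other_value';
$rendered = render_template($line, [
    'str_time'    => '12 minutes',
    'other_value' => '3 litres',
]);
```

Passing an explicit array instead of using variable variables ($$name) also means the language file cannot read arbitrary variables from your script.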
Basically I need a regex expression to match all double quoted strings inside PHP tags without a variable inside.
Here's what I have so far:
"([^\$\n\r]*?)"(?![\w ]*')
and replace with:
'$1'
However, this would match things outside PHP tags as well, e.g HTML attributes.
Example case:
Here's my "dog's website"
<?php
$somevar = "someval";
$somevar2 = "someval's got a quote inside";
?>
<?php
$somevar3 = "someval with a $var inside";
$somevar4 = "someval " . $var . 'with concatenated' . $variables . "inside";
$somevar5 = "this php tag doesn't close, as it's the end of the file...";
It should match and replace all places where the " should be replaced with a '; this means that HTML attributes should ideally be left alone.
Example output after replace:
Here's my "dog's website"
<?php
$somevar = 'someval';
$somevar2 = 'someval\'s got a quote inside';
?>
<?php
$somevar3 = "someval with a $var inside";
$somevar4 = 'someval ' . $var . 'with concatenated' . $variables . 'inside';
$somevar5 = 'this php tag doesn\'t close, as it\'s the end of the file...';
It would also be great to be able to match inside script tags too...but that might be pushing it for one regex replace.
I need a regex approach, not a PHP approach. Let's say I'm using regex-replace in a text editor or JavaScript to clean up the PHP source code.
tl;dr
This is really too complex to be done with regex. Especially not a simple regex. You might have better luck with nested regex, but you really need to lex/parse to find your strings, and then you could operate on them with a regex.
Explanation
You can probably manage to do this.
You can probably even manage to do this well, maybe even perfectly.
But it's not going to be easy.
It's going to be very very difficult.
Consider this:
Welcome to my php file. We're not "in" yet.
<?php
/* Ok. now we're "in" php. */
echo "this is \"stringa\"";
$string = 'this is \"stringb\"';
echo "$string";
echo "\$string";
echo "this is still ?> php.";
/* This is also still ?> php. */
?> We're back <?="out"?> of php. <?php
// Here we are again, "in" php.
echo <<<STRING
How do "you" want to \""deal"\" with this STRING;
STRING;
echo <<<'STRING'
Apparently this is \\"Nowdoc\\". I've never used it.
STRING;
echo "And what about \\" . "this? Was that a tricky '\"' to catch?";
// etc...
Forget matching variable names in double quoted strings.
Can you just match all of the string in this example?
It looks like a nightmare to me.
SO's syntax highlighting certainly won't know what to do with it.
Did you consider that variables may appear in heredoc strings as well?
I don't want to think about the regex to check if:
Inside <?php or <?= code
Not in a comment
Inside a quoted quote
What type of quoted quote?
Is it a quote of that type?
Is it preceded by \ (escaped)?
Is the \ escaped??
etc...
Summary
You can probably write a regex for this.
You can probably manage with some backreferences and lots of time and care.
It's going to be hard and you're probably going to waste a lot of time, and if you ever need to fix it, you aren't going to understand the regex you wrote.
See also
This answer. It's worth it.
Here's a function that utilizes the tokenizer extension to apply preg_replace to PHP strings only:
function preg_replace_php_string($pattern, $replacement, $source) {
$replaced = '';
foreach (token_get_all($source) as $token) {
if (is_string($token)){
$replaced .= $token;
continue;
}
list($id, $text) = $token;
if ($id === T_CONSTANT_ENCAPSED_STRING) {
$replaced .= preg_replace($pattern, $replacement, $text);
} else {
$replaced .= $text;
}
}
return $replaced;
}
In order to achieve what you want, you can call it like this:
<?php
$filepath = "script.php";
$file = file_get_contents($filepath);
$replaced = preg_replace_php_string('/^"([^$\{\n<>\']+?)"$/', '\'$1\'', $file);
echo $replaced;
The regular expression that's passed as the first argument is the key here. It tells the function to only transform strings to their single-quoted equivalents if they do not contain $ (embedded variable "$a"), { (embedded variable type 2 "{$a[0]}"), a new line, < or > (HTML tag end/open symbols). It also checks if the string contains a single-quote, and prevents the replacement to avoid situations where it would need to be escaped.
While this is a PHP solution, it's the most accurate one. The closest you can get with any other language would require you to build your own PHP parser in that language to some degree in order for your solution to be accurate.
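To see why the tokenizer sidesteps the problems a plain regex has, here is a small sketch, assuming the tokenizer extension is enabled (it usually is by default). It collects only the T_CONSTANT_ENCAPSED_STRING tokens from a mixed PHP/HTML snippet; the sample source is made up for illustration.

```php
<?php
// Single-quoted so that $a, $b and $var are not interpolated here.
$source = '<?php $a = "plain"; $b = "has $var"; ?><p class="x">';

$plainStrings = [];
foreach (token_get_all($source) as $token) {
    // Array tokens are [id, text, line]; single characters come as strings.
    if (is_array($token) && $token[0] === T_CONSTANT_ENCAPSED_STRING) {
        $plainStrings[] = $token[1];
    }
}
```

Only "plain" is reported: the string containing $var is split into several other tokens, and the HTML attribute arrives as inline HTML, so neither is touched.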
<?php
include 'db_connect.php';
$q = mysql_real_escape_string($_GET['q']);
$arr = explode('+', $q);
foreach($arr as $ing)
{
echo $ing;
echo "<br/>";
}
mysql_close($db);
?>
Calling:
findByIncredients.php?q=Hans+Wurst+Wurstel
Source code HTML:
Hans Wurst Wurstel<br/>
Why is there only one newline?
+s in a URL are URL-encoded spaces, so what PHP sees in the variable is "Hans Wurst Wurstel". You need to split by space ' ', not +:
$arr = explode(' ', $q);
"+" gets converted to SPACE on URL decoding.
You may want to pass your string as str1-str2-str3 in get parameter.
Try:
<?php
include 'db_connect.php';
$q = mysql_real_escape_string($_GET['q']);
$arr = explode (' ',$q);
foreach($arr as $ing)
{
echo $ing;
echo "<br/>";
}
mysql_close($db);
?>
Hans+Wurst+Wurstel is the URL-escaped query string. The PHP page processes it after it has been unescaped (in this case, all +s are translated into spaces). You should choose a delimiter for explode() according to what the string looks like at that moment. You can use print_r() for a raw print if you don't know what the string (or any kind of variable) looks like.
Easy. While the standard RFC 3986 url encoding would encode the space " " as "%20", due to historical reasons, it can also be encoded as "+". When PHP parses the query string, it will convert the "+" character to a space.
This is also illustrated by the existence of both:
urlencode: equivalent of what PHP uses internally, will convert " " to "+".
rawurlencode: RFC-conformant encoder, will convert " " to "%20".
I'm assuming you want to explode by space. If you really wanted to encode a "+" character, you could use "%2B", which is the rawurlencode version and will always work.
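A quick way to check this behaviour yourself is the following sketch, which uses only built-in functions; the variable names are mine.

```php
<?php
// Encoding: the two encoders differ only in how they treat the space.
$plusStyle = urlencode('Hans Wurst');      // space becomes "+"
$rfcStyle  = rawurlencode('Hans Wurst');   // space becomes "%20"

// Decoding a query string the way PHP fills $_GET.
parse_str('q=Hans+Wurst+Wurstel', $params);
$ingredients = explode(' ', $params['q']);

// A literal plus sign has to be sent as %2B to survive decoding.
$encodedPlus = rawurlencode('a+b');
$decodedPlus = urldecode($encodedPlus);
```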
(EDIT)
Related questions:
When to encode space to plus (+) or %20?
PHP - Plus sign with GET query
I need to get the last character of a string.
Say I have "testers" as the input string and I want the result to be "s". How can I do that in PHP?
substr("testers", -1); // returns "s"
Or, for multibyte strings :
mb_substr("multibyte string…", -1); // returns "…"
substr($string, -1)
Or by direct string access:
$string[strlen($string)-1];
Note that this doesn't work for multibyte strings. If you need to work with multibyte string, consider using the mb_* string family of functions.
As of PHP 7.1.0 negative numeric indices are also supported, e.g just $string[-1];
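The multibyte caveat is easy to demonstrate. This is a minimal sketch, assuming PHP 7 or later (for the \u{...} escape) and the mbstring extension; it compares byte-based and character-based access on a UTF-8 string.

```php
<?php
// "café": the final é is encoded in UTF-8 as the two bytes C3 A9.
$word = "caf\u{e9}";

// Byte-based access returns only the second half of that character.
$lastByte    = substr($word, -1);
$lastByIndex = $word[strlen($word) - 1];

// mb_substr() counts characters in the given encoding instead of bytes.
$lastChar = mb_substr($word, -1, null, 'UTF-8');
```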
From PHP 7.1 you can do this (Accepted rfc for negative string offsets):
<?php
$silly = 'Mary had a little lamb';
echo $silly[-20];
echo $silly{-6};
echo $silly[-3];
echo $silly[-15];
echo $silly[-13];
echo $silly[-1];
echo $silly[-4];
echo $silly{-10};
echo $silly[-4];
echo $silly[-8];
echo $silly{3}; // <-- curly-brace offsets are deprecated as of PHP 7.4 and removed in PHP 8.0
die();
I'll let you guess the output.
Also, I added this to xenonite's performance code with these results:
substr() took 7.0334868431091seconds
array access took 2.3111131191254seconds
Direct string access (negative string offsets) took 1.7971360683441seconds
As of PHP 7.1.0, negative string offsets are also supported.
So, if you keep up with the times, you can access the last character in the string like this:
$str[-1]
At the request of @mickmackusa, I am supplementing my answer with possible ways of applying it:
<?php
$str='abcdef';
var_dump($str[-2]); // => string(1) "e"
$str[-3]='.';
var_dump($str); // => string(6) "abc.ef"
var_dump(isset($str[-4])); // => bool(true)
var_dump(isset($str[-10])); // => bool(false)
I can't leave comments, but in regard to FastTrack's answer, also remember that the line ending may be only a single character. I would suggest
substr(trim($string), -1)
EDIT: My code below was edited by someone, making it not do what I indicated. I have restored my original code and changed the wording to make it more clear.
trim (or rtrim) will remove all whitespace from the end of the string, so if the last character you need to check could itself be a space, tab, or other whitespace, manually remove the various line endings first:
$order = array("\r\n", "\n", "\r");
$string = str_replace($order, '', $string);
$lastchar = substr($string, -1);
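An alternative sketch, which is not the code above but a variation on it: rtrim() accepts a second argument listing the characters to strip, so passing "\r\n" removes only line endings from the end and keeps trailing spaces or tabs. The helper name last_char_of_line is hypothetical.

```php
<?php
// Return the last character of a line, ignoring a trailing CR, LF or CRLF.
function last_char_of_line(string $line): string
{
    // Strip only carriage returns and line feeds from the right-hand side.
    $trimmed = rtrim($line, "\r\n");

    // Guard the empty case so the result is always a string.
    return $trimmed === '' ? '' : substr($trimmed, -1);
}
```

Unlike str_replace(), this leaves line breaks in the middle of the string alone, which may or may not be what you want.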
I'd advise to go for Gordon's solution as it is more performant than substr():
<?php
$string = 'abcdef';
$repetitions = 10000000;
echo "\n\n";
echo "----------------------------------\n";
echo $repetitions . " repetitions...\n";
echo "----------------------------------\n";
echo "\n\n";
$start = microtime(true);
for($i=0; $i<$repetitions; $i++)
$x = substr($string, -1);
echo "substr() took " . (microtime(true) - $start) . "seconds\n";
$start = microtime(true);
for($i=0; $i<$repetitions; $i++)
$x = $string[strlen($string)-1];
echo "array access took " . (microtime(true) - $start) . "seconds\n";
die();
outputs something like
----------------------------------
10000000 repetitions...
----------------------------------
substr() took 2.0285921096802seconds
array access took 1.7474739551544seconds
As of PHP 8 you can now use str_ends_with()
$string = 'testers';
if (\str_ends_with($string, 's')) {
// yes
}
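If you are still on PHP 7, a rough equivalent can be built on substr(). This is only a sketch (ends_with_compat is a made-up name, not a built-in), intended to mirror str_ends_with() for ordinary inputs:

```php
<?php
// Check whether $haystack ends with $needle using byte comparison.
function ends_with_compat(string $haystack, string $needle): bool
{
    // Like str_ends_with(), treat the empty needle as always matching.
    if ($needle === '') {
        return true;
    }

    // Compare the last strlen($needle) bytes with the needle itself.
    return substr($haystack, -strlen($needle)) === $needle;
}
```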
Remember that a line read from a text file with fgets() still carries its line ending. If the file uses CRLF (Carriage Return, Line Feed) endings, the last two bytes are "\r\n", so you need substr($string, -3, 1) to get the actual last character rather than part of the CRLF.
I don't think the person who asked the question needed this, but for me, I was having trouble getting that last character from a string from a text file so I'm sure others will come across similar problems.
You can find the last character in PHP in several ways, such as substr() and mb_substr().
If you're using a multibyte character encoding like UTF-8, use mb_substr() instead of substr().
Here is an example of both:
<?php
echo substr("testers", -1);
echo mb_substr("testers", -1);
?>
In several languages, including C# and PHP, a string can also be accessed like an array of characters.
Knowing that, in theory, array operations should be faster than string ones, you could do:
$foo = "bar";
$lastChar = strlen($foo) -1;
echo $foo[$lastChar];
$firstChar = 0;
echo $foo[$firstChar];
However, standard array functions like
count();
will not work on a string.
Use substr() with a negative number for the 2nd argument.
$newstring = substr($string1, -1);
Hi, to get only the PHP files from a selected directory:
$dir = '/home/zetdoa/ftp/domeny/MY_DOMAIN/projekty/project';
$files = scandir($dir, 1);
foreach($files as $file){
$n = substr($file, -3);
if($n == 'php'){
echo $file.'<br />';
}
}
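One caveat: comparing the last three characters also matches names such as "notphp". A possible refinement, sketched here with a hypothetical helper that works on a list of names like the one scandir() returns, is to compare the real extension via pathinfo():

```php
<?php
// Keep only names whose extension is "php" (case-insensitive).
function filter_php_files(array $names): array
{
    $matches = array_filter($names, function (string $name): bool {
        $extension = pathinfo($name, PATHINFO_EXTENSION);
        return strtolower($extension) === 'php';
    });

    // Re-index so the result is a plain list.
    return array_values($matches);
}

$phpFiles = filter_php_files(['index.php', 'notes.txt', 'notphp', 'Admin.PHP']);
```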
$html = file_get_contents("1.html");
eval("print \"" . addcslashes(preg_replace("/(---(.+?)---)/", "\\2", $html), '"') . "\";");
This searches a string and replaces ---$variable--- with $variable.
How can I rewrite the script so that it searches for ---$_SESSION['variable']--- and replaces with $_SESSION['variable']?
You could just change the replacement to:
preg_replace("/(---\\\$_SESSION\\['(.+?)'\\]---)/", "\${\$_SESSION['\\2']}", $html)
but I wouldn't at all recommend it. As always, eval is a big clue you're doing something wrong.
Non-templating uses of $ in 1.html or the session variable will cause errors. Arbitrary code in 1.html or the session variable can be executed via the ${...} syntax, potentially compromising your server. Less-than signs or ampersands in the session variable will be output as-is, leading to cross-site-scripting attacks.
A better strategy is to keep the string as just a string, not a PHP command. Find the ---...--- sections and replace those separately:
$parts= preg_split('/---(.+?)---/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
for ($i= 1; $i<count($parts); $i+= 2) {
$part= trim($parts[$i]);
if (strpos($part, "\$_SESSION['")===0) {
$key= stripcslashes(substr($part, 11, -2));
$parts[$i]= htmlspecialchars($_SESSION[$key], ENT_QUOTES);
}
}
$html= implode('', $parts);
(Not tested, but should be along the right lines. You may not want htmlspecialchars if you really want your variables to contain active HTML; this is not usually the case.)
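The same idea can be sketched with preg_replace_callback(). This is a variation, not the code above: fill_placeholders is a hypothetical helper, and the session data is passed in as a plain array so the example is self-contained.

```php
<?php
// Replace ---$_SESSION['key']--- markers with escaped values from $session.
function fill_placeholders(string $html, array $session): string
{
    return preg_replace_callback(
        '/---\$_SESSION\[\'([^\']+)\'\]---/',
        function (array $m) use ($session): string {
            // Leave the marker untouched when the key is not set.
            if (!array_key_exists($m[1], $session)) {
                return $m[0];
            }
            // Escape the value so it cannot inject markup into the page.
            return htmlspecialchars((string) $session[$m[1]], ENT_QUOTES);
        },
        $html
    );
}

$page = fill_placeholders(
    '<p>---$_SESSION[\'user\']---</p>',
    ['user' => '<b>Al</b>']
);
```

In a real page you would pass $_SESSION as the second argument. Nothing from the template or the session is ever evaluated as PHP.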
The function you need is preg_quote(). But before I post any code here: Are you really really really sure your $html or your $_SESSION['variable'] contains no malicious strings like $(cat /etc/passwd)? If you are, double-check. If you still are, go ahead using this:
preg_replace("/(---" . preg_quote($_SESSION['variable'], '/') . "---)/", "\\2", $html)