PHP: explode but ignore escaped delimiter - php

I have a flat-file database whose data is separated by delimiters.
I allow people to use the delimiter in their input but I make sure to escape it with a \ beforehand.
The problem is my explode() function still attempts to split the escaped delimiters, so how do I tell it to ignore them?

Use preg_split instead. With a regex you can match a delimiter only if it is not preceded by a backslash.
Edit:
preg_split('~(?<!\\\)' . preg_quote($delimiter, '~') . '~', $text);
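As a quick illustration of the lookbehind approach, here is a minimal sketch. The helper name split_unescaped and the sample input are made up for the example. Note that the backslashes stay in the output, and that a delimiter preceded by an escaped backslash (\\|) is still treated as escaped.

```php
<?php
// Hypothetical helper: split on $delimiter unless a backslash comes right before it.
function split_unescaped(string $text, string $delimiter): array
{
    // '(?<!\\\\)' is the PHP literal for the regex (?<!\\): "not preceded by a backslash".
    $pattern = '~(?<!\\\\)' . preg_quote($delimiter, '~') . '~';
    return preg_split($pattern, $text);
}

print_r(split_unescaped('a|b\|c|d', '|'));
// The pieces are: a, b\|c, d (the backslash is still present in the second piece)
```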

None of the solutions here correctly handle any number of escape characters, or they leave them in the output. Here's an alternative:
function separate($string, $separator = '|', $escape = '\\') {
if (strlen($separator) != 1 || strlen($escape) != 1) {
trigger_error(__FUNCTION__ . ' requires delimiters to be single characters.', E_USER_WARNING);
return;
}
$segments = [];
$string = (string) $string;
do {
$segment = '';
do {
$segment_length = strcspn($string, "$separator$escape");
if ($segment_length) {
$segment .= substr($string, 0, $segment_length);
}
if (strlen($string) <= $segment_length) {
$string = null;
break;
}
if ($escaped = $string[$segment_length] == $escape) {
$segment .= (string) substr($string, ++$segment_length, 1);
}
$string = (string) substr($string, ++$segment_length);
} while ($escaped);
$segments[] = $segment;
} while ($string !== null);
return $segments;
}
This will process a raw string like foo\|ba\r\\|baz| into foo|bar\, baz, and an empty string.
If you want to retain the escape character in the output, you will have to modify the function.
Note: this will have unpredictable behaviour if you're using mb function overloading.
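If keeping the escape characters in the output is acceptable, a regex can also cope with any number of backslashes. The following is only a sketch, not a modification of the function above: it relies on PCRE's (*SKIP)(*FAIL) backtracking verbs, and the name split_keep_escapes is made up for the example.

```php
<?php
// Hypothetical alternative: split on unescaped "|" and leave the escapes in place.
function split_keep_escapes(string $string): array
{
    // First alternative: consume a backslash plus the character it escapes,
    // then discard that match so it can never act as a split point.
    // Second alternative: a pipe reached any other way is a real separator.
    return preg_split('~\\\\.(*SKIP)(*FAIL)|\|~s', $string);
}

// Same raw string as in the answer above: foo\|ba\r\\|baz|
print_r(split_keep_escapes('foo\|ba\r\\\\|baz|'));
// The pieces are: foo\|ba\r\\ then baz then an empty string
```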

Input Data
key1=val1;key2=val2start\;val2end;key3=val3\\;key4=val4\\\;key5=val5\\\\;key6=val6
REGEX
/(.*?[^\\](\\\\)*?);/
Example
<?php
$data="key1=val1;key2=val2start\\;val2end;key3=val3\\\\;key4=val4\\\\\\;key5=val5\\\\\\\\;key6=val6";
$regex='/(.*?[^\\\\](\\\\\\\\)*?);/';
preg_match_all($regex, $data.';', $matches);
print_r($matches[1]);
Output
Array
(
[0] => key1=val1
[1] => key2=val2start\;val2end
[2] => key3=val3\\
[3] => key4=val4\\\;key5=val5\\\\
[4] => key6=val6
)

You will find this solution more useful than using regex for large strings. I employ a stream to allow usage of fgetcsv, which is optimized for this sort of thing.
<?php
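function escaped_explode($string, $delimit, $escape = '\\', $enclosure = '"', $max_line_length = 0) {
    $r = [];
    // Write the string to an in-memory stream so fgetcsv can read it.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $string);
    rewind($stream);
    // fgetcsv requires single-character enclosure and escape values, hence the non-null defaults.
    while (($data = fgetcsv($stream, $max_line_length, $delimit, $enclosure, $escape)) !== false) {
        $r = array_merge($r, $data);
    }
    fclose($stream);
    return $r;
}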
?>
Usage:
$pipelined_values = escaped_explode($source,'|','\\');
This is also convenient because you can use enclosures, such as quotes, rather than escape characters alone. That helps when parsing someone's blobs of JSON values, or other syntax, since you can both enclose and escape. Note that fgetcsv only honours the escape character inside an enclosed field.
$source= <<<JSON
'{ "key":"val", "n":0}',
'{ "key":"val", "n":1, "name": "French du\'Name" }',
'{ "key":"val", "n":2}'
JSON;
This can be interpreted as follows:
<?php
$objects=[];
$raw= escaped_explode($source, ',', '\\', "'");
foreach($raw as $r)
$objects[] = json_decode($r);
?>

Related

How to trim string from right in PHP?

I have a string example
this-is-the-example/exa
I want to trim /exa from the above line
$string1 = "this-is-the-example/exa";
$string2 = "/exa";
I am using rtrim($string1, $string2)
But the output is this-is-the-exampl
I want this-is-the-example as output.
Both strings are dynamic and may have multiple occurrences within the string, but I only want to remove the last part. Also, string2 does not necessarily contain a /; it may be a normal string too, like a or abc.
There are various approaches you can use for this:
With substr:
function removeFromEnd($haystack, $needle)
{
$length = strlen($needle);
if(substr($haystack, -$length) === $needle)
{
$haystack = substr($haystack, 0, -$length);
}
return $haystack;
}
$trim = '/exa';
$str = 'this-is-the-example/exa';
var_dump(removeFromEnd($str, $trim));
With regex:
$trim = '/exa';
$str = 'this-is-the-example/exa';
function removeFromEnd($haystack, $needle)
{
$needle = preg_quote($needle, '/');
$haystack = preg_replace("/$needle$/", '', $haystack);
return $haystack;
}
var_dump(removeFromEnd($str, $trim));
First explode the string, remove last element from exploded array using array_pop, then implode it back again with /.
$str = "this-is-the-example/exa";
if(strpos($str, '/') !== false)
{
$arr = explode('/', $str);
array_pop($arr);
$str = implode('/', $arr);
// output this-is-the-example
}
This will work even if you have multiple / in the URL and will remove the last element only.
$str = "this-is-the-example/somevalue/exa";
if(strpos($str, '/') !== false)
{
$arr = explode('/', $str);
array_pop($arr);
$str = implode('/', $arr);
// output this-is-the-example/somevalue
}
Say hi to strstr()
$str = 'this-is-the-example/exa';
$trim = '/exa';
$result = strstr($str, $trim, true);
echo $result;
You can use explode
<?php
$x = "this-is-the-example/exa";
$y = explode('/', $x);
echo $y[0];
The second parameter of rtrim is a character mask, not a string, so your last "e" is trimmed, and that is expected.
Consider using something else, a regexp for example (preg_replace), to fit your needs.
This keeps everything before the first "/" character:
$str = preg_replace('/^([^\/]*).*/','$1', 'this-is-the-example/exa');
This removes the last part.
$str = preg_replace('/^(.*)\/.*$/','$1', 'this-is-the-example/exa/mple');
Hope this helps. :)
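To make the character-mask behaviour of rtrim described above concrete, here is a small sketch. The helper remove_suffix is a made-up name, not part of PHP; it only strips the suffix when the string really ends with it.

```php
<?php
// rtrim() strips any trailing characters that appear in the mask, in any order.
// The mask '/exa' is the set {/, e, x, a}, so the final "e" of "example" goes too.
$masked = rtrim('this-is-the-example/exa', '/exa');

// Hypothetical helper: remove $suffix only when $haystack actually ends with it.
function remove_suffix(string $haystack, string $suffix): string
{
    $length = strlen($suffix);
    if ($length > 0 && $length <= strlen($haystack)
        && substr_compare($haystack, $suffix, -$length) === 0) {
        return substr($haystack, 0, -$length);
    }
    return $haystack;
}

echo $masked, "\n";                                          // this-is-the-exampl
echo remove_suffix('this-is-the-example/exa', '/exa'), "\n"; // this-is-the-example
```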
Simply try this code:
<?php
$this_example = substr("this-is-the-example/exa", 0, -4);
echo "<br/>".$this_example; // returns "this-is-the-example"
?>
To allow for error handling, if the substring is not found in the search string ...
<?php
$myString = 'this-is-the-example/exa';
// Use strrpos, not strpos, to find the LAST occurrence
$endPosition = strrpos($myString, '/exa');
if ($endPosition === false) {
    // Substring not found: here the string is simply left unchanged
    $leftPart = $myString;
} else {
    $leftPart = substr($myString, 0, $endPosition);
}
echo($leftPart);
?>
outputs
this-is-the-example

Is it possible to convert "x.y.z" to "x[y][z]" using regexp?

What is the most efficient pattern to replace the dots in a dot-separated string so that it becomes an array-like string, e.g. x.y.z -> x[y][z]?
Here is my current code, but I guess there should be a shorter method using regexp.
function convert($input)
{
if (strpos($input, '.') === false) {
return $input;
}
$input = str_replace_first('.', '[', $input);
$input = str_replace('.', '][', $input);
return $input . ']';
}
In your particular case "an array-like string" can be easily obtained using preg_replace function:
$input = "x.d.dsaf.d2.d";
print_r(preg_replace("/\.([^.]+)/", "[$1]", $input)); // "x[d][dsaf][d2][d]"
From what I can understand from your question; "x.y.z" is a String and so should "x[y][z]" be, right?
If that is the case, you may want to give the following code snippet a try:
<?php
$dotSeparatedString = "x.y.z";
$arrayLikeString = "";
//HERE IS THE REGEX YOU ASKED FOR...
$arrayLikeString = str_replace(".", "", preg_replace("#(\.[a-z0-9]*[^.])#", "[$1]", $dotSeparatedString));
var_dump($arrayLikeString); //DUMPS: 'x[y][z]'
Hope it helps you, though....
You can use a fairly simple preg_replace_callback() that returns a different replacement for the first occurrence of . than for the other occurrences.
$in = "x.y.z";
function cb($matches) {
static $first = true;
if (!$first)
return '][';
$first = false;
return '[';
}
$out = preg_replace_callback('/(\.)/', 'cb', $in) . ((strpos($in, '.') !== false) ? ']' : '');
var_dump($out);
The ternary append handles the case where there is no . to replace.
Although this is already answered, you could simply explode on the period delimiter and then reconstruct the string.
$in = 'x.y.z';
$array = explode('.', $in);
$out = '';
foreach ($array as $key => $part){
$out .= ($key) ? '[' . $part . ']' : $part;
}
echo $out;

Check if string contains words from array

I have a script below that detects words from my word filter (an array) and determines whether a string is clean or not.
What I have below works well when the words are used with spacing. But ifiwritesomething without spaces, it doesn't detect.
How can I make it such that it searches the whole string instead of words? I tried removing the explode function but I got some errors...
$string = 'goodmorningnoobs';
$array = array("idiot","noob");
if(0 == count(array_intersect(array_map('strtolower', explode(' ', $string)), $array))){
echo"clean";
} else {
echo "unclean";
}
Can anyone help?
$clean = true;
foreach ( $array as $word ) {
if ( stripos($string, $word) !== false ) {
$clean = false;
break;
}
}
echo $clean ? 'clean' : 'unclean';
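How about this? It quotes each word for the pattern and matches case-insensitively:
$quoted = array_map(function ($word) { return preg_quote($word, '/'); }, $array);
$hasWords = preg_match('/' . implode('|', $quoted) . '/i', $string);
echo $hasWords ? 'unclean' : 'clean';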
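If it is useful to know which words triggered the filter, the substring search can be wrapped as below. This is only a sketch; the function name find_banned_words is an assumption made for the example.

```php
<?php
// Hypothetical helper: return every banned word that occurs anywhere in $string.
function find_banned_words(string $string, array $banned): array
{
    $found = [];
    foreach ($banned as $word) {
        // stripos() is a case-insensitive substring search, so spacing does not matter.
        if ($word !== '' && stripos($string, $word) !== false) {
            $found[] = $word;
        }
    }
    return $found;
}

$hits = find_banned_words('goodmorningNOOBS', ['idiot', 'noob']);
echo $hits ? 'unclean: ' . implode(', ', $hits) : 'clean', "\n"; // unclean: noob
```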

Convert lib_string to string w/o Regex

I need to convert lib_someString to someString inside a block of text using str_replace [not regex].
Here's an example to give an exact sense what I mean: lib_12345 => 12345. I need to do this for a bunch of instances in a block of text.
Below is my attempt. Problem I'm getting is that my function is not doing anything (I just get lib_id returned).
function extractLibId($val){ // function to get the "12345" in the above example
$lclRetVal = substr($val, 5, strlen($val));
return $lclRetVal;
}
function Lib($text){ // does the replace for all lib_ instances in the text
$lclVar = "lib_";
$text = str_replace($lclVar, "<a href='".extractLibId($lclVar)."'>".extractLibId($lclVar)."</a>", $text);
return $text;
}
A regexp is going to be faster and clearer, and you will not need to call your function for every possible 'lib_' string:
function Lib($text) {
$count = null;
return preg_replace('/lib_([0-9]+)/', '$1', $text, -1, $count);
}
$text = 'some text lib_123123 goes here lib_111';
$text = Lib($text);
Without a regexp, although a cute kitten will die somewhere every time Lib2 is called:
function extractLibId($val) {
$lclRetVal = substr($val, 4);
return $lclRetVal;
}
function Lib2($text) {
$count = null;
while (($pos = strpos($text, 'lib_')) !== false) {
$end = $pos;
while ($end < strlen($text) && !in_array($text[$end], array(' ', ',', '.')))
$end++;
$sub = substr($text, $pos, $end - $pos);
$text = str_replace($sub, ''.extractLibId($sub).'', $text);
}
return $text;
}
$text = 'some text lib_123123 goes here lib_111';
$text = Lib2($text);
Use preg_replace.
Although it is possible to do what you need without regular expressions, you say you don't want to use them because of performance reasons. I doubt the other solution will be faster, so here is a simple regex to benchmark against:
echo preg_replace("/lib_(\w+)/", '$1', $str);
As shown here: http://codepad.org/xGj78r9r
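Assuming the goal is the anchor markup from the question's own attempt, the same regex idea can emit the link directly. This is a sketch only; the function name link_lib_ids and the double-quoted href are choices made for the example.

```php
<?php
// Hypothetical helper: turn each lib_<id> token into an anchor like the one in the question.
function link_lib_ids(string $text): string
{
    // \b keeps a word such as "glib_1" from matching; (\w+) captures the id after the prefix.
    return preg_replace('/\blib_(\w+)/', '<a href="$1">$1</a>', $text);
}

echo link_lib_ids('see lib_12345 and lib_abc'), "\n";
// see <a href="12345">12345</a> and <a href="abc">abc</a>
```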
Ignoring how ridiculous an area this is to optimize, even the simplest implementation with minimal validation takes only about 33% less time than a regex:
<?php
function uselessFunction( $val ) {
if( strpos( $val, "lib_" ) !== 0 ) {
return $val;
}
$str = substr( $val, 4 );
return "{$str}";
}
$l = 100000;
$now = microtime(TRUE);
while( $l-- ) {
preg_replace( '/^lib_(.*)$/', "$1", 'lib_someString' );
}
echo (microtime(TRUE)-$now)."\n";
//0.191093
$l = 100000;
$now = microtime(TRUE);
while( $l-- ) {
uselessFunction( "lib_someString" );
}
echo (microtime(TRUE)-$now);
//0.127598
?>
If you're restricted from using a regex, you're going to have a difficult time searching for a string you describe as "someString", i.e. not precisely known in advance. If you know the string is exactly lib_12345, for example, then set $lclVar to that string. On the other hand, if you don't know the exact string in advance, you'll have to use a regex via preg_replace() or a similar function.

remove a part of a URL argument string in php

I have a string in PHP that is a URI with all arguments:
$string = http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0
I want to completely remove an argument and return the remain string. For example I want to remove arg3 and end up with:
$string = http://domain.com/php/doc.php?arg1=0&arg2=1
I will always want to remove the same argument (arg3), and it may or not be the last argument.
Thoughts?
EDIT: there might be a bunch of weird characters in arg3, so my preferred way to do this (in essence) would be:
$newstring = remove $_GET["arg3"] from $string;
There's no real reason to use regexes here, you can use string and array functions instead.
You can explode the part after the ? (which you can get using substr to get a substring and strrpos to get the position of the last ?) into an array, skip the entries whose key is arg3, and then join to put the string back together:
$string = "http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0";
$pos = strrpos($string, "?"); // get the position of the last ? in the string
$query_string_parts = array();
foreach (explode("&", substr($string, $pos + 1)) as $q)
{
list($key, $val) = explode("=", $q);
if ($key != "arg3")
{
// keep track of the parts that don't have arg3 as the key
$query_string_parts[] = "$key=$val";
}
}
// rebuild the string
$result = substr($string, 0, $pos + 1) . join("&", $query_string_parts);
See it in action at http://www.ideone.com/PrO0a
preg_replace('/arg3=[^&]*(&|$)/', '', $string)
I'm assuming the url itself won't contain arg3= here, which in a sane world should be a safe assumption.
$new = preg_replace('/&arg3=[^&]*/', '', $string);
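The two one-line patterns above each miss a position: one can leave a trailing & behind, the other does not match when arg3 is the first argument. A possible sketch that covers first, middle and last positions follows. The function name remove_arg is made up for the example, and it assumes the argument always appears as name=value with an unencoded name.

```php
<?php
// Hypothetical helper: drop "name=value" from the query string wherever it sits.
function remove_arg(string $url, string $name): string
{
    $quoted = preg_quote($name, '/');
    // Match the pair only right after "?" or "&", together with one following "&" if present.
    $url = preg_replace('/(?<=[?&])' . $quoted . '=[^&#]*&?/', '', $url);
    // Tidy up a "?" or "&" left dangling before a fragment or at the end of the URL.
    return preg_replace('/[?&](?=#|$)/', '', $url);
}

echo remove_arg('http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0', 'arg3'), "\n";
// http://domain.com/php/doc.php?arg1=0&arg2=1
```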
This should also work, taking into account, for example, page anchors (#) and at least some of those "weird characters" you mention but don't seem worried about:
function remove_query_part($url, $term)
{
$query_str = parse_url($url, PHP_URL_QUERY);
if ($frag = parse_url($url, PHP_URL_FRAGMENT)) {
$frag = '#' . $frag;
}
parse_str($query_str, $query_arr);
unset($query_arr[$term]);
$new = '?' . http_build_query($query_arr) . $frag;
return str_replace(strstr($url, '?'), $new, $url);
}
Demo:
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0';
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1';
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0#frag';
$string[] = 'http://domain.com/php/doc.php?arg1=0&arg2=1&arg3=0&arg4=4';
$string[] = 'http://domain.com/php/doc.php';
$string[] = 'http://domain.com/php/doc.php#frag';
$string[] = 'http://example.com?arg1=question?mark&arg2=equal=sign&arg3=hello';
foreach ($string as $str) {
echo remove_query_part($str, 'arg3') . "\n";
}
Output:
http://domain.com/php/doc.php?arg1=0&arg2=1
http://domain.com/php/doc.php?arg1=0&arg2=1
http://domain.com/php/doc.php?arg1=0&arg2=1#frag
http://domain.com/php/doc.php?arg1=0&arg2=1&arg4=4
http://domain.com/php/doc.php
http://domain.com/php/doc.php#frag
http://example.com?arg1=question%3Fmark&arg2=equal%3Dsign
Tested only as shown.
