Does anybody know a quick and easy explode()-like function that ignores splitter characters enclosed in a pair of arbitrary characters (e.g. quotes)?
Example:
my_explode(
"/",
"This is/a string/that should be/exploded.//But 'not/here',/and 'not/here'"
);
should result in an array with the following members:
This is
a string
that should be
exploded.
But 'not/here',
and 'not/here'
The characters wrapped in single quotes should not be treated as splitters.
Bonus points for a solution that can also deal with two different wrapper characters, such as (not/here).
A native PHP solution would be preferred, but I don't think such a thing exists!
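str_getcsv($str, '/', "'")
The third argument sets the enclosure character (it defaults to a double quote). Be aware that str_getcsv() only treats the enclosure specially at the start of a field, so a quote in the middle of a field, as in But 'not/here', does not protect the / inside it. There's a recipe for PHP < 5.3 on the linked page.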
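A minimal sketch of the str_getcsv() route, assuming single quotes as the enclosure and passing the escape character explicitly ("\\" is the documented default). It shows a field that starts with the enclosure being kept intact, with the quotes stripped:

```php
<?php
// Split on "/" and treat single quotes as the enclosure character.
// The escape argument is passed explicitly; "\\" is the default value.
$fields = str_getcsv("a/'b/c'/d", '/', "'", "\\");
print_r($fields); // the middle field keeps its "/" and loses its quotes
```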
This is hard to do with a plain preg_split() pattern, because the pattern can't tell from the middle of the string whether it is between quotes or not. However, preg_match_all() can do the job.
Simple solution for a single type of quote:
function quoted_explode($subject, $delimiter = ',', $quote = '\'') {
$regex = "(?:[^$delimiter$quote]|[$quote][^$quote]*[$quote])+";
preg_match_all('/'.str_replace('/', '\\/', $regex).'/', $subject, $matches);
return $matches[0];
}
That function will have all kinds of problems if you pass it characters that are special inside a character class (\, ^, - and ], according to http://www.regular-expressions.info/reference.html), so you'll need to escape those. Here's a general solution that escapes special regex characters and can track multiple kinds of quotes separately:
function regex_escape($subject) {
return str_replace(array('\\', '^', '-', ']'), array('\\\\', '\\^', '\\-', '\\]'), $subject);
}
function quoted_explode($subject, $delimiters = ',', $quotes = '\'') {
$clauses[] = '[^'.regex_escape($delimiters.$quotes).']';
foreach(str_split($quotes) as $quote) {
$quote = regex_escape($quote);
$clauses[] = "[$quote][^$quote]*[$quote]";
}
$regex = '(?:'.implode('|', $clauses).')+';
preg_match_all('/'.str_replace('/', '\\/', $regex).'/', $subject, $matches);
return $matches[0];
}
(Note that I keep all of the variables between square brackets to minimize what needs escaping - outside of square brackets, there are about twice as many special characters.)
If you wanted to use ] as a quote, then you probably wanted to use [ as the corresponding quote, but I'll leave adding that functionality as an exercise for the reader. :)
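For the bonus case of different opening and closing wrappers, here is one possible sketch. paired_explode() is a hypothetical helper, not code from the answers above: it takes an explicit map of opening to closing wrappers and uses preg_quote() instead of regex_escape(). It assumes wrappers are not nested and are always closed; an unclosed opening wrapper is skipped and ends the current piece.

```php
<?php
// Hypothetical helper: split $subject on any character in $delimiters,
// keeping text inside a wrapper pair intact. $pairs maps each opening
// wrapper to its closing one, e.g. ["'" => "'", '(' => ')'].
// Assumes wrappers are not nested and are always closed.
function paired_explode(string $subject, string $delimiters, array $pairs): array
{
    // An unwrapped character is anything except a delimiter or an opening wrapper.
    $stop = preg_quote($delimiters . implode('', array_keys($pairs)), '/');
    $clauses = ['[^' . $stop . ']'];

    foreach ($pairs as $open => $close) {
        $o = preg_quote($open, '/');
        $c = preg_quote($close, '/');
        // Opening wrapper, any run of non-closing characters, closing wrapper.
        $clauses[] = $o . '[^' . $c . ']*' . $c;
    }

    // A piece is one or more unwrapped characters or wrapped groups,
    // so runs of delimiters never produce empty pieces.
    preg_match_all('/(?:' . implode('|', $clauses) . ')+/', $subject, $matches);
    return $matches[0];
}

$parts = paired_explode(
    "This is/a string/that should be/exploded.//But 'not/here',/and (not/here)",
    '/',
    ["'" => "'", '(' => ')']
);
print_r($parts);
```

Because each piece needs at least one character, the empty piece between // is not returned, which matches the expected output in the question.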
Something very close, using preg_split(): http://fr2.php.net/manual/en/function.preg-split.php#92632
It handles multiple wrapper characters AND multiple delimiter characters.
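preg_split() can also handle this with PCRE's backtracking control verbs, a different technique from the linked comment: the left branch matches a wrapped group and (*SKIP)(*FAIL) discards it, so the search resumes after the group and only the remaining / characters act as splitters. A sketch, assuming single quotes and parentheses are the only wrappers and that they do not nest:

```php
<?php
$subject = "This is/a string/that should be/exploded.//But 'not/here',/and (not/here)";

// Left branch: a quoted or parenthesised group. (*SKIP)(*FAIL) throws the
// match away and makes the engine resume the search after the group.
// Right branch: the real delimiter.
$pattern = "~(?:'[^']*'|\\([^)]*\\))(*SKIP)(*FAIL)|/~";

// PREG_SPLIT_NO_EMPTY drops the empty piece produced by "//".
$parts = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
```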
I have a string in PHP
$string = "Dogs are Jonny's favorite pet";
I want to use regex or some method to remove s or 's from the end of all words in the string.
The desired output would be:
$revisedString = "Dog are Jonny favorite pet";
Here is my current approach:
<?php
$string = "Dogs are Jonny's favorite pet";
$stringWords = explode(" ", $string);
$counter = 0;
foreach($stringWords as $string) {
if(substr($string, -1) == "s"){
$stringWords[$counter] = rtrim($string, "s");
}
if(strpos($string, "'s") !== false){
$stringWords[$counter] = rtrim($string, "'s");
}
$counter = $counter + 1;
}
print_r($stringWords);
$newString = "";
foreach($stringWords as $string){
$newString = $newString . $string . " ";
}
echo $newString;
?>
How would this be achieved with REGEX?
For general use, you must leverage a much more sophisticated technique than an English-ignorant regex pattern. There may be fringe cases where the following pattern fails by removing an s that it shouldn't. It could be a name, an acronym, or something else.
As an unreliable solution, you can optionally match an apostrophe then match a literal s if it is not immediately preceded by another s. Adding a word boundary (\b) on the end improves the accuracy that you are matching the end of words.
Code:
$string = "The bass can access the river's delta from the ocean. The fishermen, assassins, and their friends are happy on the banks";
var_export(preg_replace("~'?(?<!s)s\b~", '', $string));
Output:
'The bass can access the river delta from the ocean. The fishermen, assassin, and their friend are happy on the bank'
PHP Live Regex has always helped me a lot in such moments. Even though I know how regex works, I still use it sometimes just to be sure.
To make use of REGEX in your case, you can use preg_replace().
<?php
// Your string.
$string = "Dogs are Jonny's favorite pet";
// The vertical bar means "or", and the backslash
// before the apostrophe is needed so it doesn't end
// the pattern string, since we're using single quotes
// to delimit it. "\s" matches one whitespace character.
$regex_pattern = '/\'s\s|s\s|s$/';
// Fill the preg_replace() with the pattern, the replacement
// (a single space in this case), your string, -1 (so preg_replace()
// will replace all the matches) and a variable of your desire
// to be the "counter" (preg_replace() will automatically
// fill it).
$newString = preg_replace($regex_pattern, ' ', $string, -1, $counter);
// Use rtrim() to remove the trailing space left at the end of the sentence.
$newString = rtrim($newString, " ");
echo "New string: " . $newString . ". ";
echo "Replacements: " . $counter . ".";
?>
In this case, the function finds every "'s" or "s" followed by a whitespace character (\s), plus an "s" at the very end of the string, and replaces it with a single space.
preg_replace() also counts all the replacements and stores the total in $counter (or whichever variable you pass as the fifth argument).
Edit:
Phil's comment is right: my previous regex would miss an "s" at the end of the string. Adding "|s$" solves it. Again, "|" means "or", and "$" means that the "s" must be at the end of the string.
Regarding mickmackusa's comment, my solution is only meant to remove "s" characters at the end of words inside the string, as that was Sparky Johnson's request here. Removing plurals would require much more complex code, since we would need to remove "s" only from plural words and also deal with verbs and other cases.
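A variant that avoids consuming the following whitespace: a sketch, assuming a word boundary (\b) is an acceptable definition of "end of word". Since nothing after the s is consumed, no replacement space or rtrim() is needed, and preg_replace()'s fifth argument still reports the count:

```php
<?php
$string = "Dogs are Jonny's favorite pet";

// Optional apostrophe, then an "s" that ends a word (\b = word boundary).
$newString = preg_replace("/'?s\b/", '', $string, -1, $counter);

echo "New string: $newString. Replacements: $counter.";
```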
So, I'm doing some manipulation on lat/long pairs, and I need to turn this:
39.1889375383777,-94.48019109594397
into:
39.1889375383777 -94.48019109594397
I can't use str_replace, unless I want to have an array of 10 search and 10 replace strings, so I was hoping to use preg_replace:
$query1 = preg_replace( "/([0-9-]),([0-9-])/", "\1 \2", $query );
The problem is that the "-" gets lost:
39.1889375383777 94.48019109594397
Note that I have a string containing a list of these, and I'm trying to convert them all at once:
[[39.1889375383777,-94.48019109594397],[39.18425796890108,-94.28288005131176],[39.41972019529712,-94.19956344733345],[39.41412315915102,-94.41932608390658],[39.34785744845041,-94.4893603307242],[39.1889375383777,-94.48019109594397]]
I managed to make this work with preg_replace_callback:
$str = preg_replace_callback( "/([0-9-]),([0-9-])/",
function ($matches) {return $matches[1] . " " . $matches[2];},
$query
);
But I'm still not sure why the simpler preg_replace() didn't work.
Your main issue is that "\1 \2" in a double-quoted string defines the string "\x01\x20\x02": PHP reads \1 and \2 as octal escapes, so the first character is a SOH char and the third one is a STX char (see the ASCII table). To use backreferences, you need a literal backslash, "\\1", or, better, the $n notation, ideally inside a single-quoted string literal.
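The question's pattern works once the replacement uses the $n notation in a single-quoted string. A minimal sketch on a shortened version of the question's list:

```php
<?php
$query = '[[39.1889375383777,-94.48019109594397],[39.18425796890108,-94.28288005131176]]';

// Single quotes keep '$1 $2' literal, so preg_replace() sees the
// backreferences instead of the control characters chr(1) and chr(2).
$query1 = preg_replace('/([0-9-]),([0-9-])/', '$1 $2', $query);
echo $query1;
```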
You can also use a solution without backreferences:
preg_replace('~(?<=\d),(?=-?\d)~', ' ', $str)
Details:
(?<=\d) - a location that is immediately preceded by a digit
, - a comma
(?=-?\d) - a location that is immediately followed by an optional - and a digit.
See the PHP demo:
$str = '[[39.1889375383777,-94.48019109594397],[39.18425796890108,-94.28288005131176],[39.41972019529712,-94.19956344733345],[39.41412315915102,-94.41932608390658],[39.34785744845041,-94.4893603307242],[39.1889375383777,-94.48019109594397]]';
echo preg_replace('~(?<=\d),(?=-?\d)~', ' ', $str);
// => [[39.1889375383777 -94.48019109594397],[39.18425796890108 -94.28288005131176],[39.41972019529712 -94.19956344733345],[39.41412315915102 -94.41932608390658],[39.34785744845041 -94.4893603307242],[39.1889375383777 -94.48019109594397]]
This is the code:
<?php
$pattern =' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
$text_split = str_split($text,1);
$data = '';
foreach($text_split as $value){
if (preg_match("/".$value."/", $pattern )){
$data = $data.$value;
}
if (!preg_match('/'.$value.'/', $pattern )){
break;
}
}
echo $data;
?>
Current output:
kdaiuyq7e611422^^$^vbnvcn^vznbsjhf
Expected output:
kdaiuyq7e611422
Please help me fix the error in my code. There is no ^ or $ in $pattern, but preg_match() reports a match, which seems wrong.
Your string $text contains ^, which in a regex is an anchor that matches the beginning of the string $pattern.
So preg_match('/^/', $pattern) returns true, and the ^ is appended to $data. The same happens with $, which matches the end of the string.
You should escape ^ so it is treated as a literal character rather than a special one, i.e. preg_match('/\^/', $pattern); preg_quote() will do that escaping for you.
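Applying the preg_quote() fix to the question's loop might look like this (a sketch that keeps the original character-by-character approach and stops at the first character that is not in the mask):

```php
<?php
$pattern = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';

$data = '';
foreach (str_split($text) as $value) {
    // preg_quote() escapes regex metacharacters such as ^ and $ (and the
    // "/" delimiter passed as the second argument), so each character is
    // searched for literally.
    if (!preg_match('/' . preg_quote($value, '/') . '/', $pattern)) {
        break; // first character outside the allowed set
    }
    $data .= $value;
}
echo $data;
```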
There is no need to split your string up like that, the whole point of a regular expression is you can specify all the conditions within the expression. You can condense your entire code down to this:
$pattern = '/^[[:word:] ]+/';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';
preg_match($pattern, $text, $matches);
echo $matches[0];
Kris has correctly identified that the missing escaping is the monkey wrench in your method. This can be solved with preg_quote() or by wrapping the pattern characters in \Q ... \E (which forces them to be interpreted literally).
Slapping that band-aid on your method (as you did when answering your own question) doesn't help you see what you should be doing.
I recommend that you do away with the character mask, the str_split(), and the looped calls of preg_match(). Your task can be accomplished far more briefly/efficiently/directly with a single preg_match() call. Here is the clean way that obeys your character mask fully:
Code:
$text = "kdaiuyq7e611422^^$^vbnvcn^vznbsjhf";
echo preg_match('/^[a-z\d ]+/i',$text,$out)?$out[0]:'No Match';
Output:
kdaiuyq7e611422
miknik's method was close to this, but it did not maintain 100% accuracy given your question requirements. I'll explain:
[:word:] is a POSIX character class (functioning like \w) that represents letters (uppercase and lowercase), digits, and the underscore. Unfortunately for miknik, the underscore is not in your list of wanted characters, so the pattern is slightly inaccurate and may be untrustworthy for your project.
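If you would rather keep the question's character mask and skip regex entirely, strspn() (a swapped-in technique, not one of the answers above) returns the length of the leading run of characters that all appear in the mask. A sketch using the question's values, with $mask standing in for the question's $pattern:

```php
<?php
$mask = ' abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
$text = 'kdaiuyq7e611422^^$^vbnvcn^vznbsjhf';

// Length of the initial segment of $text made only of characters in $mask.
$length = strspn($text, $mask);
echo substr($text, 0, $length);
```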
How can I remove a new line character from a string using PHP?
$string = str_replace(PHP_EOL, '', $string);
or, to catch both \n and \r regardless of the platform's PHP_EOL:
$string = str_replace(array("\n","\r"), '', $string);
$string = str_replace("\n", "", $string);
$string = str_replace("\r", "", $string);
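To replace runs of two or more whitespace characters (including newlines) with a single space, use a regular expression. Note that \s\s+ leaves a lone \n untouched; use \s+ if single newlines should go too: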
$my_string = trim(preg_replace('/\s\s+/', ' ', $my_string));
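A small sketch of the difference between \s\s+ and \s+ on a string that mixes single and double newlines (the sample text is made up):

```php
<?php
$text = "one\ntwo\n\nthree";

// \s\s+ needs at least two whitespace characters in a row,
// so the single newline after "one" survives.
$pairsOnly = preg_replace('/\s\s+/', ' ', $text);

// \s+ also matches a lone newline.
$allRuns = preg_replace('/\s+/', ' ', $text);

var_dump($pairsOnly, $allRuns);
```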
Better to use:
$string = str_replace(array("\r\n", "\n", "\r"), '', $string);
because text coming from a textarea usually contains \r\n line breaks, and putting "\r\n" first lets each one be matched as a single line break.
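When the replacement is an empty string, the order of the search array does not change the result, but it does as soon as line breaks are replaced with a space. A small sketch with a made-up input:

```php
<?php
$input = "first\r\nsecond";

// str_replace() applies the search array in order. If "\n" comes first,
// the "\r\n" entry can never match, so a Windows line break is handled
// as two separate replacements.
$wrongOrder = str_replace(["\n", "\r\n", "\r"], ' ', $input);
$rightOrder = str_replace(["\r\n", "\n", "\r"], ' ', $input);

var_dump($wrongOrder, $rightOrder);
```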
Something a bit more functional (easy to use anywhere):
function strip_carriage_returns($string)
{
return str_replace(array("\r\n", "\n", "\r"), '', $string);
}
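stripcslashes() is sometimes suggested here, but it does not remove line breaks: it converts C-like escape sequences written with a backslash (\n, \r, octal and hexadecimal forms) into the actual characters, so it only matters when the string contains such literal sequences.
$str = stripcslashes($str);
From the manual: "Returns a string with backslashes stripped off. Recognizes C-like \n, \r ..., octal and hexadecimal representation."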
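A quick check of what stripcslashes() does with an escape sequence written as a backslash followed by n (the sample string is made up):

```php
<?php
// Single quotes: this string contains a backslash followed by "n",
// not a newline character.
$escaped = 'line1\nline2';

$converted = stripcslashes($escaped);
// $converted now contains a real newline between the two words.
var_dump(strpos($converted, "\n") !== false); // bool(true)
```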
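Try this out. It's working for me. Note that it removes the literal two-character sequences \n and \r (a backslash followed by n or r, as found in escaped text), not actual line-break characters.
First remove \n from the string (use a double backslash before n inside double quotes).
Then remove \r the same way.
Code:
$string = str_replace("\\n", "", $string);
$string = str_replace("\\r", "", $string);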
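A variant of the same idea with single-quoted search strings, which match the backslash sequences literally without doubling the backslash (the sample text is made up):

```php
<?php
// Text containing the literal two-character sequences \r and \n
// (single quotes, so there are no real line breaks here).
$string = 'first\r\nsecond\nthird';

// Single-quoted search strings match the backslash sequences literally.
$string = str_replace(['\r', '\n'], '', $string);
echo $string; // firstsecondthird
```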
Let's see a performance test!
Things have changed since I last answered this question, so here's a little test I created. I compared the four most promising methods: preg_replace vs. strtr vs. str_replace, with strtr appearing twice because it has both a character-to-character mode and an array (search => replacement pairs) mode.
You can run the test here:
https://deneskellner.com/stackoverflow-examples/1991198/
Results
251.84 ticks using preg_replace("/[\r\n]+/"," ",$text);
81.04 ticks using strtr($text,["\r"=>"","\n"=>""]);
11.65 ticks using str_replace(["\r","\n"],["",""],$text)
4.65 ticks using strtr($text,"\r\n","  ")
(Note that it's a realtime test and server loads may change, so you'll probably get different figures.)
The preg_replace solution is noticeably slower, but that's okay. It does a different, more general job: PHP caches compiled patterns, but running the regex engine over the text still costs more than a plain substitution. It's simply not fair to expect it to win.
On the other hand, in lines 2-3, str_replace and strtr are doing almost the same job, yet they perform quite differently. Both work with arrays, and they do exactly what we told them: remove the newlines, replacing them with nothing.
The last one is a dirty trick: it replaces characters with characters, that is, newlines with spaces. It's even faster, and it makes sense because when you get rid of line breaks, you probably don't want to concatenate the word at the end of one line with the first word of the next. So it's not exactly what the OP described, but it's clearly the fastest. With long strings and many replacements, the difference will grow because character substitutions are linear by nature.
Verdict: str_replace wins in general
And if you can afford to have spaces instead of [\r\n], use strtr with characters. It works twice as fast in the average case and probably a lot faster when there are many short lines.
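One pitfall with strtr()'s character mode: the from and to strings should be the same length, because the extra characters of the longer one are ignored. A small sketch with made-up text:

```php
<?php
$text = "line one\r\nline two\nline three";

// Character-to-character mode: "from" and "to" should have the same length.
// Extra characters in the longer string are ignored, so with a
// one-character "to" only "\r" is translated.
$short = strtr($text, "\r\n", " ");
$full  = strtr($text, "\r\n", "  ");

var_dump($short, $full);
```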
Use:
function removeP($text) {
$key = 0;
$newText = "";
while ($key < strlen($text)) {
// Skip tab (9), line feed (10) and carriage return (13) characters.
if (ord($text[$key]) == 9 || ord($text[$key]) == 10 || ord($text[$key]) == 13) {
//$newText .= '<br>'; // Uncomment this if you want <br> to replace these special characters
}
else {
$newText .= $text[$key];
}
$key++;
}
return $newText;
}
$myvar = removeP("your string");
Note: Here I am not using PHP regex, but you can still remove the newline characters.
This removes tab, line-feed and carriage-return characters one byte at a time, so it covers \n, \r\n and \r line endings.
I receive a string from a database query, then I remove all HTML tags, carriage returns and newlines before I put it in a CSV file. Only thing is, I can't find a way to remove the excess white space from between the strings.
What would be the best way to remove the inner whitespace characters?
I'm not sure exactly what you want, but here are two situations:
If you are just dealing with excess whitespace at the beginning or end of the string, you can use trim(), ltrim() or rtrim() to remove it.
If you are dealing with extra spaces within a string, consider using preg_replace() to replace runs of whitespace (\s+) with a single space.
Example:
$foo = preg_replace('/\s+/', ' ', $foo);
$str = str_replace(' ','',$str);
This removes every space. You could also replace them with an underscore, &nbsp;, etc.
None of the other examples worked for me, so I've used this one:
trim(preg_replace('/[\t\n\r\s]+/', ' ', $text_to_clean_up))
This replaces all tabs, newlines, runs of spaces, etc. with a single space. (\s already includes \t, \n and \r, so [\t\n\r\s]+ behaves the same as \s+.)
$str = trim(preg_replace('/\s+/',' ', $str));
The above line of code will remove extra spaces, as well as leading and trailing spaces.
If you want to collapse runs of spaces in a string, for example "this  string   have lots  of   space  .  "
And you expect the answer to be
"this string have lots of space .", you can use the following solution:
$strng = "this  string   have lots  of   space  .  ";
$strng = trim(preg_replace('/\s+/',' ', $strng));
echo $strng;
There used to be a security flaw in using preg_replace() on user input [or other untrusted sources]: with the /e modifier, PHP evaluated the replacement string as PHP code, so an unsanitized payload could lead to code injection. The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0; without it, preg_replace() does not eval() anything.
In my own application (I only deal with short strings), I instead made a slightly more processor-intensive function that avoids regular expressions entirely:
function secureRip(string $str): string { /* Rips all whitespace securely. */
$arr = str_split($str, 1);
$retStr = '';
foreach ($arr as $char) {
$retStr .= trim($char);
}
return $retStr;
}
$str = preg_replace('/[\s]+/', ' ', $str);
You can use:
$str = trim(str_replace("  ", " ", $str));
This removes extra whitespaces from both sides of string and converts two spaces to one within the string. Note that this won't convert three or more spaces in a row to one!
Another way is to use implode() and explode(), which avoids regex but is far from optimal:
$str = implode(" ", array_filter(explode(" ", $str), 'strlen'));
My suggestion is using a native for loop or using regex to do this kind of job.
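A sketch of the native-loop option, using a hypothetical collapse_whitespace() helper that treats the same characters as trim()'s default list as whitespace, emits one space between words, and drops leading and trailing whitespace:

```php
<?php
// Hypothetical helper: collapse runs of whitespace into a single space
// with a plain loop (no regex), dropping leading and trailing whitespace.
function collapse_whitespace(string $str): string
{
    $whitespace = " \t\n\r\0\x0B"; // same set trim() strips by default
    $out = '';
    $pendingSpace = false;
    $len = strlen($str);

    for ($i = 0; $i < $len; $i++) {
        $ch = $str[$i];
        if (strpos($whitespace, $ch) !== false) {
            $pendingSpace = true; // remember that a gap was seen
            continue;
        }
        if ($pendingSpace && $out !== '') {
            $out .= ' '; // one space between words, none at the start
        }
        $pendingSpace = false;
        $out .= $ch;
    }
    return $out;
}

echo collapse_whitespace("  this  string\t\nhas   gaps  "); // this string has gaps
```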
To expand on Sandip's answer, I had a bunch of strings showing up in the logs that were mis-coded in bit.ly. They meant to code just the URL but put a Twitter handle and some other stuff after a space. It looked like this:
?productID=26%20via%20#LFS
Normally, that wouldn't be a problem, but I'm getting a lot of SQL injection attempts, so I redirect anything that isn't a valid ID to a 404. I used the preg_replace method to turn the invalid productID string into a valid productID.
$productID=preg_replace('/[\s]+.*/','',$productID);
I look for a space in the URL and then remove everything after it.
I recently wrote a simple function that removes excess white space from a string without a regular expression: implode(' ', array_filter(explode(' ', $str), 'strlen')).
Laravel 9.7 introduced the new Str::squish() method to remove extraneous whitespace, including extra white space between words: https://laravel.com/docs/9.x/helpers#method-str-squish
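One caveat with the explode()/array_filter() approach: without a callback, array_filter() discards every falsy value, and that includes the string "0". A small sketch with made-up input showing the difference, using strlen as the callback so only empty strings are removed:

```php
<?php
$str = 'version  0  of   the file';

// Without a callback, array_filter() drops every falsy value,
// which includes the string "0" as well as the empty strings.
$lossy = implode(' ', array_filter(explode(' ', $str)));

// With strlen as the callback, only the empty strings are removed.
$safe = implode(' ', array_filter(explode(' ', $str), 'strlen'));

var_dump($lossy, $safe);
```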
$str = "I am  a   PHP    Developer";
$str_length = strlen($str);
$str_arr = str_split($str);
for ($i = 0; $i < $str_length; $i++) {
// Drop a space when the next character is also a space (unset() keeps the other keys).
if (isset($str_arr[$i + 1]) && $str_arr[$i] == ' ' && $str_arr[$i] == $str_arr[$i + 1]) {
unset($str_arr[$i]);
}
}
echo implode("", $str_arr);