Using preg_replace not working properly - php

I need to remove everything in a string that is not a word character, space, comma, period, question mark, exclamation mark, asterisk or &#039;. I'm trying to do it using preg_replace, but I'm not getting the correct results:
$string = "i don&#039;t know if i can do this,.?!*!##$%^&()_+123|";
preg_replace("~(?![\w\s]+|[\,\.\?\!\*]+|&#039;|)~", "", $string);
echo $string;
Result:
i don't know if i can do this,.?!!*##$%^&()_+123|
Need Result:
i don't know if i can do this,.?!*

I don't know if you're happy to call html_entity_decode first to convert that &#039; into an apostrophe. If you are, then probably the simplest way to achieve this is
// Convert HTML entities to characters
$string = html_entity_decode($string, ENT_QUOTES);
// Remove characters other than the specified list.
$string = preg_replace("~[^\w\s,.?!*']+~", "", $string);
// Convert characters back to HTML entities. This will convert the ' back to &#039;
$string = htmlspecialchars($string, ENT_QUOTES);
If not, then you'll need to use some negative assertions to remove & when not followed by #, ; when not preceded by &#039, and so on.
$string = preg_replace("~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~", "", $string);
The results are subtly different. The first block of code, when given &quot;, will convert it to " and then remove it from the string. The second block will remove & and ; and leave quot behind in the result.
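A minimal sketch contrasting the two approaches, assuming the input stores the apostrophe as the entity &#039;. The sample string and variable names are illustrative only. Note that \w also matches digits and the underscore, so those survive either pattern.

```php
<?php
// Illustrative input; the apostrophe arrives as the HTML entity &#039;.
$input = 'i don&#039;t know, &quot;really&quot;?!*#$%^()+|';

// Approach 1: decode, strip, re-encode.
$decoded  = html_entity_decode($input, ENT_QUOTES);
$stripped = preg_replace("~[^\w\s,.?!*']+~", "", $decoded);
$first    = htmlspecialchars($stripped, ENT_QUOTES);

// Approach 2: strip in place, keeping only the exact sequence &#039;.
$second = preg_replace(
    "~[^\w\s,.?!*'&#;]+|&(?!#)|&#(?!039;)|(?<!&)#|(?<!&#039);~",
    "",
    $input
);

echo $first, "\n";  // i don&#039;t know, really?!*
echo $second, "\n"; // i don&#039;t know, quotreallyquot?!*
```

The second line of output shows the leftover quot described above.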

Related

Remove non alphanumerical characters in a string

I tried using both lines below:
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
preg_replace("/[^[:alnum:][:space:]]/u", '', $string);
But, if $string has a single quote, it is replaced by "039" and I don't understand why:
don't
becomes
don039t
In don't, instead of a single quote you are using &#039;. In the HTML view it displays as a single quote. When preg_replace removes the special characters, only the 039 (from &#039;) remains.
As already mentioned, your string is encoded with htmlentities.
Try:
preg_replace("/[^A-Za-z0-9 ]/", '', html_entity_decode($string, ENT_QUOTES));
Specify ENT_QUOTES to make it deal with quotes.
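A short sketch of why the digits appear and how decoding first avoids it, assuming the string holds the apostrophe as &#039; (the variable names are illustrative):

```php
<?php
// Illustrative input: the apostrophe is stored as the entity &#039;.
$encoded = 'don&#039;t';

// Stripping first removes &, # and ; but keeps the digits 039.
$naive = preg_replace('/[^A-Za-z0-9 ]/', '', $encoded);

// Decoding first turns the entity into a real apostrophe, which is then stripped.
$fixed = preg_replace('/[^A-Za-z0-9 ]/', '', html_entity_decode($encoded, ENT_QUOTES));

echo $naive, "\n"; // don039t
echo $fixed, "\n"; // dont
```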
If you are still looking for a way to remove characters, without decoding your entities, you can try
preg_replace("/[^a-z0-9& ]|&(#[0-9]{2,3})?/i", '', $string);
Try with:
$exampleString = "Hi! Jo libertaire. This is working fine \"Yes absolutely fine\".Don't say its not working. you can try this string. :)";
$result= preg_replace("/[^A-Za-z0-9 ]/", '', $exampleString);
print $result;
I don't have a server for php so I tried this in an online editor and it's working fine.

Remove empty space and plus sign from the beginning of a string

I have a string that begins with an empty space and a + sign :
$s = ' +This is a string[...]';
I can't figure out how to remove the first + sign using PHP. I've tried ltrim, preg_replace with several patterns and with the + sign escaped, and I've also tried substr and str_replace. None of them removes the plus sign at the beginning of the string. Either nothing is replaced, or the entire string is replaced/removed. Any help will be highly appreciated!
Edit : After further investigation, it seems that it's not really a plus sign, it looks 100% like a + sign but I think it's not. Any ideas for how to decode/convert it?
Edit 2 : There's one white space before the + sign. I'm using get_the_excerpt Wordpress function to get the string.
Edit 3 : After successfully removing the empty space and the + with substr($s, 2);, Here's what I get now :
$s == '#43;This is a string[...]'
Wiki : I had to remove 6 characters, I've tried substr($s, 6); and it's working well now. Thanks for your help guys.
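Judging by the edits above, the leading characters appear to be a space followed by the entity &#43; (six characters in total) rather than a literal plus sign. A sketch under that assumption, decoding the entity before trimming instead of relying on a fixed offset:

```php
<?php
// Assumed raw value: a space followed by the entity for "+".
$s = ' &#43;This is a string[...]';

$decoded = html_entity_decode($s, ENT_QUOTES); // entity becomes a literal "+"
$clean   = ltrim($decoded, ' +');              // strip leading spaces and plus signs

echo $clean, "\n"; // This is a string[...]
```

Unlike substr($s, 6), this keeps working if the excerpt arrives without the prefix.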
ltrim has a second parameter:
$s = ltrim($s,'+');
edit:
If it is not working, it means there is something else at the beginning of that string, e.g. whitespace. You can check it by using var_dump($s);, which shows you exactly what you have there.
You can use explode like this:
$result = explode('+', $s, 2)[1];
explode splits the string wherever it finds the delimiter you pass as the first argument, drops the delimiter, and returns the pieces in an array. The limit of 2 keeps any later plus signs in the text intact.
It's mostly used with multiple occurrences of a delimiter, but it will work in your case too.
For example:
$string = "This,is,a,string";
$results = explode(',', $string);
var_dump($results); // array containing 'This', 'is', 'a', 'string'
In your case the returned array has two elements: whatever precedes the first plus sign at index 0 (here just the leading space) and the rest of your string at index 1.
Here's a couple of different ways I can think of
str_replace
$string = str_replace('+', '', $string);
preg_replace
$string = preg_replace('/^\+/', '', $string);
ltrim
$string = ltrim($string, '+');
substr
$string = substr($string, 1);
try this
<?php
$s = '+This is a string';
echo ltrim($s,'+');
?>
You can use ltrim() or substr().
For example :
$output = ltrim($string, '+');
or you can use
$output = substr($string, 1);
You can remove multiple characters with trim. Perhaps you were not re-assigning the outcome of your trim function.
<?php
$s = ' +This is a string[...]';
$s = ltrim($s, '+ ');
print $s;
Outputs:
This is a string[...]
ltrim in the above example removes all spaces and addition characters from the left hand side of the original string.

How to parse characters in a single-quoted string?

To get a double quoted string (which I cannot change) correctly parsed I have to do the following:
$string = '15 Rose Avenue\n Irlam\n Manchester';
$string = str_replace('\n', "\n", $string);
print nl2br($string); // demonstrates that the \n's are now linebreak characters
So far, so good.
But in my given string there are characters like \xC3\xA4. There are many characters like this (beginning with \x..)
How can I get them correctly parsed as shown above with the linebreak?
You can use
$str = stripcslashes($str);
You can escape a \ in single quotes:
$string = str_replace('\\n', "\n", $string);
But you're going to have a lot of potential replacements if you need to handle '\\xC3', etc. It is best to use preg_replace_callback() with a callback function that translates them to bytes.
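Both suggestions can be sketched as follows. The sample string and variable names are hypothetical, and the callback only handles \n and two-digit \xHH sequences, which is an assumption about what the input contains.

```php
<?php
// Hypothetical sample: single quotes keep the backslashes literal.
$raw = '15 Rose Avenue\n Irlam \xC3\xA4';

// Option 1: stripcslashes() understands C-style escapes such as \n and \xHH.
$viaStrip = stripcslashes($raw);

// Option 2: translate only \n and \xHH, leaving other backslashes alone.
$viaCallback = preg_replace_callback(
    '/\\\\x([0-9A-Fa-f]{2})|\\\\n/',
    function (array $m): string {
        // Group 1 is set only when the \xHH alternative matched.
        return isset($m[1]) ? chr(hexdec($m[1])) : "\n";
    },
    $raw
);

var_dump($viaStrip === $viaCallback); // bool(true)
```

stripcslashes() is shorter, but it also rewrites every other backslash sequence, so the callback is the safer choice when the string may contain backslashes that should stay.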

Need to replace repeating string with single instance

Ok so I am taking a string, querying a database and then must provide a URL back to the page. There are multiple special characters in the input and I am stripping all special characters and spaces out using the following code and replacing with HTML "%25" so that my legacy system correctly searches for the value needed. What I need to do however is cut down the number of "%25" that show up.
My current code would replace something like
"Hello. / there Wilbur" with "Hello%25%25%25%25there%25Wilbur"
but I would like it to return
"Hello%25there%25Wilbur"
replacing multiples of the "%25" with only one instance
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9]/', '%25', $string); // Replaces special chars.
Just add a + after selecting a non-alphanumeric character.
$string = "Hello. / there Wilbur";
$string = str_replace(' ', '-', $string);
// Just add a '+'. It will replace one or more consecutive illegal
// characters with a single '%25'
return preg_replace('/[^A-Za-z0-9]+/', '%25', $string);
Sample input: Hello. / there Wilbur
Sample output: Hello%25there%25Wilbur
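The same steps wrapped in a function so they can be run as-is; toSearchTerm is a hypothetical name, not part of the original code.

```php
<?php
// Hypothetical helper wrapping the two steps from the answer above.
function toSearchTerm(string $input): string
{
    $hyphenated = str_replace(' ', '-', $input);
    // "+" makes each run of non-alphanumeric characters a single match.
    return preg_replace('/[^A-Za-z0-9]+/', '%25', $hyphenated);
}

echo toSearchTerm('Hello. / there Wilbur'), "\n"; // Hello%25there%25Wilbur
```

Since a hyphen is itself non-alphanumeric, the str_replace step no longer changes the result; it is kept here only to mirror the original code.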
This will work:
while (strpos($str, '%25%25') !== false)
    $str = str_replace('%25%25', '%25', $str);
Or using a regexp:
$result = preg_replace('#(?:%25){2,}#', '%25', $string_to_replace_in);
This avoids the while loop, so the more consecutive '%25' there are, the faster preg_replace is compared with the loop.
Cf PHP doc:
http://fr2.php.net/manual/en/function.preg-replace.php
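A runnable sketch of both variants, assuming the string already contains the repeated '%25' sequences. The function names are hypothetical.

```php
<?php
// Hypothetical helpers; both collapse runs of "%25" into one.
function collapseWithLoop(string $str): string
{
    // strpos() takes the haystack first, then the needle.
    while (strpos($str, '%25%25') !== false) {
        $str = str_replace('%25%25', '%25', $str);
    }
    return $str;
}

function collapseWithRegex(string $str): string
{
    return preg_replace('#(?:%25){2,}#', '%25', $str);
}

$input = 'Hello%25%25%25%25there%25Wilbur';
echo collapseWithLoop($input), "\n";  // Hello%25there%25Wilbur
echo collapseWithRegex($input), "\n"; // Hello%25there%25Wilbur
```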

Remove excess whitespace from within a string

I receive a string from a database query, then I remove all HTML tags, carriage returns and newlines before I put it in a CSV file. Only thing is, I can't find a way to remove the excess white space from between the strings.
What would be the best way to remove the inner whitespace characters?
Not sure exactly what you want but here are two situations:
If you are just dealing with excess whitespace on the beginning or end of the string you can use trim(), ltrim() or rtrim() to remove it.
If you are dealing with extra spaces within a string, consider using preg_replace to replace each run of whitespace (\s+) with a single space " ".
Example:
$foo = preg_replace('/\s+/', ' ', $foo);
$str = str_replace(' ','',$str);
Or replace with an underscore, &nbsp;, etc.
None of the other examples worked for me, so I used this one:
trim(preg_replace('/[\t\n\r\s]+/', ' ', $text_to_clean_up))
This replaces all tabs, newlines, double spaces, etc. with a single space.
$str = trim(preg_replace('/\s+/',' ', $str));
The above line of code will remove extra spaces, as well as leading and trailing spaces.
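A small runnable illustration of that line; the sample string is made up for the purpose.

```php
<?php
// Illustrative input with leading, trailing and repeated whitespace.
$str = "  this   string\thas \n lots of    space  ";

// Collapse every run of whitespace to one space, then trim both ends.
$str = trim(preg_replace('/\s+/', ' ', $str));

echo $str, "\n"; // this string has lots of space
```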
If you want to replace only multiple spaces in a string, for Example: "this string have lots of space . "
And you expect the answer to be
"this string have lots of space", you can use the following solution:
$strng = "this string have lots of space . ";
$strng = trim(preg_replace('/\s+/',' ', $strng));
echo $strng;
There used to be security flaws in using preg_replace() with a payload from user input [or other untrusted sources]. With the /e pattern modifier, PHP evaluated the replacement string as code with eval(), so input that wasn't properly sanitized exposed the application to code injection. The modifier was deprecated in PHP 5.5 and removed in PHP 7.0; without it, preg_replace() does not eval() anything.
In my own application, instead of bothering to sanitize the input (and as I only deal with short strings), I made a slightly more processor-intensive function, which is secure since it doesn't eval() anything.
function secureRip(string $str): string { /* Rips all whitespace securely. */
    $arr = str_split($str, 1);
    $retStr = '';
    foreach ($arr as $char) {
        $retStr .= trim($char);
    }
    return $retStr;
}
$str = preg_replace('/[\s]+/', ' ', $str);
You can use:
$str = trim(str_replace("  ", " ", $str));
This removes extra whitespaces from both sides of string and converts two spaces to one within the string. Note that this won't convert three or more spaces in a row to one!
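A sketch of that limitation, plus a loop that repeats the replacement until no double space remains. The sample string is built with str_repeat so the number of spaces is explicit.

```php
<?php
$double = str_repeat(' ', 2);
$str    = 'a' . str_repeat(' ', 3) . 'b'; // three spaces between the letters

// One pass leaves two spaces behind.
$once = str_replace($double, ' ', $str);

// Repeating until no double space remains collapses any run.
$looped = $str;
while (strpos($looped, $double) !== false) {
    $looped = str_replace($double, ' ', $looped);
}

echo strlen($once), "\n";   // 4
echo strlen($looped), "\n"; // 3
```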
Another way I can suggest is using implode and explode, which is safer but far from optimal!
$str = implode(" ", array_filter(explode(" ", $str)));
My suggestion is using a native for loop or using regex to do this kind of job.
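One caveat with the implode/explode approach, shown as a sketch: array_filter() without a callback drops every falsy piece, so a word that is just "0" disappears. The sample string is illustrative.

```php
<?php
$str = 'item  0   sold';

// Default array_filter() drops every falsy piece, including the string "0".
$lossy = implode(' ', array_filter(explode(' ', $str)));

// An explicit callback drops only the empty pieces.
$safe = implode(' ', array_filter(explode(' ', $str), function (string $part): bool {
    return $part !== '';
}));

echo $lossy, "\n"; // item sold
echo $safe, "\n";  // item 0 sold
```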
To expand on Sandip’s answer, I had a bunch of strings showing up in the logs that were mis-coded in bit.ly. They meant to code just the URL but put a twitter handle and some other stuff after a space. It looked like this
?productID=26%20via%20#LFS
Normally, that wouldn’t be a problem, but I’m getting a lot of SQL injection attempts, so I redirect anything that isn’t a valid ID to a 404. I used the preg_replace method to turn the invalid productID string into a valid productID.
$productID=preg_replace('/[\s]+.*/','',$productID);
I look for a space in the URL and then remove everything after it.
I recently wrote a simple one-liner that removes excess whitespace from a string without a regular expression: implode(' ', array_filter(explode(' ', $str))).
Laravel 9.7 introduced the new Str::squish() method to remove extraneous whitespace, including extraneous whitespace between words: https://laravel.com/docs/9.x/helpers#method-str-squish
$str = "I am a PHP Developer";
$str_length = strlen($str);
$str_arr = str_split($str);
for ($i = 0; $i < $str_length; $i++) {
    if (isset($str_arr[$i + 1]) && $str_arr[$i] == ' ' && $str_arr[$i] == $str_arr[$i + 1]) {
        unset($str_arr[$i]);
    }
    else {
        continue;
    }
}
echo implode("", $str_arr);
