Strip out HTML and Exact Special Characters - php

With the help of friends I've got exact answer for removal of HTML codes and special characters ( Question No.7128856 ~ Thanks to Mez ) and here it was the answer
$des = "Hello world)<b> (*&^%$##! it's me: and; love you.<p>";
we would remove HTML codes and special characters so by using
// Strip HTML Tags
$clear = strip_tags($des);
// Clean up things like &
$clear = html_entity_decode($clear);
// Strip out any url-encoded stuff
$clear = urldecode($clear);
// Replace non-AlNum with space
$clear = preg_replace('/[A-Za-z0-9]/', ' ', $clear);
// Replace Multiple spaces with single space
$clear = preg_replace('/ +/', ' ', $clear);
// Trim the string of leading/trailing space
$clear = trim($clear);
we will get the
Now I'd like to change the question little bit !
I'd like to remove HTML codes and certain exact special characters
such as ( ) * & ^ % $ # # ! ~ _ - + ' " { [ } ] and so on and only allow all types of letters of whatever even Arabic,Russian,Hebraic ...etc
i found the 1st answer is good but it pass only English alpha-numeric letters Aa-Zz-90 but what for other language such as arabic عربى it will consider it as special characters ! and will remove it so my idea is how to define exact special characters !
Thanks
! Can we edit the answer of Mez by define which only special characters we remove it
For who asking Why ! Cause i'm willing to convert some titles to pure SEO links that is why i'll remove special characters but needs to allow for all languages in same time

Use:
$clear = mb_ereg_replace( '#[^0-9a-z]#i', '', $clear );

Related

sanitize string using whitelist regex php

I want to sanitize a $string using the next white list:
It includes a-z, A-Z,0-9 and some usual characters included on posts []=+-¿?¡!<>$%^&*'"()/##*,.:;_|.
As well spanish accents like á,é,í,ó,ú and ÁÉÍÓÚ
WHITE LIST
abcdefghijklmnñopqrstuvwxyzñáéíóúABCDEFGHIJKLMNÑOPQRSTUVWXYZÁÉÍÓÚ0123456789[]=+-¿?¡!<>$%^&*'"()/##*,.:;_|
I want to sanitize this string
$string="//abcdefghijklmnñopqrstuvwxyzñáéíóúABCDEFGHIJKLMNÑOPQRSTUVWXYZÁÉÍÓÚ0123456789[]=+-¿?¡!<>$%^&*'()/##*,.:;_| |||||||||| ] ¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶¸¹º»¼½ mmmmm onload onclick='' [ ? / < ~ # ` ! # $ % ^ & * ( ) + = } | : ; ' , > { space !#$%&'()*+,-./:;<=>?#[\]^_`{|}~ <html>sdsd</html> ** *`` `` ´´ {} {}[] ````... ;;,,´'¡'!!!!¿?ña ñaña ÑA á é´´ è ´ 8i ó ú à à` à è`ì`ò ù & > < ksks < wksdsd '' \" \' <script>alert('hi')</script>";
I tried this regex but it doesnt work
//$regex = '/[^\w\[\]\=\+\-\¿\?\¡\!\<\>\$\%\^\&\*\'\"\(\)\/\#\#\*\,\.\/\:\;\_\|]/i';
//preg_replace($regex, '', $string);
Does anyone has a clue how to sanitize thisstring according to the whitelist values?
If you known your white list characters use the white list in the regex instead of including the black list. The blacklist could be really big. Specially if the encoding something like UTF-8 or UTF-16
There is a lot of ways to do this. One could be to create a regex with capture groups of the desired range of posibilities (also include the spaces and new lines) and compose a new string with the groups.
Also take carefully that some of the characters could be reserved regex characters and need to be scaped. Like "[ ? +"
You could test a regex like:
$string ="Your test string";
$pattern= "([a-zA-Z0-9\[\]=\+\-\¿\?¡!<>$%\^&\*'\"\sñÑáéíóúÁÉÍÓÚ]+)";
preg_match_all($pattern, $string, $matches);
$newString = join('', $matches);
This is only and simple example of how to apply the whilte list with the regex.

PHP - Remove excess Whitespace but not new lines

i was looking for a way to remove excess whitespaces from within a string (that is, if 2 or more spaces are next each other, leave only 1 and remove the others), i found this Remove excess whitespace from within a string and i wanted to use this solution:
$foo = preg_replace( '/\s+/', ' ', $foo );
but this removes new lines aswell, while i want to keep them.
Is there any way to keep newlines while removing excess whitespace?
http://www.php.net/manual/en/regexp.reference.escape.php
defines \h
any horizontal whitespace character (since PHP 5.2.4)
so probably you are looking for
$foo = preg_replace( '/\h+/', ' ', $foo );
example: http://ideone.com/NcOiKW
If some of your symbols were converted to � after preg_replace (for example, Cyrillic capital letter R / Р), use mb_ereg_replace instead of preg_replace:
$value = mb_ereg_replace('/\h+/', ' ', $value);
if you want to remove excess of only-spaces (not tabs, new-lines, etc) you could use HEX code to be more specific:
$text = preg_replace('/\x20+/', ' ', $text);

PHP Code Snippet - Preg Replace to ensure only 1 whitespace between words in a string

I am using preg_replace to strip unwanted characters from a string. Thats working fine.
What i would to additionally have is just to modify that ( or in the next line ) so that i can allow max 1 whitespace in between words. No whitespace in front or behind the string.
I have seen the reply at
PHP preg_match needed to ensure only ONE space character is allowed between words where a possible solution was posted
( solution -> ^[a-zA-Z]+( [a-zA-Z]+)*$) but with some questions raised ( what is the definition of the word? what type of whitespace )
My Word definition -> english words ( including a-z,A-Z,0-9,- , _ )
Whitespace -> spacebar whitespace ( &nbsp) , not newlines,etc
Any code snippets?
Thanx,
Meswara
Try this:
<?php
$my_str = 'Hello, too many spaces!';
$my_str = preg_replace('#[\s]+#', ' ', $my_str);
echo $my_str; // returns 'Hello, too many spaces!'

Replace selected characters in PHP string

I know this question has been asked several times for sure, but I have my problems with regular expressions... So here is the (simple) thing I want to do in PHP:
I want to make a function which replaces unwanted characters of strings. Accepted characters should be:
a-z A-Z 0-9 _ - + ( ) { } # äöü ÄÖÜ space
I want all other characters to change to a "_". Here is some sample code, but I don't know what to fill in for the ?????:
<?php
// sample strings
$string1 = 'abd92 s_öse';
$string2 = 'ab! sd$ls_o';
// Replace unwanted chars in string by _
$string1 = preg_replace(?????, '_', $string1);
$string2 = preg_replace(?????, '_', $string2);
?>
Output should be:
$string1: abd92 s_öse (the same)
$string2: ab_ sd_ls_o
I was able to make it work for a-z, 0-9 but it would be nice to allow those additional characters, especially äöü. Thanks for your input!
To allow only the exact characters you described:
$str = preg_replace("/[^a-zA-Z0-9_+(){}#äöüÄÖÜ -]/", "_", $str);
To allow all whitespace, not just the (space) character:
$str = preg_replace("/[^a-zA-Z0-9_+(){}#äöüÄÖÜ\s-]/", "_", $str);
To allow letters from different alphabets -- not just the specific ones you mentioned, but also things like Russian and Greek, or other types of accent marks:
$str = preg_replace("/[^\w+(){}#\s-]/", "_", $str);
If I were you, I'd go with the last one. Not only is it shorter and easier to read, but it's less restrictive, and there's no particular advantage to blocking stuff like и if äöüÄÖÜ are all fine.
Replace [^a-zA-Z0-9_\-+(){}#äöüÄÖÜ ] with _.
$string1 = preg_replace('/[^a-zA-Z0-9_\-+(){}#äöüÄÖÜ ]/', '_', $string1);
This replaces any characters except the ones after ^ in the [character set]
Edit: escaped the - dash.

How do I replace tabs with spaces within variables in PHP?

$data contains tabs, leading spaces and multiple spaces. I wish to replace all tabs with a space. Multiple spaces with one single space, and remove leading spaces.
In fact somthing that would turn this input data:
[ asdf asdf asdf asdf ]
Into output data:
[asdf asdf asdf asdf]
How do I do this?
Trim, replace tabs and extra spaces with single spaces:
$data = preg_replace('/[ ]{2,}|[\t]/', ' ', trim($data));
$data = trim(preg_replace('/\s+/g', '', $data));
Assuming the square brackets aren't part of the string and you're just using them for illustrative purposes, then:
$new_string = trim(preg_replace('!\s+!', ' ', $old_string));
You might be able to do that with a single regex but it'll be a fairly complicated regex. The above is much more straightforward.
Note: I'm also assuming you don't want to replace "AB\t\tCD" (\t is a tab) with "AB CD".
$data = trim($data);
That gets rid of your leading (and trailing) spaces.
$pattern = '/\s+/';
$data = preg_replace($pattern, ' ', $data);
That turns any collection of one or more spaces into just one space.
$data = str_replace("\t", " ", $data);
That gets rid of your tabs.
$new_data = preg_replace("/[\t\s]+/", " ", trim($data));
This answer takes the question completely literally: it is only concerned with spaces and tabs. Granted, the OP probably also wants to include other kinds of whitespace in what gets trimmed/compressed, but let's pretend he wants to preserve embedded CR and/or LF.
First, let's set up some constants. This will allow for both ease of understanding and maintainability, should modifications become necessary. I put in some extra spaces so that you can compare the similarities and differences more easily.
define( 'S', '[ \t]+' ); # Stuff you want to compress; in this case ONLY spaces/tabs
define( 'L', '/\A'.S.'/' ); # stuff on the Left edge will be trimmed
define( 'M', '/'.S.'/' ); # stuff in the Middle will be compressed
define( 'R', '/'.S.'\Z/' ); # stuff on the Right edge will be trimmed
define( 'T', ' ' ); # what we want the stuff compressed To
We are using \A and \Z escape characters to specify the beginning and end of the subject, instead of the typical ^ and $ which are line-oriented meta-characters. This is not so much because they are needed in this instance as much as "defensive" programming, should the value of S change to make them needed in the future.
Now for the secret sauce: we are going to take advantage of some special semantics of preg_replace, namely (emphasis added)
If there are fewer elements in the replacement array than in the pattern array, any extra patterns will be replaced by an empty string.
function trim_press( $data ){
return preg_replace( [ M, L, R ], [ T ], $data );
}
So instead of a pattern string and replacement string, we are using a pattern array and replacement array, which results in the extra patterns L and R being trimmed.
In case you need to remove too.
$data = trim(preg_replace('/\s+|nbsp;/g', '', $data));
After much frustration I found this to be the best solution, as it also removes non breaking spaces which can be two characters long:
$data = html_entity_decode(str_replace(' ',' ',htmlentities($data)));
$data = trim(preg_replace('/\h/', ' ', $data)); // replaces more space character types than \s
See billynoah
Just use this regex
$str = trim(preg_replace('/\s\s+/', ' ', $str));
it will replace all tabs and spaces by one space,
here sign + in regex means one or more times,
pattern means, that wherever there are two or more spaces, replace it by one space

Categories