PHP - a function to "sanitize" a string


Is there any PHP function that replaces spaces and underscores in a string with dashes?
Like:
Some Word
Some_Word
Some___Word
Some Word
Some ) # $ ^ Word
=> some-word
Basically, the sanitized string should contain only a-z characters, digits (0-9), and dashes (-).

This should produce the desired result:
$someword = strtolower(preg_replace("/[^a-z0-9]+/i", "-", $theword));

<?php
function sanitize($s) {
// This regex replaces each run of characters that are not
// alphanumeric or a dash with a single dash
return strtolower(preg_replace('/[^a-z0-9-]+/i', '-', $s));
}
echo sanitize('Some Word') . "\n";
echo sanitize('Some_Word') . "\n";
echo sanitize('Some___Word') . "\n";
echo sanitize('Some Word') . "\n";
echo sanitize('Some ) # $ ^ Word') . "\n";
Output:
some-word
some-word
some-word
some-word
some-word
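One gap in the pattern above is that existing dashes are kept, so an input such as "Some - Word" would end up with several dashes in a row, and leading or trailing punctuation leaves a dash at the edge. A minimal sketch that handles both cases follows. The name `slugify` is hypothetical (not from the answers here), and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of the answers above.
function slugify(string $s): string {
    // Replace every run of characters other than a-z / 0-9 with one dash.
    $s = preg_replace('/[^a-z0-9]+/', '-', strtolower($s));
    // Drop dashes left at either end by leading/trailing punctuation.
    return trim($s, '-');
}

$inputs = ['Some Word', 'Some_Word', 'Some___Word', 'Some ) # $ ^ Word', ' - Some - Word! '];
foreach ($inputs as $in) {
    echo slugify($in), "\n"; // each line prints some-word
}
```

Lowercasing first means the character class needs no `i` modifier.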

You might like to try preg_replace:
http://php.net/manual/en/function.preg-replace.php
Example from this page:
<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
//April1,2003
?>
You might also like to search for "search friendly URLs with PHP"; there is quite a bit of documentation. For example:
function friendlyURL($string){
$string = preg_replace("`\[.*\]`U","",$string);
$string = preg_replace('`&(amp;)?#?[a-z0-9]+;`i','-',$string);
$string = htmlentities($string, ENT_COMPAT, 'utf-8');
$string = preg_replace( "`&([a-z])(acute|uml|circ|grave|ring|cedil|slash|tilde|caron|lig|quot|rsquo);`i","\\1", $string );
$string = preg_replace( array("`[^a-z0-9]`i","`[-]+`") , "-", $string);
return strtolower(trim($string, '-'));
}
and usage:
$myFriendlyURL = friendlyURL("Barca rejects FIFA statement on Olympics row");
echo $myFriendlyURL; // will echo barca-rejects-fifa-statement-on-olympics-row
Source: http://htmlblog.net/seo-friendly-url-in-php/
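A note on the preg_replace example quoted from the manual earlier in this answer: the replacement is written `${1}1` because a literal digit follows the backreference. The sketch below reuses the manual's string and pattern to show the braced form next to the unbraced one. The variable names are only illustrative.

```php
<?php
$string  = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/';

// ${1} is group 1 ("April"); the "1" after the closing brace is a literal digit.
$withBraces = preg_replace($pattern, '${1}1,$3', $string);
echo $withBraces, "\n"; // April1,2003

// Without braces, "$11" is read as a reference to group 11 rather than
// group 1 followed by "1", so the two results are not expected to match.
$withoutBraces = preg_replace($pattern, '$11,$3', $string);
var_dump($withoutBraces === $withBraces);
```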

I found a few interesting solutions around the web. Note that none of this is my code; I have simply copied it here in the hope of helping you build a custom function for your own app.
This one is copied from Chyrp and should work well for your needs:
/**
* Function: sanitize
* Returns a sanitized string, typically for URLs.
*
* Parameters:
* $string - The string to sanitize.
* $force_lowercase - Force the string to lowercase?
* $anal - If set to *true*, will remove all non-alphanumeric characters.
*/
function sanitize($string, $force_lowercase = true, $anal = false) {
$strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]",
"}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—",
"â€”", "â€“", ",", "<", ".", ">", "/", "?");
$clean = trim(str_replace($strip, "", strip_tags($string)));
$clean = preg_replace('/\s+/', "-", $clean);
$clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean ;
return ($force_lowercase) ?
(function_exists('mb_strtolower')) ?
mb_strtolower($clean, 'UTF-8') :
strtolower($clean) :
$clean;
}
EDIT:
An even simpler function I found. Just a few lines of code, fairly self-explanatory:
function slug($z){
$z = strtolower($z);
$z = preg_replace('/[^a-z0-9 -]+/', '', $z);
$z = str_replace(' ', '-', $z);
return trim($z, '-');
}

Not sure why @Dagon chose to leave a comment instead of an answer, but here's an expansion of his answer.
PHP's preg_replace function lets you replace anything that matches a pattern with something else.
Here's an example for your case:
$input = "a word 435 (*^(*& HaHa";
$dashesOnly = preg_replace("#[^-a-zA-Z0-9]+#", "-", $input);
print $dashesOnly; // prints a-word-435-HaHa;

You can write this yourself with the help of regular expressions.
I don't know of a single built-in function that does the whole clean-up, although str_replace() can replace " " with "-" directly.

Related

Regular expression for finding multiple patterns from a given string

I am using a regular expression to get multiple patterns from a given string.
Let me explain.
$string = "about us";
$newtag = preg_replace("/ /", "_", $string);
print_r($newtag);
That is my code.
Here I find the space in the string and replace it with whatever special character I need.
Now I need a regular expression that gives me
about_us, about-us, and aboutus as output when I give about us as input.
Is this possible?
Please help.
Thanks in advance!

And finally, my answer is
$string = "contact_us";
$a = array('-','_',' ');
foreach($a as $b){
// strpos() returns 0 for a match at the start, so compare against false explicitly
if (strpos($string, $b) !== false) {
$separators = array('-','_','',' ');
$outputs = array();
foreach ($separators as $sep) {
$outputs[] = preg_replace("/".$b."/", $sep, $string);
}
print_r($outputs);
}
}
exit;
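The same idea can be written without the nested loops by splitting on any of the known separators first and then re-joining the pieces. This is only a sketch: `separatorVariants` is a hypothetical name, and the default separator list is taken from the code above.

```php
<?php
// Hypothetical helper: split on any run of '-', '_' or space, then re-join
// the pieces with each separator in turn.
function separatorVariants(string $s, array $separators = ['-', '_', '', ' ']): array {
    $words = preg_split('/[-_ ]+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($separators as $sep) {
        $out[] = implode($sep, $words);
    }
    return $out;
}

print_r(separatorVariants('contact_us'));
// values: contact-us, contact_us, contactus, contact us
```

Because the split does not care which separator the input used, the same call works for "about us", "about-us" and "about_us".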

You need a loop to produce the multiple possible outputs:
$separators = array('-','_','');
$string = "about us";
$outputs = array();
foreach ($separators as $sep) {
$outputs[] = preg_replace("/ /", $sep, $string);
}
print_r($outputs);

You can try without regex:
$string = 'about us';
$specialChar = '-'; // or any other
$newtag = implode($specialChar, explode(' ', $string));
If you put special characters into an array:
$specialChars = array('_', '-', '');
$newtags = array();
foreach ($specialChars as $specialChar) {
$newtags[] = implode($specialChar, explode(' ', $string));
}
You can also just use str_replace():
foreach ($specialChars as $specialChar) {
$newtags[] = str_replace(' ', $specialChar, $string);
}

Without knowing exactly what you want to do, I expect you want to replace each run of one or more non-word characters with a single dash,
e.g.
preg_replace('/\W+/', '-', $string);
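One thing to keep in mind with `\W`: the underscore counts as a word character, so it is left alone. The sketch below shows the difference on a made-up input string. The variable names are illustrative.

```php
<?php
$string = 'about_us & contact  page';

// \W does not match "_", so the underscore survives.
$keepsUnderscore = preg_replace('/\W+/', '-', $string);
echo $keepsUnderscore, "\n"; // about_us-contact-page

// Adding "_" to the class treats it as a separator too.
$dropsUnderscore = preg_replace('/[\W_]+/', '-', $string);
echo $dropsUnderscore, "\n"; // about-us-contact-page
```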

If you just want to replace whitespace, use \s (it matches spaces, tabs, and newlines):
<?php
$string = "about us";
$replacewith = "_";
$newtag = preg_replace("/\s/", $replacewith, $string);
print_r($newtag);
?>

I am not sure that regexes are the right tool for this. However, you can simply define a function like this:
function rep($str) {
return array( strtr($str, ' ', '_'),
strtr($str, ' ', '-'),
str_replace(' ', '', $str) );
}
$result = rep('about us');
print_r($result);

\W matches any character that is not a word character:
$string = "about us";
$newtag = preg_replace("/\W/", "_", $string);
print_r($newtag);
That is fine if the input is just this; with a longer string, every non-word character is replaced, which could cause problems :)

PHP quick easy to replace characters?

I have a function that generates a hash and filters out characters:
$str = base64_encode(md5("mystring"));
$str = str_replace( "+", "_",
str_replace( "/", "-",
str_replace( "=", "x", $str
)));
What is the "right" way to do this in PHP?
That is, is there a cleaner way?
// Let "tr()" be an imaginary function
$str = base64_encode(md5("mystring"));
$str = tr( "+/=", "_-x", $str );

There are a couple of options here. First, using str_replace properly:
$str = str_replace(array('+', '/', '='), array('_', '-', 'x'), $str);
And there's also the often-forgotten strtr:
$str = strtr($str, '+/=', '_-x');

You can use arrays in str_replace like this
$replace = Array('+', '/', '=');
$with = Array('_', '-', 'x');
$str = str_replace($replace, $with, $str);
Hope it helped

You can also use strtr with an array.
strtr('replace :this value', array(
':this' => 'that'
));
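The two functions are not always interchangeable. str_replace with arrays works through the pairs one after another, so a later pair can touch text produced by an earlier one, while strtr makes a single pass over the original string. A small sketch with a made-up input shows the difference:

```php
<?php
$input = 'ab';

// str_replace applies each pair in order, so the "b" produced by the first
// pair is replaced again by the second pair.
$sequential = str_replace(['a', 'b'], ['b', 'c'], $input);

// strtr looks only at the original characters and never re-scans its output.
$singlePass = strtr($input, ['a' => 'b', 'b' => 'c']);

echo $sequential, "\n"; // cc
echo $singlePass, "\n"; // bc
```

For the `+`, `/`, `=` mapping in the question the targets do not overlap with the sources, so both approaches give the same result there.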

Making strings "URL safe" [duplicate]

Possible Duplicate:
URL Friendly Username in PHP?
Is there a way to make strings "URL safe", which means replacing whitespace with hyphens, removing any punctuation, and changing all capital letters to lowercase?
For example:
"This is a STRING" -> "this-is-a-string"
or
"Hello World!" -> "hello-world"

You can use preg_replace to replace those characters.
$safe = preg_replace('/^-+|-+$/', '', strtolower(preg_replace('/[^a-zA-Z0-9]+/', '-', $string)));
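The one-liner above treats every non-ASCII letter as punctuation. If accented or non-Latin letters should survive, a Unicode-aware variant is possible. This is a sketch under assumptions: `urlSafe` is a hypothetical name, the input is assumed to be valid UTF-8, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical helper; assumes valid UTF-8 input and the mbstring extension.
function urlSafe(string $s): string {
    // \p{L} = any letter, \p{N} = any number; /u makes the pattern UTF-8 aware.
    // preg_replace returns null on invalid UTF-8, hence the fallback to ''.
    $s = preg_replace('/[^\p{L}\p{N}]+/u', '-', $s) ?? '';
    return mb_strtolower(trim($s, '-'), 'UTF-8');
}

echo urlSafe('This is a STRING'), "\n"; // this-is-a-string
echo urlSafe('Hello World!'), "\n";     // hello-world
```

Non-ASCII letters kept this way still need percent-encoding (for example with rawurlencode) before being placed in a URL.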

I often use this function to generate my clean URLs, and it seems to work fine.
You could alter it according to your needs, but give it a try.
function sanitize($string, $force_lowercase = true, $anal = false) {
$strip = array("~", "`", "!", "@", "#", "$", "%", "^", "&", "*", "(", ")", "_", "=", "+", "[", "{", "]",
"}", "\\", "|", ";", ":", "\"", "'", "‘", "’", "“", "”", "–", "—",
"â€”", "â€“", ",", "<", ".", ">", "/", "?");
$clean = trim(str_replace($strip, "", strip_tags($string)));
$clean = preg_replace('/\s+/', "-", $clean);
$clean = ($anal) ? preg_replace("/[^a-zA-Z0-9]/", "", $clean) : $clean ;
return ($force_lowercase) ?
(function_exists('mb_strtolower')) ?
mb_strtolower($clean, 'UTF-8') :
strtolower($clean) :
$clean;
}

Combining multiple regular expressions into one

I'm filtering all user input to remove the following characters:
http://www.w3.org/TR/unicode-xml/#Charlist ("not suitable characters for use with markup").
So, I have these two functions:
if (!function_exists("mb_trim")) {
function mb_trim($str)
{
return preg_replace('/^[\pZ\pC]+|[\pZ\pC]+$/u', '', $str);
}
}
function sanitize($str)
{
// Clones of grave and acute
$str = preg_replace("/[\x{0340}-\x{0341}]+/u", "", $str);
// Obsolete characters for Khmer
$str = preg_replace("/[\x{17A3}]+/u", "", $str);
$str = preg_replace("/[\x{17D3}]+/u", "", $str);
// Line and paragraph separator
$str = preg_replace("/[\x{2028}]+/u", "", $str);
$str = preg_replace("/[\x{2029}]+/u", "", $str);
// BIDI embedding controls (LRE, RLE, LRO, RLO, PDF)
$str = preg_replace("/[\x{202A}-\x{202E}]+/u", "", $str);
// Activate/Inhibit Symmetric swapping
$str = preg_replace("/[\x{206A}-\x{206B}]+/u", "", $str);
// Activate/Inhibit Arabic from shaping
$str = preg_replace("/[\x{206C}-\x{206D}]+/u", "", $str);
// Activate/Inhibit National digit shapes
$str = preg_replace("/[\x{206E}-\x{206F}]+/u", "", $str);
// Interlinear annotation characters
$str = preg_replace("/[\x{FFF9}-\x{FFFB}]+/u", "", $str);
// Byte Order Mark
$str = preg_replace("/[\x{FEFF}]+/u", "", $str);
// Object replacement character
$str = preg_replace("/[\x{FFFC}]+/u", "", $str);
// Scoping for Musical Notation
$str = preg_replace("/[\x{1D173}-\x{1D17A}]+/u", "", $str);
$str = mb_trim($str);
if (mb_check_encoding($str)) {
return $str;
} else {
return false;
}
}
I don't have much experience with regular expressions, so what I want to know is:
Is the mb_trim function correct for trimming multi-byte strings?
Is it possible to join all the regular expressions in the sanitize function
into a single preg_replace call?
Thanks

You can do it with a single preg_replace by combining them into one character class, like so:
$str = preg_replace("/[\x{0340}-\x{0341}\x{17A3}\x{17D3}\x{2028}-\x{2029}\x{202A}-\x{202E}\x{206A}-\x{206B}\x{206C}-\x{206D}\x{206E}-\x{206F}\x{FFF9}-\x{FFFB}\x{FEFF}\x{FFFC}\x{1D173}-\x{1D17A}]+/u", "", $str);
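A long character class like that is hard to review by eye. One option is to keep the code points in a list and build the class from it. The sketch below does that with the ranges from the question, merging the ones that are adjacent (for example 2028-2029 and 202A-202E). The names `$ranges`, `$class` and `$pattern` are illustrative, and the array destructuring assumes PHP 7.1 or later.

```php
<?php
// Code point ranges from the question as [first, last] pairs,
// with adjacent ranges merged.
$ranges = [
    [0x0340, 0x0341],   // clones of grave and acute
    [0x17A3, 0x17A3],   // obsolete Khmer
    [0x17D3, 0x17D3],   // obsolete Khmer
    [0x2028, 0x202E],   // line/paragraph separators, BIDI embedding controls
    [0x206A, 0x206F],   // symmetric swapping, Arabic shaping, digit shapes
    [0xFFF9, 0xFFFC],   // interlinear annotation, object replacement
    [0xFEFF, 0xFEFF],   // byte order mark
    [0x1D173, 0x1D17A], // musical notation scoping
];

// Build the inside of the character class, e.g. \x{340}-\x{341}\x{17A3}...
$class = '';
foreach ($ranges as [$first, $last]) {
    $class .= sprintf('\x{%X}', $first);
    if ($last !== $first) {
        $class .= sprintf('-\x{%X}', $last);
    }
}
$pattern = '/[' . $class . ']+/u';

echo preg_replace($pattern, '', "a\u{2028}b\u{FEFF}c"), "\n"; // abc
```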

Reversion Strings and replace a character - RegEx with Php

I have another question about regex in PHP.
Assume that I have a line like this:
716/52 ; 250/491.1; 356/398; 382/144
I want to do two things:
1. Replace all semicolons with commas. I think I can do this using
$myline= str_replace(";", ",", $myline);
2. Swap the two numbers and replace '/' with a comma. That is, 716/52 will become 52,716. This is where I get stuck.
So, the output should be
52,716 , 491.1,250, 398,356, 144,382
I know that using sed, I can achieve it as
1,$s/^classcode:[\t ]\+\([0-9]\+\)\/\([0-9]\+\)/classcode: \2\,\1/
But how do I do it using preg_match in PHP?

$str = '716/52 ; 250/491.1; 356/398; 382/144';
$str = str_replace(';', ',', $str);
$res = preg_replace_callback('~[\d.]+/[\d.]+~', 'reverse', $str);
function reverse($matches)
{
$parts = explode('/', $matches[0]);
return $parts[1] . ',' . $parts[0];
}
var_dump($res);
And working sample: http://ideone.com/BeS9j
Update: PHP 5.3+ version with an anonymous function:
$str = '716/52 ; 250/491.1; 356/398; 382/144';
$str = str_replace(';', ',', $str);
$res = preg_replace_callback('~[\d.]+/[\d.]+~', function ($matches) {
$parts = explode('/', $matches[0]);
return $parts[1] . ',' . $parts[0];
}, $str);
var_dump($res);

As an alternative to regexes, you could try this:
$string = '716/52 ; 250/491.1; 356/398; 382/144';
echo join(', ', array_map(
function ($s) { return join(',', array_reverse(explode('/', trim($s)))); },
explode(';', $string)));

$str = '716/52 ; 250/491.1; 356/398; 382/144';
$str = preg_replace('~(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)~', '$2,$1', $str);
$str = str_replace(';', ',', $str);
Uses two capture groups, replacing them in reverse order.
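Since the question asks about preg_match specifically, here is a sketch of the same swap using preg_match_all, which collects the pairs first and then rebuilds the line. It assumes every entry has the form number/number. The variable names are illustrative, and the output spacing is normalized to ", " rather than matching the sample in the question exactly.

```php
<?php
$str = '716/52 ; 250/491.1; 356/398; 382/144';

// PREG_SET_ORDER yields one [full match, first number, second number] row per pair.
preg_match_all('~(\d+(?:\.\d+)?)/(\d+(?:\.\d+)?)~', $str, $matches, PREG_SET_ORDER);

$pairs = [];
foreach ($matches as [, $first, $second]) {
    // Emit the two numbers in reverse order, joined by a comma.
    $pairs[] = $second . ',' . $first;
}

$result = implode(', ', $pairs);
echo $result, "\n"; // 52,716, 491.1,250, 398,356, 144,382
```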
