PHP - Sanitize string removes numbers - php

I am attempting to define allowed characters in an array and then sanitize strings based on that array. The code below works well, except that it also removes the characters 0-9!
Could someone please explain why this is?
Code:
<?php
//Allowed characters within user data:
$symbols = array();
$symbols += range('a', 'z');
$symbols += range('A', 'Z');
$symbols += range('0', '9');
array_push($symbols,' ','-'); // Allow spaces and hyphens.
//----test 1
//data to test.
$someString = "07mm04dd1776yyyy";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/i", "", $someString));
echo "$someString\n";
//----test 2
$someString = "Another-07/04/1776-test-!##$%^&*()[]\\;',./\"[]|;\"<>?";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/i", "", $someString));
echo "$someString\n";
?>
Output:
mmddyyyy
Another--test-
Side note (edit): This is used in conjunction with a database, but it goes beyond the DB. The data in the DB is used to write PowerShell scripts that import users into Active Directory, where many characters are not allowed, and the old system only allowed these characters as well.
Thank you in advance,
Wayne

Going off of what @andrewsi said about the allowed characters not being added to the array, I figured out how to add them properly. The code below shows that they are added, along with the output for the test strings.
There's probably a better way to do this, so I made this answer community wiki.
<?php
//Allowed characters within user data:
$symbols = array();
array_push($symbols,implode("",range('0', '9')));
array_push($symbols,implode("",range('a', 'z')));
array_push($symbols,implode("",range('A', 'Z')));
array_push($symbols,' ','-'); // Allow spaces and hyphens.
print_r($symbols);
echo "\n";
//----test 1
//data to test.
$someString = "07mm04dd1776yyyy";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/", "", $someString));
echo "$someString\n";
//----test 2
$someString = "Another-07/04/1776-test-!##$%^&*()[]\\;',./\"[]|;\"<>?";
//sanitize
$someString = trim(preg_replace("/[^" . preg_quote(implode('',$symbols), '/') . "]/", "", $someString));
echo "$someString\n";
?>
Output:
Array
(
[0] => 0123456789
[1] => abcdefghijklmnopqrstuvwxyz
[2] => ABCDEFGHIJKLMNOPQRSTUVWXYZ
[3] =>
[4] => -
)
07mm04dd1776yyyy
Another-07041776-test-
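For the record, the digits disappeared in the original code because += on arrays is the union operator: it only adds keys that do not exist yet on the left-hand side. range('a', 'z') already occupies keys 0 to 25, so the two later ranges add nothing, and the /i flag hid the missing capitals. Below is a minimal sketch of an alternative using array_merge(), which appends and renumbers. The sanitize() helper name is my own and not part of the question.

```php
<?php
// Array union (+=) keeps the left-hand value for every key that already exists.
$union = range('a', 'c');      // keys 0, 1, 2
$union += range('x', 'z');     // keys 0, 1, 2 again, so nothing is added
// $union is still ['a', 'b', 'c']

// array_merge() appends list values and renumbers the keys instead.
$symbols = array_merge(range('a', 'z'), range('A', 'Z'), range('0', '9'), [' ', '-']);

// Hypothetical helper: strip every character that is not in $allowed.
function sanitize(string $input, array $allowed): string
{
    // preg_quote() escapes the hyphen so it stays literal inside the class.
    $class = preg_quote(implode('', $allowed), '/');
    return trim(preg_replace('/[^' . $class . ']/', '', $input));
}

echo sanitize("07mm04dd1776yyyy", $symbols), "\n";                 // 07mm04dd1776yyyy
echo sanitize("Another-07/04/1776-test-!#%^&*()", $symbols), "\n"; // Another-07041776-test-
```

Building the list once with array_merge() also keeps one array element per character, which is easier to inspect with print_r() than the concatenated strings.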

Related

Search and replace all lines in a multiline string

I have a string with a large list with items named as follows:
$str = "f05cmdi-test1-name1
f06dmdi-test2-name2";
So the first 4 characters are random characters. And I would like to have an output like this:
'mdi-test1-name1',
'mdi-test2-name2',
As you can see, the first four characters of each line need to be replaced with a ' and every line needs to end with ',
How can I change the above string into the string below? I've tried for hours with 'strstr' and 'str_replace' but I can't get it working. It would save me a lot of time if I could get it to work.
Thanks for your help guys!
Here is a way to do the job:
$input = "f05cmdi-test1-name1
f05cmdi-test2-name2";
$result = preg_replace("/.{4}(\S+)/", "'$1',", $input);
echo $result;
Where \S stands for a non-space character.
EDIT : I deleted the above since the following method is better and more reliable and can be used for any possible combination of four characters.
So what do I do if there are a million different possibilities as starting characters?
In your specific example I see that the only space is between the full strings (full string = "f05cmdi-test1-name1").
So:
$str = "f05cmdi-test1-name1 f06dmdi-test2-name2";
$result_array = [];
// Split at the spaces
$result = explode(" ", $str);
foreach ($result as $item) {
    // Skip the four random leading characters and keep the rest
    $result_array[] = substr($item, 4);
}
Resulting in:
$result_array = [
"mdi-test1-name1",
"mdi-test2-name2",
"....."
];
If you would like a single string in the style of:
"'mdi-test1-name1','mdi-test2-name2','...'"
Then you can simply do the following:
$result_final = "'" . implode("','" , $result_array) . "'";
This is doable in a rather simple regex pattern
<?php
$str = "f05cmdi-test1-name1
f05cmdi-test2-name2";
$str = preg_replace("~[a-z0-9]{1,4}mdi-test([0-9]+-[a-z0-9]+)~", "'mdi-test\\1',", $str);
echo $str;
Alter to your more specific needs
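If the prefix really is always four characters, the steps above can also be combined without depending on what the separator is. The following is only a sketch: the stripPrefixAndQuote() name and the $prefixLength parameter are hypothetical, and it assumes items never contain whitespace themselves.

```php
<?php
// Hypothetical helper: drop a fixed-length prefix from every item and
// wrap the remainder as 'item', with one item per output line.
function stripPrefixAndQuote(string $input, int $prefixLength = 4): string
{
    // Split on any run of whitespace so spaces, \n and \r\n all work.
    $items = preg_split('/\s+/', trim($input), -1, PREG_SPLIT_NO_EMPTY);

    $lines = [];
    foreach ($items as $item) {
        $lines[] = "'" . substr($item, $prefixLength) . "',";
    }
    return implode("\n", $lines);
}

echo stripPrefixAndQuote("f05cmdi-test1-name1\nf06dmdi-test2-name2"), "\n";
// 'mdi-test1-name1',
// 'mdi-test2-name2',
```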

PHP Regex for a specific numeric value inside a comma-delimited integer number string

I am trying to get the integers to the left and right of an input value in the $str variable using a regex, but I keep getting the commas back along with the integers. I only want the integers, not the commas. I have also tried replacing the wildcard . with \d, but that did not resolve it.
$str = "1,2,3,4,5,6";
function pagination()
{
global $str;
// Using number 4 as an input from the string
preg_match('/(.{2})(4)(.{2})/', $str, $matches);
echo $matches[0]."\n".$matches[1]."\n".$matches[1]."\n".$matches[1]."\n";
}
pagination();
How about using a CSV parser?
$str = "1,2,3,4,5,6";
$line = str_getcsv($str);
$target = 4;
foreach($line as $key => $value) {
if($value == $target) {
echo $line[($key-1)] . '<--low high-->' . $line[($key+1)];
}
}
Output:
3<--low high-->5
or a regex could be
$str = "1,2,3,4,5,6";
preg_match('/(\d+),4,(\d+)/', $str, $matches);
echo $matches[1]."<--low high->".$matches[2];
Output:
3<--low high->5
The only flaw with these approaches is if the number is the start or end of range. Would that ever be the case?
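If the target can sit at the start or end of the list, one way to cover that is to look the neighbours up by index and fall back to null when there is none. This is a sketch only; the neighbours() name and the 'low'/'high' keys are my own choices, and it assumes the list contains no spaces around the commas.

```php
<?php
// Hypothetical helper: return the values left and right of $target,
// or null for a side that does not exist. Returns null if $target is absent.
function neighbours(string $csv, string $target): ?array
{
    $parts = explode(',', $csv);
    // Strict search: compares the strings exactly, so "4" does not match "14".
    $idx = array_search($target, $parts, true);
    if ($idx === false) {
        return null;
    }
    return [
        'low'  => $parts[$idx - 1] ?? null,
        'high' => $parts[$idx + 1] ?? null,
    ];
}

print_r(neighbours("1,2,3,4,5,6", "4")); // low => 3, high => 5
```

Because the values are compared as whole list items, this also keeps working once the numbers reach double digits.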
I believe you're looking for a regex non-capturing group.
Here's what I did:
$regStr = "1,2,3,4,5,6";
$regex = "/(\d)(?:,)(4)(?:,)(\d)/";
preg_match($regex, $regStr, $results);
print_r($results);
Gives me the results:
Array ( [0] => 3,4,5 [1] => 3 [2] => 4 [3] => 5 )
Hope this helps!
Given your function name I am going to assume you need this for pagination.
The following solution might be easier:
$str = "1,2,3,4,5,6,7,8,9,10";
$str_parts = explode(',', $str);
// reset and end return the first and last element of an array respectively
$start = reset($str_parts);
$end = end($str_parts);
This prevents your regex from having to deal with your numbers getting into the double digits.

PHP word censor with keeping the original caps

We want to censor certain words on our site but each word has different censored output.
For example:
PHP => P*P, javascript => j*vascript
(However not always the second letter.)
So we want a simple "one star" censor system that keeps the original caps. The data coming from the database is uncensored, so we need the fastest way possible.
$data="Javascript and php are awesome!";
$word[]="PHP";
$censor[]="H";//the letter we want to replace
$word[]="javascript";
$censor[]="a";//but only once (j*v*script would look weird)
//Of course, if needed we can use the full censored word in the $censor variables
Expected value:
J*vascript and p*p are awesome!
Thanks for all the answers!
You can put your censored words in a key-based array, where each value is the position of the character to replace with * (see the $censor array example below).
$string = 'JavaSCRIPT and pHp are testing test-ground for TEST ŠĐČĆŽ ŠĐčćŽ!';
$censor = [
'php' => 2,
'javascript' => 2,
'test' => 3,
'šđčćž' => 4,
];
function stringCensorSlow($string, array $censor) {
foreach ($censor as $word => $position) {
while (($pos = mb_stripos($string, $word)) !== false) {
$string =
mb_substr($string, 0, $pos + $position - 1) .
'*' .
mb_substr($string, $pos + $position);
}
}
return $string;
}
function stringCensorFast($string, array $censor) {
$pattern = [];
foreach ($censor as $word => $position) {
$word = '~(' . mb_substr($word, 0, $position - 1) . ')' . mb_substr($word, $position - 1, 1) . '(' . mb_substr($word, $position) . ')~iu';
$pattern[$word] = '$1*$2';
}
return preg_replace(array_keys($pattern), array_values($pattern), $string);
}
Use example :
echo stringCensorSlow($string, $censor);
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
echo stringCensorFast($string, $censor) . "\n";
# J*vaSCRIPT and p*p are te*ting te*t-ground for TE*T ŠĐČ*Ž ŠĐč*Ž!
Speed test :
foreach (['stringCensorSlow', 'stringCensorFast'] as $func) {
$time = microtime(true);
for ($i = 0; $i < 10000; $i++) {
$func($string, $censor);
}
$time = microtime(true) - $time;
echo "{$func}() took $time\n";
}
output on my localhost was :
stringCensorSlow() took 1.9752140045166
stringCensorFast() took 0.11587309837341
Upgrade #1: made it multibyte-character safe.
Upgrade #2: added example for preg_replace, which is faster than mb_substr. Tnx to AbsoluteƵERØ
Upgrade #3: added speed test loop and result on my local PC machine.
Make an array of words and replacements. This should be your fastest option in terms of processing, but a little more methodical to set up. Remember when you're setting up your patterns to use the i modifier to make each pattern case insensitive. You could ultimately pull these from a database into the arrays. I've hard-coded the arrays here for the example.
<!DOCTYPE html>
<html>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<?php
$word_to_alter = array(
'!(j)a(v)a(script)(s|ing|ed)?!i',
'!(p)h(p)!i',
'!(m)y(sql)!i',
'!(p)(yth)o(n)!i',
'!(r)u(by)!i',
'!(ВЗЛ)О(М)!iu',
);
$alteration = array(
'$1*$2*$3$4',
'$1*$2',
'$1*$2',
'$1$2*$3',
'$1*$2',
'$1*$2',
);
$string = "Welcome to the world of programming. You can learn PHP, MySQL, Python, Ruby, and Javascript all at your own pace. If you know someone who uses javascripting in their daily routine you can ask them about becoming a programmer who writes JavaScripts. взлом прохладно";
$newstring = preg_replace($word_to_alter,$alteration,$string);
echo $newstring;
?>
</html>
Output
Welcome to the world of programming. You can learn P*P, M*SQL, Pyth*n,
R*by, and J*v*script all at your own pace. If you know someone who
uses j*v*scripting in their daily routine you can ask them about
becoming a programmer who writes J*v*Scripts. взл*м прохладно
Update
It works the same with UTF-8 characters, note that you have to specify a u modifier to make the pattern treated as UTF-8.
u (PCRE_UTF8)
This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This
modifier is available from PHP 4.1.0 or greater on Unix and from PHP
4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.
Why not just use a little helper function and pass it a word and the desired censor?
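function censorWord($word, $censor) {
    // strpos() can return 0, so compare against false explicitly
    if (strpos($word, $censor) !== false) {
        // The limit of 1 replaces only the first occurrence
        return preg_replace('/' . preg_quote($censor, '/') . '/', '*', $word, 1);
    }
    // Nothing to censor: return the word unchanged
    return $word;
}
echo censorWord("Javascript", "a"); // prints J*vascript
echo censorWord("PHP", "H"); // prints P*P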
Then you can check the word against your wordlist and if it is a word that should be censored, you can pass it to the function. Then, you also always have the original word as well as the censored one to play with or put back in your sentence.
This would also make it easy to change the number of letters censored by just changing the limit argument of preg_replace. All you have to do is keep an array of words, explode the sentence on spaces or something, and then check in_array. If it is in the array, send it to censorWord().
And here's a more complete example doing exactly what you said in the OP.
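function censorWord($word, $censor) {
    // strpos() can return 0, so compare against false explicitly
    if (strpos($word, $censor) !== false) {
        // The limit of 1 replaces only the first occurrence
        return preg_replace('/' . preg_quote($censor, '/') . '/', '*', $word, 1);
    }
    // Nothing to censor: return the word unchanged
    return $word;
}
$word_list = ['php','javascript'];
$data = "Javascript and php are awesome!";
$words = explode(" ", $data);
// pass each word by reference so it can be modified inside our array
foreach($words as &$word) {
    if(in_array(strtolower($word), $word_list)) {
        // this just passes the second letter of the word
        // as the $censor argument
        $word = censorWord($word, $word[1]);
    }
}
// break the reference so later use of $word cannot change the array
unset($word);
echo implode(" ", $words); // prints J*vascript and p*p are awesome!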
You could store a lowercase list of the censored words somewhere, and if you're okay with starring the second letter every time, do something like this:
if (in_array(strtolower($word), $censored_words)) {
$word = substr($word, 0, 1) . "*" . substr($word, 2);
}
If you want to change the first occurrence of a letter, you could do something like:
$censored_words = array('javascript' => 'a', 'php' => 'h', 'ruby' => 'b');
$lword = strtolower($word);
if (in_array($lword, array_keys($censored_words))) {
$ind = strpos($lword, $censored_words[$lword]);
$word = substr($word, 0, $ind) . "*" . substr($word, $ind + 1);
}
This is what I would do:
Create a simple database (text file) and make a "table" of all your censored words and expected censored results. E.G.:
PHP --- P*P
javascript --- j*vascript
HTML --- HT*L
Write PHP code to compare the database information against the text you want to censor. You will have to use explode() to create an array containing only the words. Something like this:
/* Opening database of censored words */
$filename = "/files/censored_words.txt";
$file = fopen( $filename, "r" );
if( $file == false )
{
echo ( "Error in opening file" );
exit();
}
/* Creating an array of words from string*/
$data = explode(" ", $data); // What was "Javascript and PHP are awesome!" has
// become "Javascript", "and", "PHP", "are",
// "awesome!". This is useful.
If your script finds matching words, replace the word in your data with the censored word from your list. You would have to delimit the file first by \r\n and finally by ---. (Or whatever you choose for separating your table with.)
Hope this helped!
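The table idea above can be sketched so that the original capitalisation survives: instead of pasting in the censored word from the list, only the position of the * is taken from it. Everything below is an assumption for illustration: the $table contents stand in for the lines of the text file, the helper names are made up, and the \w+ matching only handles ASCII words.

```php
<?php
// Hypothetical table contents; in practice these lines could come from file().
$table = ["PHP --- P*P", "javascript --- j*vascript", "HTML --- HT*L"];

// Map each lowercased word to the index of the star in its censored form.
function buildStarPositions(array $lines): array
{
    $positions = [];
    foreach ($lines as $line) {
        $parts = explode('---', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip malformed lines
        }
        [$word, $masked] = array_map('trim', $parts);
        $star = strpos($masked, '*');
        if ($star !== false) {
            $positions[strtolower($word)] = $star;
        }
    }
    return $positions;
}

// Replace one character per listed word, leaving the other letters as typed.
function censorText(string $text, array $positions): string
{
    return preg_replace_callback('/\w+/', function (array $m) use ($positions) {
        $key = strtolower($m[0]);
        if (!isset($positions[$key])) {
            return $m[0];
        }
        return substr_replace($m[0], '*', $positions[$key], 1);
    }, $text);
}

echo censorText("Javascript and php are awesome!", buildStarPositions($table)), "\n";
// J*vascript and p*p are awesome!
```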

Get the current + the next word in a string

This is what I am trying to get:
My longest text to test
When I search for e.g. My, I should get My longest.
I tried it with this approach: first get the complete length of the input, then search for the next ' ' and cut there.
$length = strripos($text, $input) + strlen($input)+2;
$stringpos = strripos($text, ' ', $length);
$newstring = substr($text, 0, strpos($text, ' ', $length));
But this only works the first time; after that it cuts right after the current input. That means My lon gives My longest and not My longest text.
How must I change this to get the right result, always including the next word? Maybe I need a break, but I cannot find the right solution.
UPDATE
Here is my workaround until I find a better solution. As I said, array functions do not work for me, since partial words should match too. So I extended my previous idea a bit. The basic idea is to distinguish between the first space after the input and the next one. I improved the code a bit.
function get_title($input, $text) {
    $length = strripos($text, $input) + strlen($input);
    $stringpos = stripos($text, ' ', $length);
    // Find next ' '
    $stringpos2 = stripos($text, ' ', $stringpos + 1);
    if (!$stringpos || !$stringpos2) {
        // No further space: the whole text is the result
        $newstring = $text;
    } else {
        $newstring = substr($text, 0, $stringpos2);
    }
    return $newstring;
}
Not pretty, but hey, it seems to work ^^. Anyway, maybe one of you has a better solution.
You can try using explode
$string = explode(" ", "My longest text to test");
$key = array_search("My", $string);
echo $string[$key] , " " , $string[$key + 1] ;
You can take it to the next level by matching case-insensitively with preg_match_all
$string = "My longest text to test in my school that is very close to mY village" ;
var_dump(__search("My",$string));
Output
array
0 => string 'My longest' (length=10)
1 => string 'my school' (length=9)
2 => string 'mY village' (length=10)
Function used
function __search($search,$string)
{
$result = array();
preg_match_all('/' . preg_quote($search) . '\s+\w+/i', $string, $result);
return $result[0];
}
There are simpler ways to do that. String functions are useful if you don't want to look for something specific, but cut out a pre-defined length of something. Else use a regular expression:
preg_match('/My\s+\w+/', $string, $result);
print $result[0];
Here, My matches the literal first word, \s+ matches one or more whitespace characters, and \w+ matches word characters.
This adds some new syntax to learn, but it is less brittle than workarounds and lengthier string-function code that accomplish the same thing.
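The same regular-expression idea can be stretched to cover the partial-word case from the question's update (My lon should give My longest text). This is a sketch under my own assumptions: the currentAndNextWord() name is invented, words are taken to be runs of non-whitespace, and the result starts at the match rather than at the beginning of the text.

```php
<?php
// Hypothetical helper: find $input in $text, complete the word it ends in,
// and append the following word if there is one.
function currentAndNextWord(string $input, string $text): ?string
{
    // \S* finishes a partially typed word; (?:\s+\S+)? adds the next word.
    $pattern = '/' . preg_quote($input, '/') . '\S*(?:\s+\S+)?/i';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = "My longest text to test";
echo currentAndNextWord("My", $text), "\n";     // My longest
echo currentAndNextWord("My lon", $text), "\n"; // My longest text
```

preg_quote() with the delimiter as second argument keeps characters such as / or . in the search term from being read as regex syntax.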
An easy method would be to split it on whitespace and grab the current array index plus the next one:
// Word to search for:
$findme = "text";
// Using preg_split() to split on any amount of whitespace
$words = preg_split('/\s+/', "My longest text to test");
// Find the word in the array with array_search()
// calling strtolower() with array_map() to search case-insensitively
$idx = array_search(strtolower($findme), array_map('strtolower', $words));
if ($idx !== FALSE) {
// If found, print the word and the following word from the array
// as long as the following one exists.
echo $words[$idx];
if (isset($words[$idx + 1])) {
echo " " . $words[$idx + 1];
}
}
// Prints:
// "text to"

RegEx in PHP to extract components of nquad

I'm looking around for a RegEx that can help me parse an nquad file. An nquad file is a straight text file where each line represents a quad (s, p, o, c):
<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext> .
<http://mysubject> <http://mypredicate2> <http://myobject2> <http://mycontext> .
<http://mysubject> <http://mypredicate2> <http://myobject2> <http://mycontext> .
The objects can also be literals (instead of uris), in which case they are enclosed with double quotes:
<http://mysubject> <http://mypredicate> "My object" <http://mycontext> .
I'm looking for a regex that, given one line of this file, will give me back a PHP array in the following format:
[0] => "http://mysubject"
[1] => "http://mypredicate"
[2] => "http://myobject"
[3] => "http://mycontext"
...or in the case where the double quotes are used for the object:
[0] => "http://mysubject"
[1] => "http://mypredicate"
[2] => "My object"
[3] => "http://mycontext"
One final thing - in an ideal world, the regex will cater for the scenario there may be 1 or more spaces between the various components, e.g.
<http://mysubject> <http://mypredicate> "My object" <http://mycontext> .
I'm going to add another answer as an additional solution using only a regex and explode:
$line = "<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext>";
$line2 = '<http://mysubject> <http://mypredicate> "My object" <http://mycontext>';
$delimeter = '---'; // Can't use space
$result = preg_replace('/<([^>]*)>\s+<([^>]*)>\s+(?:["<]){1}([^">]*)(?:[">]){1}\s+<([^>]*)>/i', '$1' . $delimeter . '$2' . $delimeter . '$3' . $delimeter . '$4', $line);
$array = explode( $delimeter, $result);
It seems this can be accomplished as follows (I do not know your character restrictions so it may not work specifically for your needs, but worked for your test cases):
$line = "<http://mysubject> <http://mypredicate> <http://myobject> <http://mycontext>";
$line2 = '<http://mysubject> <http://mypredicate> "My object" <http://mycontext>';
// Remove unnecessary whitespace between entries (change $line to $line2 for testing)
$delimeter = '---';
$result = preg_replace('/([">]){1}\s+(["<]){1}/i', '$1' . $delimeter . '$2', $line);
// Explode on our delimeter
$array = explode( $delimeter, $result);
foreach( $array as &$a)
{
// Replace the characters we don't want with nothing
$a = str_replace( array( '<', '.', '>', '"'), '', $a);
}
var_dump( $array);
This regular expression would help:
/(\S+?)\s+(\S+?)\s+(\S+?)\s+(\S+?)\s+\./
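The (s, p, o, c) values will be in capture groups 1 to 4 ($1, $2, $3, $4), including their surrounding angle brackets or quotes. Note that \S+ cannot capture a quoted literal that contains spaces, such as "My object", as a single group.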
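To return the four components directly as an array, for both URI and quoted-literal objects, a single preg_match() call with an alternation for the object may be enough. This is a sketch for the simple lines shown in the question only: the parseQuad() name is my own, and it assumes no escaped quotes inside literals, no language tags or datatypes, and that every line carries a context and ends in a dot.

```php
<?php
// Hypothetical helper: split one simple N-Quads line into [s, p, o, c].
// Returns null when the line does not have the expected shape.
function parseQuad(string $line): ?array
{
    $pattern = '/^\s*<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s+<([^>]*)>\s*\.\s*$/';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    // Group 3 holds a URI object, group 4 a quoted literal; the unused one is ''.
    $object = $m[3] !== '' ? $m[3] : $m[4];
    return [$m[1], $m[2], $object, $m[5]];
}

print_r(parseQuad('<http://mysubject> <http://mypredicate>   "My object" <http://mycontext> .'));
```

Since \s+ sits between the components, one or more spaces (or tabs) between them are accepted.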
