Ok so I am taking a string, querying a database and then must provide a URL back to the page. There are multiple special characters in the input, so I strip all special characters and spaces using the following code and replace them with "%25" (the URL-encoded percent sign) so that my legacy system correctly searches for the value needed. What I need to do, however, is cut down the number of "%25" that show up.
My current code would replace something like
"Hello. / there Wilbur" with "Hello%25%25%25%25there%25Wilbur"
but I would like it to return
"Hello%25there%25Wilbur"
replacing multiples of the "%25" with only one instance
$string = str_replace(' ', '-', $string); // Replaces all spaces with hyphens.
return preg_replace('/[^A-Za-z0-9]/', '%25', $string); // Replaces special chars.
Just add a + after the character class that matches a non-alphanumeric character.
$string = "Hello. / there Wilbur";
$string = str_replace(' ', '-', $string);
// Just add a '+'. It replaces each run of one or more consecutive illegal
// characters with a single '%25'.
return preg_replace('/[^A-Za-z0-9]+/', '%25', $string);
Sample input: Hello. / there Wilbur
Sample output: Hello%25there%25Wilbur
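As a minimal sketch of this answer: because the + version already treats a space as a non-alphanumeric character, the separate str_replace step can be dropped. The function name toWildcardQuery is hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: replace every run of non-alphanumeric characters
// (spaces included) with a single '%25'.
function toWildcardQuery(string $input): string
{
    return preg_replace('/[^A-Za-z0-9]+/', '%25', $input);
}

echo toWildcardQuery("Hello. / there Wilbur"), "\n"; // Hello%25there%25Wilbur
```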
This will work:
while (strpos($str, '%25%25') !== false) {
    $str = str_replace('%25%25', '%25', $str);
}
Or using a regexp:
preg_replace('#((?:\%25){2,})#', '%25', $string_to_replace_in)
The regexp collapses each run in a single pass with no while loop, so the longer the runs of '%25', the better preg_replace compares with the loop.
Cf PHP doc:
http://fr2.php.net/manual/en/function.preg-replace.php
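The two approaches can be put side by side as a rough sketch. The function names collapseWithLoop and collapseWithRegex are made up for illustration; both are meant to produce the same result on already-encoded input.

```php
<?php
// Loop version: keep halving doubled '%25' until none remain.
function collapseWithLoop(string $str): string
{
    while (strpos($str, '%25%25') !== false) {
        $str = str_replace('%25%25', '%25', $str);
    }
    return $str;
}

// Regex version: each run of two or more '%25' becomes a single one.
function collapseWithRegex(string $str): string
{
    return preg_replace('#(?:%25){2,}#', '%25', $str);
}

$input = 'Hello%25%25%25%25there%25Wilbur';
echo collapseWithLoop($input), "\n";  // Hello%25there%25Wilbur
echo collapseWithRegex($input), "\n"; // Hello%25there%25Wilbur
```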
I want to write a PHP function that keeps only a-z (converting all letters to lowercase), 0-9 and "-", and replaces spaces with "-".
Here is what I have so far:
...
$s = strtolower($s);
$s = str_replace(' ', '-', $s);
$s = preg_replace("/[^a-z0-9]\-/", "", $s);
But I noticed that it keeps "?" (question marks) and I'm hoping that it doesn't keep other characters that I haven't noticed.
How could I correct it to obtain the expected result?
(I'm not super comfortable with regular expressions, especially when switching languages/tools.)
$s = strtolower($s);
$s = str_replace(' ', '-', $s);
$s = preg_replace("/[^a-z0-9\-]+/", "", $s);
You did not have the \- in the [] brackets.
It also seems you can use - instead of \-, both worked for me.
You need to add a quantifier to the character class you are searching for.
In this case, I used +.
The plus sign indicates one or more occurrences of the preceding element.
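Wrapped into a function, the corrected steps might look like the sketch below. The name slugify is an assumption for illustration; the three steps are the ones from the answer, with the hyphen placed last inside the brackets so it needs no escaping.

```php
<?php
// Hypothetical wrapper around the three steps above.
function slugify(string $s): string
{
    $s = strtolower($s);                          // lowercase everything
    $s = str_replace(' ', '-', $s);               // spaces become hyphens
    return preg_replace('/[^a-z0-9-]+/', '', $s); // drop everything else
}

echo slugify("Hello World? Yes-No"), "\n"; // hello-world-yes-no
```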
I have the following string:
$thetextstring = "jjfnj 948";
At the end I want to have:
echo $thetextstring; // should print jjf-nj948
So basically what I am trying to do is join the separated string, then separate the first 3 letters from the rest with a -.
So far I have
$string = trim(preg_replace('/s+/', ' ', $thetextstring));
$result = explode(" ", $thetextstring);
$newstring = implode('', $result);
print_r($newstring);
I have been able to join the words, but how do I add the separator after the first 3 letters?
Use a regex with the preg_replace function; it is a one-liner:
^.{3}\K([^\s]*) *
Breakdown:
^ # Assert start of string
.{3} # Match 3 characters
\K # Reset match
([^\s]*) * # Capture everything up to space character(s) then try to match them
PHP code:
echo preg_replace('~^.{3}\K([^\s]*) *~', '-$1', 'jjfnj 948');
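To see how the \K pattern behaves on slightly different inputs, here is a small sketch. The extra sample strings are assumptions for illustration and were not part of the question.

```php
<?php
$pattern = '~^.{3}\K([^\s]*) *~';

// The first input is from the question; the other two are made-up variations.
foreach (['jjfnj 948', 'abc 123', 'abcdef'] as $input) {
    echo preg_replace($pattern, '-$1', $input), "\n";
}
// jjf-nj948
// abc-123
// abc-def
```

Inputs containing more than one space, or a space within the first three characters, would probably need a different pattern.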
Without knowing more about how your strings can vary, this is a working solution for your task:
Pattern:
~([a-z]{2}) ~ // 2 letters (contained in capture group1) followed by a space
Replace:
-$1
Code:
$thetextstring = "jjfnj 948";
echo preg_replace('~([a-z]{2}) ~','-$1',$thetextstring);
Output:
jjf-nj948
Note this pattern can easily be expanded to include characters beyond lowercase letters that precede the space. ~(\S{2}) ~
You can use str_replace to remove the unwanted space:
$newString = str_replace(' ', '', $thetextstring);
$newString:
jjfnj948
And then preg_replace to put in the dash:
$final = preg_replace('/^([a-z]{3})/', '\1-', $newString);
The meaning of this regex instruction is:
from the beginning of the line: ^
capture three a-z characters: ([a-z]{3})
replace this match with itself followed by a dash: \1-
$final:
jjf-nj948
$thetextstring = "jjfnj 948";
// replace all spaces with nothing
$thetextstring = str_replace(" ", "", $thetextstring);
// insert a dash after the third character
$thetextstring = substr_replace($thetextstring, "-", 3, 0);
echo $thetextstring;
This gives the requested jjf-nj948
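The same two steps as a reusable function, as a rough sketch. The name insertDashAfterThree and the guard for very short strings are assumptions; the question does not say what should happen when fewer than four characters remain.

```php
<?php
// Hypothetical helper: remove spaces, then insert '-' after the third character.
function insertDashAfterThree(string $text): string
{
    $joined = str_replace(' ', '', $text);
    // Assumed behavior: leave strings of three or fewer characters unchanged,
    // since substr_replace would otherwise append a trailing '-'.
    if (strlen($joined) <= 3) {
        return $joined;
    }
    return substr_replace($joined, '-', 3, 0);
}

echo insertDashAfterThree('jjfnj 948'), "\n"; // jjf-nj948
echo insertDashAfterThree('ab'), "\n";        // ab
```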
Your approach is correct. For the last step, which consists of inserting a - after the third character, you can use the substr_replace function as follows:
$thetextstring = 'jjfnj 948';
$string = trim(preg_replace('/\s+/', ' ', $thetextstring));
$result = explode(' ', $string);
$newstring = substr_replace(implode('', $result), '-', 3, 0);
If you are confident enough that your string will always have the same format (characters followed by a whitespace followed by numbers), you can also reduce your computations and simplify your code as follows:
$thetextstring = 'jjfnj 948';
$newstring = substr_replace(str_replace(' ', '', $thetextstring), '-', 3, 0);
Old school, without regex:
$test = "jjfnj 948";
$test = str_replace(" ", "", $test); // strip all spaces from string
echo substr($test, 0, 3)."-".substr($test, 3); // isolate first three chars, add hyphen, and concat all characters after the first three
I have an insert query which adds various words into a search table, for use in a keyword search for my site, based on existing content from other tables.
My issue is that, although I have a common words text file which excludes words like 'and' and 'the', I also wish to eliminate numbers and words less than 3 characters in length.
Can anyone help?
$stripChars = array('.', ',', '!', '?', '(', ')', '%', '&', '"', '*', ':', ';', '#', ' - ', '/', '\\');
$string = str_replace($stripChars, ' ', $string);
$string = str_replace('  ', ' ', $string);
$words = explode(' ', $string);
return array_diff($words, $this->commonwords);
You can use this to remove words shorter than 3 characters:
$replaced = preg_replace('~\b[a-z]{1,2}\b~', '', $text);
And use this to remove numbers:
$replaced = preg_replace('/[0-9]+/', '', $text);
You can achieve this with regular expressions, using PHP's preg_replace function. Looking at the code in your question, a lot can be improved simply by using the right regex with preg_replace:
$stripChars = array('.', ',', '!', '?', '(', ')', '%', '&', '"', '*', ':', ';', '#', ' - ', '/', '\\');
$string = str_replace($stripChars, ' ', $string);
Let's face it, this isn't very elegant to look at.
Assuming you're simply trying to remove non-alphanumeric characters this can be simplified down to:
$string = preg_replace("/[^a-z0-9_\s-]/i","",$string);
This tells PHP to remove every character that is not (indicated by the ^ caret) one of: a-z (the i flag after the closing / makes the match case insensitive), 0-9, underscore _, a whitespace character \s, or a dash -. Matches are replaced with nothing (the empty second argument) and so are effectively removed.
You can obviously adjust what appears in the square brackets to suit your needs (see later on as this will occur...).
Adding in to this your next section:
$string = str_replace('  ', ' ', $string);
This appears to be intended to replace multiple spaces with a single space character; again, preg_replace can do this concisely for you:
$string = preg_replace("/\s+/", " ",$string);
Here \s is the whitespace character class, and the + quantifier matches one or more of them, greedily (as many as possible).
As for your original request, removing numbers and words of 2 or fewer characters: the pattern from the first part of this answer can remove numbers as well if you omit 0-9 from the [^a-z0-9_\s-] class. With [^a-z_\s-], digits are removed too.
To remove short words you can use:
$string = preg_replace("/\b[a-z]{1,2}\b/i","",$string);
This matches whole words delimited by word boundaries \b, made up of the characters in the square brackets [a-z], with a length between 1 and 2 {1,2}; the i flag makes it case insensitive again. The matched words are replaced with nothing and so removed.
Wrapping it all together you then have:
/// remove anything that is not a letter, underscore, whitespace or dash (this also removes digits)
$string = preg_replace("/[^a-z_\s-]/i","",$string);
/// remove short words
$string = preg_replace("/\b[a-z]{1,2}\b/i","",$string);
/// finally remove excess whitespaces
$string = preg_replace("/\s+/", " ",$string);
The whitespace cleanup comes last because removing a short word leaves the spaces on each side of it, producing longer runs of whitespace.
There may well be a way of combining these into a single regex (or at least fewer calls), but I'm not very good at combining regexes, I'm afraid. Still, the code above is neater and more powerful than your current code, and it answers your question.
EDIT:
To remove just numbers specifically you can use the following preg_replace code:
$string = preg_replace("/\d+/","",$string);
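As an alternative sketch of the whole task (a variant, not the approach used in the answer above), the text can be split into words first and then filtered. The function name extractKeywords and the sample common-word list are assumptions. Note one difference: splitting on non-letters turns "abc123def" into two words, whereas removing digits in place would give "abcdef".

```php
<?php
// Hypothetical helper: split text into lowercase words, then drop
// numbers, words shorter than 3 characters, and common words.
function extractKeywords(string $text, array $commonWords): array
{
    // Splitting on anything that is not an ASCII letter discards digits
    // and punctuation in one step.
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Keep only words of three or more characters.
    $words = array_filter($words, function ($word) {
        return strlen($word) >= 3;
    });

    // Remove common words and duplicates, then reindex.
    return array_values(array_unique(array_diff($words, $commonWords)));
}

print_r(extractKeywords(
    "The 3 big dogs ran to a park, and the park was big!",
    ['the', 'and'] // assumed sample of the common-words list
));
// Resulting words: big, dogs, ran, park, was
```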
I'm making a function that detects and removes all trailing special characters from a string. It should convert strings like:
"hello-world"
"hello-world/"
"hello-world--"
"hello-world/%--+..."
into "hello-world".
Does anyone know a trick that doesn't require writing a lot of code?
Just for fun
[^a-z\s]+
Explanation:
[^x]: One character that is not x
\s: "whitespace character": space, tab, newline, carriage return, vertical tab
+: One or more
PHP:
$re = "/[^a-z\\s]+/i";
$str = "Hello world\nhello world/\nhello world--\nhellow world/%--+...";
$subst = "";
$result = preg_replace($re, $subst, $str);
try this
$string = preg_replace('/[^A-Za-z0-9\-]/', '', $string); // Removes special chars.
Or, to also keep apostrophes in the string:
preg_replace('/[^A-Za-z0-9\-\']/', '', $string); // Also keeps apostrophes.
You could use a regex like this, depending on your definition of "special characters":
function clean_string($input) {
return preg_replace('/\W+$/', '', $input);
}
It replaces any characters that are not a word character (\W) at the end of the string $ with nothing. \W will match [^a-zA-Z0-9_], so anything that is not a letter, digit, or underscore will get replaced. To specify which characters are special chars, use a regex like this, where you put all your special chars within the [] brackets:
function clean_string($input) {
return preg_replace('/[\/%.+-]+$/', '', $input);
}
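A quick sketch running the \W+$ version over the four sample strings from the question. The function is renamed trimTrailingNonWord here only to keep the example self-contained; the name is hypothetical.

```php
<?php
// Hypothetical name; same pattern as the first clean_string above.
function trimTrailingNonWord(string $input): string
{
    return preg_replace('/\W+$/', '', $input);
}

$samples = ['hello-world', 'hello-world/', 'hello-world--', 'hello-world/%--+...'];
foreach ($samples as $sample) {
    echo trimTrailingNonWord($sample), "\n"; // prints hello-world each time
}
```

Because \W does not match the underscore, a trailing _ would be kept by this version.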
This is what you are looking for:
([^\n\w\d \"]*)$
At the end of the string, it removes anything that is not a letter, a number, an underscore, a space or a new line.
Just call it like this :
preg_replace('/([^\n\w\s]*)$/', '', $string);
Here is my regex to strip special characters while still allowing a few like (-,%,:,#). I also want to allow /, but I'm running into an issue.
return preg_replace('/[^a-zA-Z0-9_ %\[\]\.\(\)%:#&-]/s', '', $string);
this works fine for the listed special characters, but
return preg_replace('/[^a-zA-Z0-9_ %\[\]\.\(\)%\\:&-]/s', '', $string);
does not filter l character too.
Here is the link to test:
http://ideone.com/WxR0ka
where it does not allow \\ in the URL. I want to display the URL as usual.
You mistakenly entered http:\\ instead of http://, and your regex also needs to include / in the list of allowed characters. This should work:
function clean($string) {
// Replaces all spaces with hyphens.
$string = str_replace(' ', '-', $string);
// Removes special chars.
return preg_replace('~[^\w %\[\].()%\\:#&/-]~', '', $string);
}
$d = clean("this was http://nice readlly n'ice 'test for#me to") ;
echo $d; // this-was-http://nice-readlly-nice-test-for#me-to
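If the list of allowed characters changes often, one possible variation is to build the character class from a plain string and let preg_quote handle the escaping. This is only a sketch: the function name keepAllowed and its second parameter are hypothetical, and unlike clean() it does not convert spaces to hyphens.

```php
<?php
// Hypothetical helper: keep letters, digits, and any characters listed
// in $extraAllowed; remove everything else.
function keepAllowed(string $input, string $extraAllowed): string
{
    // preg_quote escapes regex-special characters (and the '~' delimiter)
    // so the list can be dropped safely inside the character class.
    $class = preg_quote($extraAllowed, '~');
    return preg_replace('~[^A-Za-z0-9' . $class . ']~', '', $input);
}

echo keepAllowed("http://nice n'ice test#me", ':/#-'), "\n"; // http://nicenicetest#me
```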