I have an XML document that gets loaded onto a page. Sometimes there are specific characters that cannot be parsed, and this symbol is shown in place of what should be there: –
Sometimes the character varies from a hyphen, to an apostrophe, to even a double quote.
What I'd like to do is, create an array:
$invalidCharacters = array(" – ", "’", "&");
and if the string contains any of those characters, replace them with their HTML/ASCII equivalents, like this: " &ndash; ", "'", and "&amp;".
I know that I can do a str_replace() on some items, but is there a simple way to have it go through a loop and look for the specific characters, replacing each as it goes?
Using htmlspecialchars should work for you.
http://docs.php.net/manual/en/function.htmlspecialchars.php
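Note that htmlspecialchars() only converts the characters that are special in HTML (such as &, <, > and quotes), so a dash or a curly apostrophe passes through unchanged. For a custom list like the one in the question, a lookup table with strtr() is one option. The following is a minimal sketch, assuming the input and the source file are both UTF-8; the function name and the chosen entities are illustrative, not taken from the question.

```php
<?php
// Hypothetical helper: replace each listed character in a single pass.
// strtr() with an array never re-scans text it has already replaced,
// so the "&" inside "&ndash;" is not turned into "&amp;" afterwards.
function replaceSpecialCharacters(string $text): string
{
    $map = [
        '–' => '&ndash;', // en dash
        '’' => '&rsquo;', // right single quotation mark
        '&' => '&amp;',   // ampersand
    ];

    return strtr($text, $map);
}

echo replaceSpecialCharacters('Bob’s list – A & B'), "\n";
// Bob&rsquo;s list &ndash; A &amp; B
```

No explicit loop is needed: adding another character means adding one more entry to the map.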
I am trying to encode a phrase in order to pass it inside a URL. Currently it works fine with basic words, where spaces are replaced with dashes.
<a href="./'.str_replace(' ', '-', preg_replace("/[^A-Za-z0-9- ]/", '', $phrase)).'">
It produces something like:
/this-is-my-phase
On the page that this URL takes me I am able to replace the dashes with spaces and query my db for this phrase.
The problem I have is when the phrase contains an apostrophe. My current script removes it. Is there any way to preserve it, or to replace it with some URL-friendly character, to accommodate something like this?
this is bob's page
PHP has a standard library function, urlencode(), which encodes most non-alphanumeric characters as %XX, where XX is the hex value of the character (spaces become +).
If the limitations of that conversion (&, ©, £, etc.) are not acceptable, see rawurlencode().
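As a quick illustration of the difference between the two functions, here is a small sketch using the phrase from the question. The variable names are only for illustration.

```php
<?php
$phrase = "this is bob's page";

// urlencode() turns spaces into "+" and the apostrophe into %27.
$encoded = urlencode($phrase);
echo $encoded, "\n";    // this+is+bob%27s+page

// rawurlencode() uses %20 for spaces instead of "+".
$rawEncoded = rawurlencode($phrase);
echo $rawEncoded, "\n"; // this%20is%20bob%27s%20page

// Either form can be reversed on the receiving page.
echo urldecode($encoded), "\n";       // this is bob's page
echo rawurldecode($rawEncoded), "\n"; // this is bob's page
```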
If you want to allow another character, you have to add it to this section: ^A-Za-z0-9- so if, for example, you wish to allow ', the regex will be [^A-Za-z0-9-' ]
If you only need to replace all the apostrophes ('), then you can replace them with the URL-encoded character %27:
$url = str_replace("'", "%27", $url);
EDIT
If you want to replace all non-URL-safe characters, use a built-in function as in @wallyk's answer. It's much simpler.
I want to make a hyphen-separated string (for use in the URL) based on the user-submitted title of the post.
Suppose if the user entered the title of the post as:
$title = "USA is going to deport indians -- Breaking News / News India";
I want to convert it as below
$slug = "usa-is-going-to-deport-indians-breaking-news-news-india";
There could be some more characters that I also want to convert, for example '&' to 'and', and '#' and '%' to a hyphen (-).
One approach I tried was the str_replace() function, but with this method I have to call str_replace() many times, which is time-consuming.
One more problem is that there could be several consecutive hyphens (-) in the title string; I want to collapse any run of hyphens into a single hyphen (-).
Is there any robust and efficient way to solve this problem?
You can use the preg_replace() function to do this:
Input :
$string = "USA is going to deport indians -- Breaking News / News India";
$string = preg_replace("/[^\w]+/", "-", $string);
echo strtolower($string);
Output :
usa-is-going-to-deport-indians-breaking-news-news-india
I would suggest using the sanitize_title() function (note that this is a WordPress function, not part of core PHP).
Check the documentation.
There are three steps in this task (creating a "slug" string); each requires a separate pass over the input string.
1. Cast all characters to lowercase.
2. Replace ampersand symbols with [space]and[space] to ensure that the symbol is not consumed by a later replacement AND the replacement "and" is not prepended or appended to its neighboring words.
3. Replace sequences of one or more non-alphanumeric characters with a literal hyphen.
Multibyte-safe code:
$title = "ÛŞÃ is going to dèport 80% öf indians&citizens are #concerned -- Breaking News / News India";
echo preg_replace(
    '/[^\pL\pN]+/u',
    '-',
    str_replace(
        '&',
        ' and ',
        mb_strtolower($title)
    )
);
Output:
ûşã-is-going-to-dèport-80-öf-indians-and-citizens-are-concerned-breaking-news-news-india
Note that the replacement in str_replace() could be done within the preg_replace() call by forming an array of find strings and an array of replacement strings. However, this may be false economy -- although there would be fewer function calls, the more expensive regex-based function call would make two passes over the entire string.
If you wish to convert accented characters to ASCII characters, then perhaps read the different techniques at Convert accented characters to their plain ascii equivalents.
If you aren't worried about multibyte characters, then the simpler version of the same approach would be:
echo preg_replace(
    '/[^a-z\d]+/',
    '-',
    str_replace(
        '&',
        ' and ',
        strtolower($title)
    )
);
To mop up any leading or trailing hyphens in the result string, it may be a good idea to unconditionally call trim($resultstring, '-').
For a deeper dive on the subject of creating a slug string, read PHP function to make slug (URL string).
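Putting the pieces of this answer together, the three steps plus the final trim could be wrapped in one function. This is only a sketch: the name makeSlug() is hypothetical, and it assumes UTF-8 input and that the mbstring extension is available.

```php
<?php
// Hypothetical wrapper around the steps described above.
function makeSlug(string $title): string
{
    // Step 1: lowercase (multibyte-safe).
    $lower = mb_strtolower($title, 'UTF-8');

    // Step 2: spell out ampersands, padded with spaces so "and"
    // does not fuse with the neighboring words.
    $withAnd = str_replace('&', ' and ', $lower);

    // Step 3: collapse every run of non-letter, non-digit characters
    // into one hyphen. preg_replace() returns null on invalid UTF-8,
    // so fall back to an empty string in that case.
    $hyphenated = preg_replace('/[^\pL\pN]+/u', '-', $withAnd) ?? '';

    // Mop up any leading or trailing hyphens.
    return trim($hyphenated, '-');
}

echo makeSlug('USA is going to deport indians -- Breaking News / News India!'), "\n";
// usa-is-going-to-deport-indians-breaking-news-news-india
```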
I am currently using what appears to be a horribly complex and unnecessary solution to form a required string.
The string could have any punctuation and will include slashes.
As an example, this string:
Test Ripple, it\'s a comic book one!
Using my current method:
str_replace(" ", "-", trim(preg_replace('/[^a-z0-9]+/i', ' ', str_replace("'", "", stripslashes($string)))))
Returns the correct result:
Test-Ripple-its-a-comic-book-one
Here is a breakdown of what my current (poor) solution is doing in order to achieve the desired output:
1. Strip all slashes from the string.
2. Remove any apostrophes with str_replace.
3. Remove any remaining punctuation using preg_replace and replace it with whitespace.
4. Trim off any extra whitespace from the beginning/end of the string which may have been caused by punctuation.
5. Replace all whitespace with '-'.
But there must be a better and more efficient way. Can anyone help?
Personally, it looks fine to me; however, I would make one small change.
Change
preg_replace("/[^a-z0-9]+/i"
to the following
preg_replace("/[^a-zA-Z0-9\s]/"
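A different simplification, not taken from the answer above, is to replace each run of unwanted characters directly with a hyphen and then trim hyphens from the ends, which removes the intermediate whitespace step. This is a sketch under the assumption that only ASCII letters and digits should survive; the function name is hypothetical.

```php
<?php
// Hypothetical helper: same result as the original chain, one step shorter.
function hyphenate(string $input): string
{
    // Undo the backslash escaping, then drop apostrophes so "it's" becomes "its".
    $clean = str_replace("'", '', stripslashes($input));

    // Turn every run of non-alphanumeric characters into a single hyphen.
    $hyphenated = preg_replace('/[^a-z0-9]+/i', '-', $clean);

    // Remove hyphens left at the ends by leading or trailing punctuation.
    return trim($hyphenated, '-');
}

echo hyphenate('Test Ripple, it\\\'s a comic book one!'), "\n";
// Test-Ripple-its-a-comic-book-one
```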
I would like to replace extra spaces (instances of consecutive whitespace characters) with one space, as long as those extra spaces are not in double or single quotes (or any other enclosures I may want to include).
I saw some similar questions, but I could not find a direct response to my needs above. Thank you!
Hope you're still looking, or come back to check! This seems to work for me:
'/\s+((["\']).*?(?=\2)\2)|\s\s+/'
...and replace with ' $1' (a single space followed by $1)
EDIT
Also, if you need to allow for escaped quotes like \" or \', you could use this expression:
'/\s+((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/'
It gets a bit stickier if you want to add support for "balanced" quotes like brackets (e.g. () or {})
END EDIT
Let me know if you find problems or would like some explanation!
HOPEFULLY FINAL EDIT AND WARNINGS
Potential problem: If a quoted string starts at the beginning of the string variable (or file), it will either not count as a quoted string (and have any whitespace reduced) or it will throw off the whole thing, making anything NOT in quotes get treated as though it was in quotes and vice versa.
A potential change that might remedy this is to use the following match expression
/(?:^|\s+)((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/
this replaces \s+ with (?:^|\s+) at the beginning of the expression
this will add a space at the beginning of the variable if the string starts with a quote - just trim() or remove that whitespace to continue
I seem to have used the "line by line" approach (like sed, if I'm not mistaken) to reach my original results - if you use the "whole file" or "whole string" setting or approach, carriage-return-line-feed seems to count as two whitespace characters (can't imagine why...), thus turning any newlines into single spaces (unless they are inside quotes and "dot-matches-newline" is used, of course)
this could be resolved by replacing the . and \s shorthand character classes with the specific characters you want to match, like the following:
/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/
this does not require the dot-matches-newline switch and only replaces multiple spaces or tabs - not newlines - with a single space (and of course, only if they are not quoted)
EXAMPLE
This link shows an example of the first expression and last expression in use on sample text on http://codepad.viper-7.com
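For reference, here is how the last expression might be applied in PHP. This is a sketch, assuming the replacement is a single space followed by $1 (which is what makes the leading-space caveat above appear); the function name is hypothetical. Note that ltrim() here also removes any leading spaces that were in the input to begin with.

```php
<?php
// Hypothetical helper: collapse runs of spaces/tabs outside quotes.
function collapseUnquotedSpaces(string $text): string
{
    // First alternative: optional leading blanks plus a whole quoted string
    // (kept intact as $1). Second alternative: two or more blanks anywhere else.
    $pattern = '/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/';

    // Fall back to the input if the regex engine reports an error (null).
    $result = preg_replace($pattern, ' $1', $text) ?? $text;

    // A quoted string at the very start gains one leading space; remove it.
    return ltrim($result, ' ');
}

echo collapseUnquotedSpaces('a  "b  c"  d'), "\n"; // a "b  c" d
echo collapseUnquotedSpaces('"x  y"  z'), "\n";    // "x  y" z
```

As with the expressions above, an apostrophe inside ordinary text (for example "it's") is treated as an opening quote, so this approach suits input where quotes are used only as delimiters.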
You could do it in several steps. Consider the following example:
$str = 'This is a string with "Bunch of extra spaces". Leave them "untouched !".';
$id = 0;
$buffer = array();
$str = preg_replace_callback('|".*?"|', function($m) use (&$id, &$buffer) {
    $buffer[] = $m[0];
    return '__' . $id++;
}, $str);
$str = preg_replace('|\s+|', ' ', $str);
$str = preg_replace_callback('|__(\d+)|', function($m) use ($buffer) {
    return $buffer[$m[1]];
}, $str);
echo $str;
This will output the string:
This is a string with "Bunch of extra spaces". Leave them "untouched !".
Although this is not the prettiest solution.
I have a PHP page which gets text from an outside source wrapped in quotation marks. How do I strip them off?
For example:
input: "This is a text"
output: This is a text
Please answer with full PHP coding rather than just the regex...
This will work quite nicely unless you have strings with multiple quotes like """hello""" as input and you want to preserve all but the outermost "'s:
$output = trim($input, '"');
trim() strips every character listed in its second argument (the character list, in this case just ") from the beginning and end of a string. If you don't pass a second argument, it trims whitespace.
If the situation of multiple leading and ending quotes is an issue you can use:
$output = preg_replace('/^"|"$/', '', $input);
Which replaces only one leading or trailing quote with the empty string, such that:
""This is a text"" becomes "This is a text"
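To make the difference between the two approaches concrete, here is a short sketch that runs both on the same doubly quoted input. The variable names are only for illustration.

```php
<?php
$input = '""This is a text""';

// trim() removes every leading and trailing quote.
$trimmed = trim($input, '"');
echo $trimmed, "\n"; // This is a text

// The anchored regex removes at most one quote from each end.
$stripped = preg_replace('/^"|"$/', '', $input);
echo $stripped, "\n"; // "This is a text"
```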
$output = str_replace('"', '', $input);
Of course, this will remove all quotation marks, even from inside the strings. Is this what you want? How many strings like this are there?
The question was on how to do it with a regex (maybe for curiosity/learning purposes).
This is how you would do that in php:
$result = preg_replace('/(")(.*?)(")/i', '$2', $subject);
Hope this helps,
Buckley