I am trying to encode a phrase in order to pass it inside a URL. Currently it works fine with basic words, where spaces are replaced with dashes.
<a href="./'.str_replace(' ', '-', preg_replace("/[^A-Za-z0-9- ]/", '', $phrase)).'">
It produces something like:
/this-is-my-phase
On the page this URL takes me to, I can replace the dashes with spaces and query my DB for this phrase.
The problem is when the phrase contains an apostrophe: my current script removes it. Is there any way to preserve it, or replace it with some URL-friendly character, to accommodate something like this?
this is bob's page
There is a PHP standard library function, urlencode(), that encodes most non-alphanumeric characters as %XX, where XX is the hexadecimal value of the character (spaces become +).
If the limitations of that conversion are not acceptable, see rawurlencode(), which follows RFC 3986 and encodes spaces as %20 instead of +.
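A minimal sketch of the round trip with both functions, using the phrase from the question ($phrase is just an illustrative variable name):

```php
<?php
$phrase = "this is bob's page";

// urlencode(): spaces become '+', other non-alphanumerics (except -_.) become %XX.
$form = urlencode($phrase);
// rawurlencode(): RFC 3986 style, spaces become %20.
$raw = rawurlencode($phrase);

echo $form, "\n"; // this+is+bob%27s+page
echo $raw, "\n";  // this%20is%20bob%27s%20page

// On the receiving page, decode with the matching function.
echo urldecode($form), "\n";   // this is bob's page
echo rawurldecode($raw), "\n"; // this is bob's page
```

Note that PHP already decodes the values it puts into $_GET, so an explicit decode is only needed when you parse the raw path yourself.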
If you want to allow another character, you have to add it to this part of the character class: ^A-Za-z0-9-. For example, to allow ', the regex becomes [^A-Za-z0-9-' ].
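A short sketch with the apostrophe added to the question's character class, reusing the question's example phrase. In PCRE, a hyphen placed immediately after a range (as in 0-9-') is taken literally rather than starting a new range:

```php
<?php
$phrase = "this is bob's page";

// The question's character class with the apostrophe added.
$kept = preg_replace("/[^A-Za-z0-9-' ]/", '', $phrase);
$slug = str_replace(' ', '-', $kept);
echo $slug, "\n"; // this-is-bob's-page

// The receiving page can reverse the space-to-dash step before querying.
echo str_replace('-', ' ', $slug), "\n"; // this is bob's page
```

If the original phrase can itself contain hyphens, the reverse step cannot tell them apart from the dashes that replaced spaces.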
If you only need to replace the apostrophes ('), you can replace them with the URL-encoded character %27:
str_replace("'", "%27", $url);
EDIT
If you want to replace all URL-unsafe characters, use a built-in function as in wallyk's answer. It's much simpler.
Related
I tend to lose track when I'm dealing with a lot of special characters.
I have some URLs (badly formatted, not consistent enough to use parse_url()), and I want to replace all occurrences of the parameters ?dead and/or ?dead=some_text_here with nothing.
There might be other variables before and after.
Example urls:
http://www.url.com/?dead?dead=whatever_text&wow=test
http://www.url.com/?hello?dead=whatever_text
This is what I thought would work, but it doesn't:
$parsed_url = preg_replace("/(\?dead(?:=.*?)?)(?:\&|$|\?)/", "", $url);
What it's supposed to do is check for "?dead", followed by an optional =value, and replace that with nothing. But it also replaces the ? or & when another parameter follows the ?dead parameter. Also, it only replaces one occurrence, not all of them.
It makes
http://www.url.com/?dead?dead=whatever_text&wow=test
Become
http://www.url.com/dead=whatever_text&wow=test
I think you want something like this pattern?
(\?dead(=[^&]*|))*
PHP Code:
echo preg_replace('/(\?dead(=[^&]*|))*/','',$sourcestring);
This will produce this output of your given urls:
http://www.url.com/&wow=test
http://www.url.com/?hello
You can use \Q and \E (as in QuotE) when dealing with lots of special characters.
The text between these delimiters will be treated literally.
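A small illustration using the first URL from the question: inside \Q...\E the ? needs no backslash. preg_quote() is the function-based alternative that adds the escapes for you.

```php
<?php
$url = 'http://www.url.com/?dead?dead=whatever_text&wow=test';

// "?" between \Q and \E is matched literally instead of acting as a quantifier.
// Caveat: the pattern delimiter ("/" here) must still not appear inside \Q...\E,
// because PHP locates the closing delimiter before PCRE ever sees \Q.
$literal = preg_match_all('/\Q?dead\E/', $url);

// preg_quote() escapes regex metacharacters (and the given delimiter) instead.
$quoted = preg_match_all('/' . preg_quote('?dead', '/') . '/', $url);

echo $literal, ' ', $quoted, "\n"; // 2 2
```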
How about:
$parsed_url = preg_replace("/\?dead(?:=[^&?]*)?/", "", $url);
preg_replace() should match all occurrences by default (the $limit parameter is -1 by default).
This regex is very similar to yours, but instead of .*? it uses [^&?]* to match up to the next ? or &.
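Running that pattern over both URLs from the question confirms that every occurrence is replaced; the optional fifth argument of preg_replace() reports how many replacements were made:

```php
<?php
$urls = [
    'http://www.url.com/?dead?dead=whatever_text&wow=test',
    'http://www.url.com/?hello?dead=whatever_text',
];

$results = [];
$counts = [];
foreach ($urls as $url) {
    // Limit -1 (the default) means "replace every match"; $count receives the total.
    $results[] = preg_replace('/\?dead(?:=[^&?]*)?/', '', $url, -1, $count);
    $counts[] = $count;
}

print_r($results);
// [0] => http://www.url.com/&wow=test   (2 replacements)
// [1] => http://www.url.com/?hello      (1 replacement)
```

The & that followed the removed parameter stays in the first result; tidying it up would be a separate step.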
I have an XML document that gets loaded onto a page. Sometimes there are specific characters that cannot be parsed, and this symbol shows up in place of what should be there: –
The character varies: sometimes it should be a hyphen, sometimes an apostrophe, sometimes even a double quote.
What I'd like to do is, create an array:
$invalidCharacters = array(" – ", "’", "&");
and, if the string contains any of those characters, replace each with its HTML/ASCII equivalent, like this: " &ndash; ", "'", and "&amp;".
I know that I can call str_replace() on some items, but is there a simple way to loop through the array, find each specific character, and replace it as it goes?
Using htmlspecialchars should work for you.
http://docs.php.net/manual/en/function.htmlspecialchars.php
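A sketch of the loop the question asks for, using the characters from the question (the sample text is made up). strtr() with an array applies every pair in one call, whereas str_replace() with arrays works pair by pair on its own output. htmlspecialchars(), shown for comparison, only escapes &, ", ', < and >, so it leaves – and ’ untouched:

```php
<?php
$text = "Tom & Jerry – the ’90s <live>";

// Find => replace pairs, modeled on the question's $invalidCharacters array.
$map = [
    '–' => '&ndash;',
    '’' => "'",
    '&' => '&amp;',
];

// strtr() does all replacements in one pass and never re-replaces its output.
$mapped = strtr($text, $map);

// str_replace() with arrays works pair by pair, so the later '&' => '&amp;'
// also hits the '&' it just inserted in '&ndash;'.
$naive = str_replace(array_keys($map), array_values($map), $text);

// htmlspecialchars() escapes only &, ", ', < and >.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $mapped, "\n";  // Tom &amp; Jerry &ndash; the '90s <live>
echo $naive, "\n";   // Tom &amp; Jerry &amp;ndash; the '90s <live>
echo $escaped, "\n"; // Tom &amp; Jerry – the ’90s &lt;live&gt;
```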
I want to make a hyphen-separated string (for use in the URL) based on the user-submitted title of the post.
Suppose the user entered the title of the post as:
$title = "USA is going to deport indians -- Breaking News / News India";
I want to convert it to:
$slug = "usa-is-going-to-deport-indians-breaking-news-news-india";
There are other characters I also want converted, for example '&' to 'and', and '#' and '%' to a hyphen (-).
One way I tried was the str_replace() function, but with that method I have to call str_replace() many times, and it is time-consuming.
Another problem is that the title may contain several consecutive hyphens (-); I want to collapse them into a single hyphen.
Is there any robust and efficient way to solve this problem?
You can use the preg_replace() function to do this:
Input :
$string = "USA is going to deport indians -- Breaking News / News India";
$string = preg_replace("/[^\w]+/", "-", $string);
echo strtolower($string);
Output :
usa-is-going-to-deport-indians-breaking-news-news-india
If your project runs on WordPress, I would suggest using its sanitize_title() function.
Check the documentation.
There are three steps in this task (creating a "slug" string); each requires a separate pass over the input string.
Cast all characters to lowercase.
Replace ampersand symbols with [space]and[space] to ensure that the symbol is not consumed by a later replacement AND the replacement "and" is not prepended or appended to its neighboring words.
Replace sequences of one or more non-alphanumeric characters with a literal hyphen.
Multibyte-safe code:
$title = "ÛŞÃ is going to dèport 80% öf indians&citizens are #concerned -- Breaking News / News India";
echo preg_replace(
'/[^\pL\pN]+/u',
'-',
str_replace(
'&',
' and ',
mb_strtolower($title)
)
);
Output:
ûşã-is-going-to-dèport-80-öf-indians-and-citizens-are-concerned-breaking-news-news-india
Note that the replacement in str_replace() could be done within the preg_replace() call by forming an array of find strings and an array of replacement strings. However, this may be false economy -- although there would be fewer function calls, the more expensive regex-based function call would make two passes over the entire string.
If you wish to convert accented characters to ASCII characters, then perhaps read the different techniques at Convert accented characters to their plain ascii equivalents.
If you aren't worried about multibyte characters, a simpler version of the same approach would be:
echo preg_replace(
'/[^a-z\d]+/',
'-',
str_replace(
'&',
' and ',
strtolower($title)
)
);
To mop up any leading or trailing hyphens in the result string, it may be a good idea to unconditionally call trim($resultstring, '-').
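Putting the ASCII-only version above and the trim() step together, a sketch with a hypothetical helper name, make_ascii_slug():

```php
<?php
// Hypothetical helper: the ASCII-only pipeline above plus a final trim().
function make_ascii_slug(string $title): string
{
    // Lowercase, expand "&" to " and ", then collapse non-alphanumeric runs to "-".
    $slug = preg_replace('/[^a-z\d]+/', '-', str_replace('&', ' and ', strtolower($title)));
    // Remove hyphens produced by leading or trailing punctuation.
    return trim($slug, '-');
}

echo make_ascii_slug('USA is going to deport indians -- Breaking News / News India'), "\n";
// usa-is-going-to-deport-indians-breaking-news-news-india
echo make_ascii_slug('#1 Tips & Tricks!'), "\n";
// 1-tips-and-tricks
```

Without the trim(), the second title would come out as -1-tips-and-tricks- because of the leading # and trailing !.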
For a deeper dive on the subject of creating a slug string, read PHP function to make slug (URL string).
I am currently using what appears to be a horribly complex and unnecessary solution to form a required string.
The string could have any punctuation and will include slashes.
As an example, this string:
Test Ripple, it\'s a comic book one!
Using my current method:
str_replace(" ", "-", trim(preg_replace('/[^a-z0-9]+/i', ' ', str_replace("'", "", stripslashes($string)))))
Returns the correct result:
Test-Ripple-its-a-comic-book-one
Here is a breakdown of what my current (poor) solution does to achieve the desired output:
Strip all backslashes from the string.
Remove any apostrophes with str_replace().
Remove any remaining punctuation with preg_replace(), replacing it with whitespace.
Trim off any extra whitespace from the beginning/end of the string, which may have been caused by punctuation.
Replace all whitespace with '-'.
But there must be a better and more efficient way. Can anyone help?
Personally, it looks fine to me; however, I would make one small change.
Change
preg_replace("/[^a-z0-9]+/i"
to the following
preg_replace("/[^a-zA-Z0-9\s]/"
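For comparison, the question's five steps can be collapsed into three calls. A sketch, where make_slug() is a hypothetical helper name and the input is assumed to contain backslash-escaped quotes as in the question's example:

```php
<?php
// Hypothetical helper: same result as the question's pipeline, fewer calls.
function make_slug(string $string): string
{
    // Undo backslash escaping (it\'s -> it's), then drop apostrophes entirely.
    $clean = str_replace("'", '', stripslashes($string));
    // Turn every run of non-alphanumeric characters into a single hyphen.
    $hyphenated = preg_replace('/[^a-z0-9]+/i', '-', $clean);
    // Strip hyphens left at either end by leading/trailing punctuation.
    return trim($hyphenated, '-');
}

echo make_slug("Test Ripple, it\\'s a comic book one!"), "\n";
// Test-Ripple-its-a-comic-book-one
```

Because the class keeps its + quantifier and replaces directly with '-', a comma followed by a space becomes one hyphen, and trim($x, '-') takes over the job of the whitespace trim.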
I want to replace all HTML codes with a space. I think I should use the preg_replace() function, but I'm not sure how to do that when the HTML codes look like this:
&#8221;
&#946;
$text="&#946; something &#8221; test..."
$text=preg_replace("&# [what should be here?] ;", " ", $text);
echo $text;
result = something test...
I think it should be only numeric, because I found only numeric ones here: http://www.ascii.cl/htmlcodes.htm
You could look at strip_tags(), though note that it removes HTML tags, not sequences like these. Those aren't HTML codes; they are called HTML entities.
The regex to match what you want looks like this:
(&#.+?;)
It's rather simple: look for &#, then any characters up to the next ;.
Edit: As Qtax pointed out, they don't have to be numbers. The dot matches anything.
HTML character references can be written in several forms. Assuming that you only want to replace numeric character references, you need a regular expression that matches these two formats:
&#D; where D is a decimal number
&#xH; where H is a hexadecimal number
The regex that takes care of both:
/&#(\d+|x[\da-f]+);/i
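A quick check of that regex against a sample string modeled on the question's (the hexadecimal &#x3B2; is added to exercise the second form):

```php
<?php
$text = '&#946; something &#8221; test &#x3B2;...';

// Matches &#D; (decimal) and &#xH; (hexadecimal); the i flag also accepts &#X3b2;.
$result = preg_replace('/&#(\d+|x[\da-f]+);/i', ' ', $text);

// Collapse the leftover runs of spaces for display.
echo trim(preg_replace('/\s+/', ' ', $result)), "\n"; // something test ...
```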
If you want to replace all HTML entities like &foo; you could use something like:
preg_replace('/&(?:[a-z]+|#x[\da-f]+|#\d+);/i', ' ', $text);
If you want to decode them, use html_entity_decode.
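If decoding is what you want, a minimal sketch; the ENT_QUOTES | ENT_HTML5 flags and the UTF-8 charset are choices made for this example, not requirements:

```php
<?php
$text = '&#946; something &#8221; test &amp; more';

// Numeric and named references are turned back into the UTF-8 characters they encode.
$decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $decoded, "\n"; // β something ” test & more
```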
&<something>; is the syntax for an HTML entity. If you want to replace all of them, use this regex:
preg_replace('/&.*?;/', '', $subject); // from ampersand till the next semicolon
It will replace all HTML entities with an empty string, including &auml;, &x20; and others.
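A quick check of that pattern on made-up strings. The second call shows a side effect worth knowing: because .*? can still cross spaces, a bare ampersand swallows everything up to the next semicolon.

```php
<?php
// Named and numeric entities are removed; the surrounding spaces remain.
$cleaned = preg_replace('/&.*?;/', '', 'caf&eacute; &amp; bar');
echo $cleaned, "\n"; // "caf  bar"

// A literal "&" followed later by ";" is treated as one long "entity".
$overreach = preg_replace('/&.*?;/', '', 'Tom & Jerry; &amp; more');
echo $overreach, "\n"; // "Tom   more"
```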