I have several strings that look like this:
Lasklé
Jones & Jon
I am trying to send them via the Foursquare API to be matched, but the request fails with these characters. Is there a way to sanitise the strings so they only include English letters? The results would be:
Laskle
Jones Jon
It appears that file_get_contents requests with either the 'é' or the '&' in the URL cause issues.
I checked how the request was sent and realised that the '&' is unneeded and is causing the issues. Is it possible to remove everything that is not a letter or number from the name?
What do the strings look like before you pass them? If your string looks like 'Lasklé' then I think you are using the wrong character set when reading the string. Try using UTF-8.
If the string looks correct before you pass it on, try passing it through urlencode() first.
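As a rough sketch of the urlencode suggestion: instead of encoding by hand, http_build_query() can percent-encode each value while building the query string. The endpoint URL and the `query` parameter name below are placeholders for illustration, not the real Foursquare API.

```php
<?php
// Hypothetical endpoint, used only for illustration.
$endpoint = 'https://api.example.com/venues/search';

// Build a request URL with the name safely percent-encoded.
function buildSearchUrl(string $endpoint, string $name): string
{
    // PHP_QUERY_RFC3986 encodes spaces as %20 instead of "+".
    $query = http_build_query(['query' => $name], '', '&', PHP_QUERY_RFC3986);
    return $endpoint . '?' . $query;
}

echo buildSearchUrl($endpoint, 'Jones & Jon'), "\n";
echo buildSearchUrl($endpoint, "Laskl\u{00E9}"), "\n";
```

The resulting URL can then be passed to file_get_contents without the '&' being read as a parameter separator.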
You can use the preg_replace() function to replace part of a string using a regex.
To keep only letters, use the following. It also removes spaces (add \s to the expression to keep them):
preg_replace('/[^a-zA-Z]/','',$string);
To keep spaces, or any other character, add it inside the []:
preg_replace('/[^a-zA-Z\s]/','',$string);
Use this to keep spaces, '/' and '-' as well as letters and digits. Good for making a custom URL:
$string=preg_replace("/[^A-Za-z0-9\s\/\-]/", '', $string);
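Putting the answers above together, a minimal sketch might look like this. The function name is made up for illustration. Note that it removes 'é' rather than turning it into 'e', so 'Lasklé' becomes 'Laskl'; transliteration would need a separate step.

```php
<?php
// Keep only ASCII letters, digits and single spaces.
function keepAsciiLettersAndDigits(string $name): string
{
    // The /u flag treats the input as UTF-8, so "é" is removed as one character.
    // preg_replace returns null on invalid UTF-8; fall back to an empty string.
    $clean = preg_replace('/[^A-Za-z0-9 ]/u', '', $name) ?? '';
    // Collapse the runs of spaces left behind by removed characters, then trim.
    return trim(preg_replace('/ +/', ' ', $clean));
}

echo keepAsciiLettersAndDigits('Jones & Jon'), "\n";
echo keepAsciiLettersAndDigits("Laskl\u{00E9}"), "\n";
```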
Related
I've made some regex to test for a YouTube embedded video:
/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/
It works as I expect when I test it, but the problem is that I need to pass that regex as a string to some function. In particular I'm using htmLawed, where I pass the following string to a function:
func('iframe=-*,src(match="/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/")');
The problem is that the above regex sort of works, but it just ignores the slashes, and accepts anything in place of them.
That is why I suspect that there is a problem with escaping.
I would appreciate it if you could advise some alternative ways of escaping these slashes and backslashes... there must be some way?
If you have a string, you will need to escape the backslashes (and quotes) for the string literal. Or, depending on how the function builds the regex from the string, you might not need to escape slashes at all (I don't think so here).
"iframe=-*,src(match=\"/^(http:\\/\\/www\\.youtube\\.com\\/embed\\/)[^\\/\\s\\\\]+$/\")"
In PHP, you can also use a different regex delimiter:
~^(http://www\.youtube\.com/embed/)[^/\s\\\\]+$~
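To check the alternative-delimiter version outside of htmLawed, a small sketch with plain preg_match could look like this. The helper name is hypothetical. It assumes a single-quoted PHP string, where `\\\\` ends up as `\\` in the regex (one literal backslash) and `\.` and `\s` pass through unchanged.

```php
<?php
// With ~ as the delimiter, the forward slashes need no escaping.
$pattern = '~^(http://www\.youtube\.com/embed/)[^/\s\\\\]+$~';

// Returns true only for an embed URL with a single ID segment.
function isYoutubeEmbed(string $url, string $pattern): bool
{
    return preg_match($pattern, $url) === 1;
}

var_dump(isYoutubeEmbed('http://www.youtube.com/embed/dQw4w9WgXcQ', $pattern));
var_dump(isYoutubeEmbed('http://www.youtube.com/embed/a/b', $pattern));
```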
I'm using php to look at an XML file that has a URL in it. The URLs look something like this:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
When I echo out the URLs, the "&curren" shows up as "¤" (AKA #164, A4 or currency symbol) and the links don't work. This happens even though there isn't a closing semicolon for it. What is the cleanest way to make "&curren" display literally?
Funny enough I ran into the same problem just now and I found this answer. However, I found another solution which might even be better!
Simply put the variable at the beginning of your query string, and you will avoid the "&curren" completely.
Do:
https://site.com/bacon_report?currentDimension=2&Id=1&report=1&param=1
instead of:
https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
Use the php function urlencode:
urlencode("https://site.com/bacon_report?Id=1&report=1&currentDimension=2&param=1")
will output
https%3A%2F%2Fsite.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
The problem here is escaping - you need to escape the "&" characters. In XML all special characters like <, >, ', " and & should be escaped.
Escape it properly as
https://example.com/bacon_report?Id=1&amp;report=1&amp;currentDimension=2&amp;param=1
..just like in HTML:
WRONG - no escaping
CORRECT - correct escape sequence
So - the cleanest way to show "&curren" in HTML/XML is to properly escape the ampersand, and render it as "&amp;curren".
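In PHP, that escaping can be left to htmlspecialchars() at the point where the URL is written into HTML or XML. A minimal sketch, using the URL from the question (the link text "report" is just a placeholder):

```php
<?php
// Build the raw URL; the separator is a plain "&" at this stage.
$params = ['Id' => 1, 'report' => 1, 'currentDimension' => 2, 'param' => 1];
$url = 'https://site.com/bacon_report?' . http_build_query($params, '', '&');

// Escape only when emitting markup, so "&" becomes "&amp;".
$escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $escaped . '">report</a>', "\n";
```

The browser turns each "&amp;" back into "&" when it follows the link, so the server still receives currentDimension as a normal parameter.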
I think that in this case it is best to use htmlentities because with urlencode you get
https%3A%2F%2Fexample.com%2Fbacon_report%3FId%3D1%26report%3D1%26currentDimension%3D2%26param%3D1
and when applying urldecode, you will still have the ¤ symbol,
whereas with htmlentities the URL comes out clean:
https://example.com/bacon_report?Id=1&report=1&currentDimension=2&param=1
I came across this issue while working on technical documentation (in Markdown which gets converted to HTML).
To solve the issue I used a zero-width space character which I copied and pasted from between these brackets (). That way it appears that there is no space and can include the below without any issues:
/search?query=1&currentLonLat=-74.600291,40.360869
I am trying to parse some JSON data using PHP's json_decode function. However, I need to remove certain leading and trailing characters from this long string before decoding, so I am using preg_match to strip them first. For some reason, preg_match changes the escaping when it encounters the following substring (in the middle of the string):
{content: \\\"\\200B\\\"}
After preg_match the above string looks like this:
{content: \\"\200B\\"}
Because of this, json_decode fails.
FYI, the preg_match pattern looks like this:
(?<=remove_these_leading_char)(.*)(?=remove_these_trailing_char)
OK, so here is the additional information based on the questions being asked:
Why the triple escaping, and why not fix it at the source? I don't have any control over it; it is not generated by my code.
The original string is not fully json compliant. It has several leading and trailing characters that need to be removed. Therefore I have to use regex. The format of that string is like this:
returnedHTMLdata({json_object},xx);
It looks like this behavior is not limited to preg_match only. Even substr also does this.
It looks like you've got some JSON with padding. To remove the function name and parentheses, leaving the (unescaped) JSON object, you can do something like this:
$str = <<<'EOS'
returnedHTMLdata({content: \\\"\\200B\\\", foo: \\\"bar\\\", \"baz\": \\\"fez\\\"},xx);
EOS;
$str = preg_replace('/.+?({.+}).+/','$1', $str);
echo $str;
Output:
{content: \\\"\\200B\\\", foo: \\\"bar\\\", \"baz\": \\\"fez\\\"}
Please note that even if you manage to successfully unescape this string, json_decode requires that keys - e.g. "content" - are enclosed in double quotes, so you will need to modify the JSON string/object before calling that function. Or I guess you could instead use something like the old Services_JSON package to decode it, which I believe does not have that requirement.
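For comparison, here is a sketch of the same unwrapping when the payload inside the padding is valid JSON with quoted keys. That is an assumption; the string in the question is not in that shape, and for it this function returns null. The function name and sample data are made up for illustration.

```php
<?php
// Strip "name(" and ",xx);" from a JSONP-style string and decode the object.
function decodeJsonp(string $jsonp): ?array
{
    // Capture the {...} object between the opening "(" and the trailing ",xx)".
    $re = '/^\s*\w+\((\{.*\})\s*,\s*\w+\s*\)\s*;?\s*$/s';
    if (preg_match($re, $jsonp, $m) !== 1) {
        return null;
    }
    $data = json_decode($m[1], true);
    return is_array($data) ? $data : null;
}

// Assumed well-formed sample; \u200B is a JSON escape inside single quotes.
$sample = 'returnedHTMLdata({"content":"\u200B","foo":"bar"},xx);';
var_dump(decodeJsonp($sample));
```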
EDIT: I found a solution I didn't expect. See below.
Using regex via PHP's preg_match_all , I want to match a certain url (EDIT: that is already escaped) in a string formatted as json. The search works wonderfully in Notepad++ (using regex-matching, of course) but preg_match_all() just returns an empty array.
Testing on tryphpregex.com I found out that somehow my usual approach to escaping a backslash gives a pattern error, i.e. even the simple pattern https:\\ returns an empty result.
I'm utterly confused and have been trying to debug for too long so I may miss the obvious. Maybe one of you can see the simple error?
The string.
The pattern (that works fine in Notepad++, but not in PHP):
%(https:\\/\\/play.spotify.com\\/track\\/)(.*?)(\")%
You don't need to escape the slash in PHP when the delimiter is %:
%(https://play.spotify.com/track/)(.*?)(\")%
The backslash before the double quote is only needed if the enclosing quotes are double quotes too.
Found a solution to my problem.
According to this site, I need to match every backslash with \\\\. Horrible, but true.
So my pattern becomes:
$pattern = "%(https:\\\\/\\\\/play\.spotify\.com\\\\/track\\\\/)(.*?)(\")%";
Please observe that I tried to find a pattern inside a string that didn't contain clear urls, but urls containing escape characters (it was a json-output from spotify)
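A small sketch of both routes, assuming the input is JSON text in which each "/" is escaped as "\/" (the track ID `abc123` is made up). With a single-quoted pattern, four backslashes in the PHP source become `\\` in the regex, which matches one literal backslash. Decoding the JSON first avoids the issue altogether.

```php
<?php
// Assumed sample: JSON text with escaped slashes.
$json = '{"url":"https:\/\/play.spotify.com\/track\/abc123"}';

// Option 1: match the escaped form directly in the raw JSON text.
$pattern = '%https:\\\\/\\\\/play\.spotify\.com\\\\/track\\\\/(.*?)"%';
preg_match_all($pattern, $json, $m);
$fromRegex = $m[1];

// Option 2: decode first, then work with the plain URL.
$decoded = json_decode($json, true);
$plainUrl = $decoded['url'];

print_r($fromRegex);
echo $plainUrl, "\n";
```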
I'm having a lot of difficulty matching an image url with spaces.
I need to make this
http://site.com/site.com/files/images/img 2 (5).jpg
into a div like this:
.replace(/(http:\/\/([^\s]+\.(jpg|png|gif)))/ig, "<div style=\"background: url($1)\"></div>")
Here's the thread about that:
regex matching image url with spaces
Now I've decided to first make the spaces into entities so that the above regex will work.
But I'm really having a lot of difficulty doing so.
Something like this:
.replace(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/ig, "http://$1/$2%20$3$4")
Replaces one space, but all the rest are still spaces.
I need to write a regex that says, make all spaces between http:// and an image extension (png|jpg|gif) into %20.
At this point, frankly not sure if it's even possible. Any help is appreciated, thanks.
Trying Paolo's escape:
.escape(/http:\/\/(.*)\/([^\<\>?:;]*?) ([^\<\>?:;]*)(\.(jpe?g|png|gif))/)
Another way I can do this is to escape serverside in PHP, and in PHP I can directly mess with the file name without having to match it in regex.
But as far as I know something like htmlentities do not apply to spaces. Any hints in this direction would be great as well.
Try the escape function:
>>> escape("test you");
test%20you
If you want to control the replacement character but don't want to use a regular expression, a simple...
$destName = str_replace(' ', '-', $sourceName);
..would probably be the more efficient solution.
Let's say you have the string variable urlWithSpaces, which is set to a URL that contains spaces.
Simply go:
urlWithoutSpaces = escape(urlWithSpaces);
What about urlencode() - that may do what you want.
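If the escaping is done server-side in PHP, note that urlencode() turns spaces into "+", which suits query strings, while rawurlencode() produces "%20", which is what a path or file name needs. A short sketch using the file name from the question:

```php
<?php
$file = 'img 2 (5).jpg';

// Form-style encoding: spaces become "+".
$formEncoded = urlencode($file);

// RFC 3986 encoding: spaces become "%20".
$pathEncoded = rawurlencode($file);

// Encode only the file name, not the whole URL, so "://" and "/" stay intact.
$url = 'http://site.com/site.com/files/images/' . $pathEncoded;

echo $formEncoded, "\n", $pathEncoded, "\n", $url, "\n";
```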
On the JS side you should use encodeURI(), with escape() only as a fallback. encodeURI() encodes as UTF-8, while escape() uses ISO Latin. The same problem applies to decoding.
var encode = typeof encodeURI === 'function' ? encodeURI : escape;
alert(encode('image name.png'));