Differences in backslashing between Notepad++ and PHP - php

EDIT: I found a solution I didn't expect. See below.
Using regex via PHP's preg_match_all, I want to match a certain URL (EDIT: one that is already escaped) in a string formatted as JSON. The search works wonderfully in Notepad++ (using regex matching, of course), but preg_match_all() just returns an empty array.
Testing on tryphpregex.com I found out that somehow my usual approach to escaping a backslash gives a pattern error, i.e. even the simple pattern https:\\ returns an empty result.
I'm utterly confused and have been trying to debug for too long, so I may be missing the obvious. Maybe one of you can see the simple error?
The string.
The pattern (that works fine in Notepad++, but not in PHP):
%(https:\\/\\/play.spotify.com\\/track\\/)(.*?)(\")%

You don't need to escape the slash in PHP when the delimiter is %: %(https://play.spotify.com/track/)(.*?)(\")%
The backslash before the double quote is only needed if the string enclosing the pattern is delimited by double quotes too.

Found a solution to my problem.
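According to this site, I need to match every backslash with \\\\. Horrible, but true. PHP's string parser first turns \\\\ into \\, and the regex engine then reads \\ as one literal backslash.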
So my pattern becomes:
$pattern = "%(https:\\\\/\\\\/play\.spotify\.com\\\\/track\\\\/)(.*?)(\")%";
Please note that I was looking for a pattern in a string that did not contain plain URLs, but URLs containing escape characters (it was JSON output from Spotify).
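To make the two layers of escaping concrete, here is a minimal sketch. The JSON fragment and the track id abc123 are made up for illustration; only the pattern follows the one above. It also shows json_decode() as an alternative that was not part of the original solution: decoding first removes the \/ escapes, so a plain pattern would be enough afterwards.

```php
<?php
// Hypothetical sample: a JSON fragment in which "/" is escaped as "\/".
$json = '{"url":"https:\/\/play.spotify.com\/track\/abc123"}';

// Plain pattern from the first answer: finds nothing, because the
// subject contains "\/" where the pattern expects "/".
$found = preg_match_all('%(https://play\.spotify\.com/track/)(.*?)(")%', $json, $plain);
var_dump($found); // int(0)

// '\\\\' in the PHP literal becomes \\ for PCRE, which matches one backslash.
$pattern = '%(https:\\\\/\\\\/play\.spotify\.com\\\\/track\\\\/)(.*?)(")%';
preg_match_all($pattern, $json, $matches);
echo $matches[2][0], PHP_EOL; // abc123

// Alternative (an assumption, not the asker's approach): decode the JSON first.
$decoded = json_decode($json, true);
echo $decoded['url'], PHP_EOL; // https://play.spotify.com/track/abc123
```

Notepad++ has no string-literal layer, which is why the two-backslash version works there but not inside PHP quotes.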

Related

How to escape slashes in regex?

I've made some regex to test for a YouTube embedded video:
/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/
It works as I expect when I test it, but the problem is that I need to pass that regex as a string to some function. Specifically, I'm using htmLawed, where I pass the following string to a function:
func('iframe=-*,src(match="/^(http:\/\/www\.youtube\.com\/embed\/)[^\/\s\\]+$/")');
The problem is that the above regex sort of works, but it just ignores the slashes, and accepts anything in place of them.
That is why I suspect that there is a problem with escaping.
I would appreciate it if you could advise some alternative ways of escaping these slashes and backslashes... there must be some way?
If you have a string, you will need to escape the backslashes (and quotes) for the string literal. Or, depending on how the function builds the regex from the string, you might not need to escape slashes at all (I don't think so here).
"iframe=-*,src(match=\"/^(http:\\/\\/www\\.youtube\\.com\\/embed\\/)[^\\/\\s\\\\]+$/\")"
In PHP, you can also use a different regex delimiter:
~^(http://www\.youtube\.com/embed/)[^/\s\\\\]+$~
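A small sketch of the tilde-delimited variant, written as a single-quoted PHP literal and called directly with preg_match(). The test URLs are made up, and whether htmLawed accepts a different delimiter is not checked here.

```php
<?php
// In a single-quoted literal PHP only unescapes \\ and \', so '\\\\'
// reaches PCRE as \\ (one literal backslash inside the character class).
$pattern = '~^(http://www\.youtube\.com/embed/)[^/\s\\\\]+$~';

// What the regex engine actually receives:
echo $pattern, PHP_EOL; // ~^(http://www\.youtube\.com/embed/)[^/\s\\]+$~

// Hypothetical test URLs.
$ok       = preg_match($pattern, 'http://www.youtube.com/embed/abc123');
$extra    = preg_match($pattern, 'http://www.youtube.com/embed/abc/123');
$noSlash  = preg_match($pattern, 'http:XXwww.youtube.com/embed/abc123');

var_dump($ok, $extra, $noSlash); // int(1) int(0) int(0)
```

Printing the literal before using it is a quick way to see which backslashes survived PHP's own string parsing.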

Fetch All URLs from a Page using Regex

Original format:
<a href="http://www.example.com/t434234.html" ...>
I need to fetch all URLs of this format:
http://www.example.com/t[ANY CHARACTER].html
ANY CHARACTER is where the value changes from one URL to another. The rest is fixed.
Here is my attempt:
preg_match("#http:\/\/www\.aqarcity\.com\/t[a-zA-Z0-9_]\.html#", $page, $urls);
I get empty results. I don't know where I went wrong...
The problem appears to be that [a-zA-Z0-9_] will only match exactly one character. If you want to match zero or more characters, use [a-zA-Z0-9_]*. For one or more, use [a-zA-Z0-9_]+. For exactly six characters, use [a-zA-Z0-9_]{6}. For e.g. one to six characters, use [a-zA-Z0-9_]{1,6}.
Also note that, since you're using # as the delimiter, you don't need to escape the / characters. As far as I know this will not make your code misbehave, but it'll be easier to read if you remove the backslashes before the slashes.
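Putting both points together might look like the sketch below. The page fragment is made up, and preg_match_all() is swapped in for preg_match() because preg_match() stops after the first match while the question asks for all URLs.

```php
<?php
// Hypothetical page fragment; the host follows the pattern in the question.
$page = '<a href="http://www.aqarcity.com/t434234.html">one</a> '
      . '<a href="http://www.aqarcity.com/t99.html">two</a>';

// "+" allows one or more characters; with "#" as the delimiter,
// "/" needs no escaping.
preg_match_all('#http://www\.aqarcity\.com/t[a-zA-Z0-9_]+\.html#', $page, $urls);

// $urls[0] holds every full match.
print_r($urls[0]);
```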
Finally, please realize that regular expressions are a rather dangerous way to work with HTML. In this case, you may pick up matching URLs from comments, JavaScript code, and other things that aren't links. It is literally impossible to correctly parse HTML with unaugmented regular expressions; they don't have the expressive power necessary to do so. I don't know what sorts of HTML parsers are available for PHP, but you may want to look into them.
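As one possible parser-based approach, PHP's bundled DOM extension can collect href attributes from real anchor elements, and the regex then only has to validate each URL. This is a sketch that assumes the DOM extension is enabled; the page content is made up.

```php
<?php
// Hypothetical page: one link inside a comment, one real link.
$page = '<html><body>'
      . '<!-- <a href="http://www.aqarcity.com/t1.html">commented out</a> -->'
      . '<a href="http://www.aqarcity.com/t434234.html">listing</a>'
      . '</body></html>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);

$urls = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    // Anchored check on a single attribute value, not on the whole page.
    if (preg_match('#^http://www\.aqarcity\.com/t[a-zA-Z0-9_]+\.html$#', $href)) {
        $urls[] = $href;
    }
}

print_r($urls); // only the real link; the commented-out one is skipped
```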

Regex for a Function Call with Multiple Optional Parameters

I'm looking for a regex that will scan a document to match a function call, and return the value of the first parameter (a string literal) only.
The function call could look like any of the following:
MyFunction("MyStringArg");
MyFunction("MyStringArg", true);
MyFunction("MyStringArg", true, true);
I'm currently using:
$pattern = '/Use\s*\(\s*"(.*?)\"\s*\)\s*;/';
This pattern will only match the first form, however.
Thanks in advance for your help!
Update
I was able to solve my problem with:
$pattern = '/Use\s*\(\s*"(.*?)\"/';
Thanks Justin!
~Scott
If you only care about the value of the first parameter, you can just chop off the end of the regex:
$pattern = '/Use\s*\(\s*"(.*?)\"/';
However, you should understand that this (or any pure-regex solution for this problem) will not be perfect, and there will be some possible cases it handles incorrectly. In this case, you'll get false positives, and escaped quotes (\") will break it.
You can ignore escaped quotes by complicating it a bit:
$pattern = '/Use\s*\(\s*"((?:[^"\\\\]|\\\\.)*)"/';
This treats a backslash together with the character after it as a single unit, so a " character preceded by an odd number of backslashes no longer ends the quoted string.
However, the false-positives issue can't be helped without introducing false negatives, and vice versa. This is because PHP is not a regular language, so it can't be parsed with "pure" regex, and even modern regex engines that allow recursion are going to need some pretty complex code to do a really thorough job at this.
All I'm saying is, if you're planning a one-off job to quickly scrape through some PHP you wrote yourself, regex is probably fine. If you're looking for something robust and open-ended that will do this on arbitrary PHP code, you need some kind of reflection or PHP parser.
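For the one-off case, a sketch of an escaped-quote-aware pattern could look like this. It consumes each backslash together with the character that follows it, which is one common idiom and not necessarily what the answer above had in mind; the scanned text is made up.

```php
<?php
// Hypothetical text to scan; a nowdoc keeps quotes and backslashes verbatim.
$source = <<<'TXT'
Use("plain");
Use("with \"escaped\" quotes", true);
Use( "third" , true, true );
TXT;

// [^"\\] : any character except a quote or a backslash
// \\.    : a backslash plus whatever character follows it
// ('\\\\' in the PHP literal reaches PCRE as \\.)
$pattern = '/Use\s*\(\s*"((?:[^"\\\\]|\\\\.)*)"/';

preg_match_all($pattern, $source, $matches);
print_r($matches[1]); // the first argument of each call, escapes left in place
```

It still yields false positives, for example a call that appears inside a comment or another string.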
This might be slightly simpler, though it will only work if you have double quotes and not single quotes:
$pattern = '/Use\s*[^"]*"([^"]*)"/';

replicate preg replace with javascript

Is it possible to replicate this with javascript?
preg_replace('/(.gif|.jpg|.png)/', '_thumb$1', $f['logo']);
EDIT - I am now getting the following error for this piece of code:
unterminated string literal
$('#feed').prepend('<div class="feed-item"><img src="'+html.logo.replace(/(.gif|.jpg|.png)/g, "_thumb$1")+'"/>
<div class="content">'+html.content+'</div></div>').fadeIn('slow');
There are a couple of problems with the code you are trying to replicate:
It matches "extensions" even if they aren't at the end of the filename.
The dot in a regular expression matches (nearly*) any character, not just a period.
Try this instead:
'abc.jpg'.replace(/\.(jpg|gif|png)$/, '_thumb$&')
I'm assuming that the string you are trying to replace contains only a single filename.
*See the documentation for PCRE_DOTALL.
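The same two fixes can be applied on the PHP side as well. The sketch below compares the original pattern with an anchored, escaped one; the file names are made up, and $0 is PHP's replacement reference for the whole match.

```php
<?php
$original = '/(.gif|.jpg|.png)/';     // unescaped dot, not anchored
$fixed    = '/\.(gif|jpg|png)$/';     // literal dot, end of string only

// Hypothetical file names.
$names = ['logo.jpg', 'my.gif.backup.png', 'agif'];

$before = [];
$after  = [];
foreach ($names as $name) {
    $before[] = preg_replace($original, '_thumb$1', $name);
    $after[]  = preg_replace($fixed, '_thumb$0', $name);
}

print_r($before); // logo_thumb.jpg, my_thumb.gif.backup_thumb.png, _thumbagif
print_r($after);  // logo_thumb.jpg, my.gif.backup_thumb.png, agif
```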
Yes, except that in JavaScript, replace is a string's method, so it would be rearranged a little (also, the array/object notation is slightly different):
f.logo.replace(/\.(gif|jpg|png)/, '_thumb.$1');
more info
somestringvar.replace(/(.gif|.jpg|.png)/, replacementValue)
