Porting regex from PHP to JavaScript

I have the following regex in PHP:
/(?<=\')[^\'\s][^\']*+(?=\')|(?<=")[^"\s][^"]*+(?=")|[^\'",\s]+/
and I would like to port it to javascript like:
var regex = new RegExp('/(?<=\')[^\'\s][^\']*+(?=\')|(?<=")[^"\s][^"]*+(?=")|[^\'",\s]+/');
var match = regex.exec("hello,my,name,is,'mr jim'")
for( var z in match) alert(match[z]);
There is something that JavaScript doesn't like here, but I have no idea what it is. I've tried looking for differences between PHP and JS regex via regular-expressions.info but I can't see anything obvious.
Any help would be greatly appreciated
Thank you again
Edit:
The problem seems to lie within the positive lookbehinds, but does this mean it cannot be ported?

Correct - the positive lookbehinds will not work.
But, just as some general information about regex in Javascript, here's a couple pointers for you.
You don't have to use the RegExp object - you can use pattern literals instead
var regex = /^[a-z\d]+$/i;
But if you use the RegExp object, you have to escape your backslashes since your pattern is now locked in a string.
var regex = new RegExp( '^[a-z\\d]+$', 'i' );
The primary benefit of the RegExp object is if there is a dynamic bit to your pattern, for example
var max = 4;
var regex = new RegExp( '\\d{1,' + max + '}' );
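As a rough sketch of the pointers above: the literal and the string form describe the same pattern once the backslashes are doubled, and a dynamic piece is safest when any literal text is escaped first. The names escapeRegExp and buildPattern are hypothetical helpers introduced only for this illustration.

```javascript
// Hypothetical helper: escape characters that are special in a pattern,
// so run-time text can be embedded literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// The same pattern written as a literal and as a string.
var literal = /^[a-z\d]+$/i;
var fromString = new RegExp('^[a-z\\d]+$', 'i');

// Hypothetical helper: a literal prefix supplied at run time,
// followed by 1 to max digits.
function buildPattern(prefix, max) {
  return new RegExp('^' + escapeRegExp(prefix) + '\\d{1,' + max + '}$');
}

console.log(literal.source === fromString.source);  // true
console.log(buildPattern('v1.', 4).test('v1.123')); // true
console.log(buildPattern('v1.', 4).test('v1x123')); // false, the dot is literal
```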

You don't get lookbehind (and lookahead has problems in IE, so is best avoided too). But it's easy to just let those ' and " characters be part of the match, and throw them out afterwards:
var value= "hello,my,name,is,'mr jim'";
var match;
var r= /'[^'\s][^']*'|"[^"\s][^"]*"|[^'",\s]+/g;
while (match= r.exec(value)) {
    var text= match[0];
    if ('"\''.indexOf(text.charAt(0))!=-1) // starts with ' or "?
        text= text.substring(1, text.length-1);
    alert(text);
}
Or, use capturing parentheses to isolate the quotes from the text:
var r= /'([^'\s][^']*)'|"([^"\s][^"]*)"|([^'",\s]+)/g;
while (match= r.exec(value)) {
    var text= match[1] || match[2] || match[3];
    alert(text);
}
(I'm guessing your for(var z in match) was supposed to loop over each pattern match in the string. Unfortunately JavaScript doesn't quite work that easily.)
This may not be the best way to parse a comma-separated list; it seems a bit ill-defined for cases where you have a space or quote in the middle of a field. A simple string-indexing parser might be a better bet.
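A minimal sketch of what such a string-indexing parser could look like, under assumed rules: fields are separated by commas, a field that starts with ' or " runs to the matching closing quote, and no escape sequences are handled. The name splitFields is hypothetical.

```javascript
// Hypothetical helper: split a comma-separated string, treating text inside
// matching ' or " quotes as a single field (no escape sequences handled).
function splitFields(input) {
  var fields = [];
  var i = 0;
  while (i < input.length) {
    var ch = input.charAt(i);
    var end;
    if (ch === '"' || ch === "'") {
      // Quoted field: read up to the matching closing quote.
      end = input.indexOf(ch, i + 1);
      if (end === -1) end = input.length; // unterminated quote: take the rest
      fields.push(input.substring(i + 1, end));
      // Skip the closing quote and the comma after it, if any.
      i = input.indexOf(',', end);
      i = (i === -1) ? input.length : i + 1;
    } else {
      // Unquoted field: read up to the next comma.
      end = input.indexOf(',', i);
      if (end === -1) end = input.length;
      fields.push(input.substring(i, end));
      i = end + 1;
    }
  }
  return fields;
}

console.log(splitFields("hello,my,name,is,'mr jim'"));
// ["hello", "my", "name", "is", "mr jim"]
```

Unlike the regex versions, this keeps spaces and stray quotes inside unquoted fields, which may or may not be what the data needs.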

It's the positive look-behind (?<=) that JavaScript doesn't support. But be aware that JavaScript implementations vary significantly between browsers.
Edit: there is an SO question devoted to a workaround.
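For completeness, a hedged note on newer engines: ES2018 added lookbehind to JavaScript, so an engine that implements it accepts the lookaround parts of the original pattern. Possessive quantifiers such as *+ are still not part of JavaScript, so this sketch swaps them for plain greedy ones. Engines without ES2018 support reject the pattern outright, so the workarounds in the other answers remain the portable choice.

```javascript
// Assumes an engine with ES2018 lookbehind support (e.g. a current Node.js).
// The possessive *+ from the PHP pattern is replaced by a plain *, because
// JavaScript has no possessive quantifiers; for this pattern the matched
// text is the same.
var pattern = /(?<=')[^'\s][^']*(?=')|(?<=")[^"\s][^"]*(?=")|[^'",\s]+/g;

var matches = "hello,my,name,is,'mr jim'".match(pattern);
console.log(matches); // ["hello", "my", "name", "is", "mr jim"]
```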

Related

How do I convert this PHP regex to Javascript?

I tried converting this:
$regex = "/^[0-9]+[0-9\.]*(?<!\.)$/"
to all of these, but none are correct:
var regex = /^(?!\.$)[0-9]+[0-9\.]*/;
var regex = /^(?!.*\.$)[0-9]+[0-9\.]*/;
var regex = /^[0-9]+[0-9\.]*(?!\.$)/;
The PHP regex correctly rejects 1.1a and 1., but the JavaScript regexes do not.
Your PHP regex may be better written as the following, which accepts the same well-formed numbers (and additionally rejects consecutive dots such as 1..2), is easier to read, and doesn't need a negative look-behind:
$regex = "/^\d+(\.\d+)*$/";
It is also easy to translate directly to a JavaScript regex:
var regex = /^\d+(\.\d+)*$/;
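A quick way to check the translated pattern against the inputs from the question. The wrapper name isDottedNumber and the extra sample values are assumptions added for illustration.

```javascript
// Hypothetical wrapper around the translated pattern.
function isDottedNumber(s) {
  return /^\d+(\.\d+)*$/.test(s);
}

['1', '1.1', '1.2.3', '1.', '1.1a', '.5'].forEach(function (s) {
  console.log(JSON.stringify(s) + ' -> ' + isDottedNumber(s));
});
// "1" -> true
// "1.1" -> true
// "1.2.3" -> true
// "1." -> false
// "1.1a" -> false
// ".5" -> false
```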

Regular expressions that removes only first "/"

I'm new to regular expressions and can't seem to find out how to solve this:
I need a regular expression that "allows" only numbers, letters and /. I wrote this:
/[^a-zA-Z0-9/]/g
I think it's possible to strip the first / off, but don't know how.
so #/register/step1 becomes register/step1
Who knows how I could get this result?
Thanks!
You can use a non-global match, if the pattern is contiguous in the string:
var rx=/(([a-zA-Z0-9]+\/*)+)/;
var s='#/register/step1';
var s1=(s.match(rx) || [])[0];
alert(s1); // returned value: (String) "register/step1"
"/register/step1".match(/[a-zA-Z0-9][a-zA-Z0-9/]*/); // ["register/step1"]
\w is equivalent to [A-Za-z0-9_], so:
"/register/step1".match(/\w[\w/]*/); // ["register/step1"]
Edit: I don't know why I didn't suggest this first, but if you're simply enforcing the pattern rather than replacing, you could just remove that slash (if it exists) before checking the pattern, using strpos(), substr(), or something similar. If you are already using preg_replace(), look at the examples in the function docs; they are quite relevant.
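The same idea can be sketched in JavaScript rather than with PHP's strpos() and substr(): drop the disallowed characters first, then strip any leading slash. The name cleanPath is a hypothetical helper.

```javascript
// Hypothetical helper: keep only letters, digits and "/", then drop any
// leading slashes so "#/register/step1" becomes "register/step1".
function cleanPath(input) {
  return input
    .replace(/[^a-zA-Z0-9\/]/g, '') // remove everything that is not allowed
    .replace(/^\/+/, '');           // remove the leading slash(es) only
}

console.log(cleanPath('#/register/step1')); // register/step1
```

Slashes inside the path are left alone because the second replace is anchored with ^.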

how to do the regexp in javascript for this string?

I have the following possible string:
'', or '4.', or '*.4' or '4.35'
All of the above formats are valid; all others are invalid.
Basically, ignoring whether the characters are digits or word characters, this is what I used in PHP for the validation:
else if ( !ereg('^\*|.*\..*$',$bl_objver) )
Now, I would like to add some clientside validation, so I just translate it into javascript:
var ver_reg = new RegExp("^\*|.*\..*$");
if (ver_reg.test(obj_ver) == false)
but firebug always shows some error, like: "invalid quantifier |...*$" etc..
any suggestions?
(I'm not convinced your expression is correct, but for the moment just going with what you have.)
Using the RegExp object, you need to escape the backslashes:
var ver_reg = new RegExp("^\\*|.*\\..*$");
Alternatively you can use regex literal notation:
var ver_reg = /^\*|.*\..*$/;
That answers your direct question, but...
As for the expression, well, what you definitely want to correct is the start/end anchors each applying to one side of the alternation.
i.e. you're saying <this>|<that> where <this> is ^\* and <that> is .*\..*$
What you want is ^(?:<this>|<that>)$ to ensure the start/end markers are not part of the alternatives (but using ?: since we're not capturing the group).
So /^(?:\*|.*\..*)$/ using the second example above - this fix would also need applying to the PHP version (which can use the same syntax).
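To make the difference concrete, here is a small comparison of the two forms. The variable names and the sample inputs '*abc' and 'abc' are assumptions chosen for illustration.

```javascript
var loose = /^\*|.*\..*$/;      // ^ binds to the first alternative, $ to the second
var strict = /^(?:\*|.*\..*)$/; // both anchors apply to the whole alternation

['*', '4.', '*.4', '4.35', '*abc', 'abc'].forEach(function (s) {
  console.log(JSON.stringify(s) + ' loose=' + loose.test(s) + ' strict=' + strict.test(s));
});
// "*" loose=true strict=true
// "4." loose=true strict=true
// "*.4" loose=true strict=true
// "4.35" loose=true strict=true
// "*abc" loose=true strict=false
// "abc" loose=false strict=false
```

The '*abc' row is where the two differ: the unanchored first alternative accepts anything that merely starts with an asterisk.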
I'd also question your use of . instead of \w or [^.] or similar, but without knowing what you're actually doing, I can't say for sure what makes most sense.
Hope this helps! :)

Can I use javascript regular expression in php as it is

I am using a regular expression in JavaScript and want to do server-side validation as well with the same regular expression. Do I need to modify it to make it compatible, or will it run as it is?
How do I use a PHP regular expression? Please provide a small example.
Thanks in Advance
EDIT
For Email Validation
var pattern = new RegExp(/^(("[\w-\s]+")|([\w-]+(?:\.[\w-]+)*)|("[\w-\s]+")([\w-]+(?:\.[\w-]+)*))(@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$)|(@\[?((25[0-5]\.|2[0-4][0-9]\.|1[0-9]{2}\.|[0-9]{1,2}\.))((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\.){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\]?$)/i);
For Phone no validation
var pattern = new RegExp(/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/);
PHP regexes are based on PCRE (Perl Compatible Regular Expressions).
Example (find digits):
preg_match('/[0-9]+/', 'my01string', $matches); // $matches[0] is "01"
See the PHP documentation.
JavaScript regexes are slightly different (ECMAScript).
var patt1 = new RegExp("e");
document.write(patt1.test("The best things in life are free"));
See here for a comparison table.
A JavaScript pattern should work in PHP in most cases, apart from differences in escaping special characters and backslashes. The reverse does not hold: PHP regexes have features that JavaScript lacks, such as look-behind expressions.
javascript : /[a-z]+/
php : '/[a-z]+/'
For example,
var pattern = new RegExp(/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/);
would be '/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/' in php
You should use one of the following functions preg_match or preg_match_all. And with a bit of luck you shouldn't need to modify your regex.
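One way to avoid retyping a pattern is to let JavaScript print it in the delimiter form PHP expects. This is only a sketch for simple patterns: toPhpPattern is a hypothetical helper, and carrying over just the i, m and s flags is an assumption, not a general conversion rule.

```javascript
// Hypothetical helper: render a JavaScript RegExp as the text of a PHP
// preg_* pattern. Only the i, m and s flags are carried over (an assumption
// that suits simple patterns; this is not a general converter).
function toPhpPattern(re) {
  var flags = re.flags.replace(/[^ims]/g, '');
  // re.source already has "/" escaped as "\/", so "/" is safe as the delimiter.
  return '/' + re.source + '/' + flags;
}

console.log(toPhpPattern(/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/));
// /^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/
console.log(toPhpPattern(/[a-z]+\/\d+/gi));
// /[a-z]+\/\d+/i
```

The printed text still has to be placed in a PHP string literal, so any single quotes in the pattern need escaping there.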
PHP regex uses the classic Perl regex, so a match would look like
preg_match_all('/([a-zA-Z0-9]+)/', $myStringToBeTested, $results);
Later edit:
$string="test@mail.com";
if(preg_match('/^(("[\w-\s]+")|([\w-]+(?:\.[\w-]+)*)|("[\w-\s]+")([\w-]+(?:\.[\w-]+)*))(@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$)|(@\[?((25[0-5]\.|2[0-4][0-9]\.|1[0-9]{2}\.|[0-9]{1,2}\.))((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\.){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\]?$)/', $string))
echo "matches!";
else
echo "doesn't match!";
Enjoy!
In PHP we use the function preg_match.
$email_pattern = '/^(("[\w-\s]+")|([\w-]+(?:\.[\w-]+)*)|("[\w-\s]+")([\w-]+(?:\.[\w-]+)*))(@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$)|(@\[?((25[0-5]\.|2[0-4][0-9]\.|1[0-9]{2}\.|[0-9]{1,2}\.))((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\.){2}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[0-9]{1,2})\]?$)/i';
$phoneno_pattern = '/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/';
if(preg_match($email_pattern,$input_email)) {
// valid email.
}
if(preg_match($phoneno_pattern,$input_ph)) {
// valid ph num.
}
You could have used the regex directly as the function argument instead of using a variable.

RegEx Backreferences

Having the following regular expression:
([a-z])([0-9])\1
It matches a5a, is there any way for it to also match a5b, a5c, a5d and so on?
EDIT: Okay, I understand that I could just use ([a-z])([0-9])([a-z]) but I've a very long and complicated regular expression (matching sub-sub-sub-...-domains or matching an IPv4 address) that would really benefit from the behavior described above. Is that somehow possible to achieve with backreferences or anything else?
Anon.'s answer is what I need, but it seems to be erroneous.
The answer is not with backreferences
Backreference means match the value that was previously matched. It does not mean match the previous expression. But if your language allows it you can substitute a variable in a string into your expression before compiling it.
Tcl:
set exp1 {([a-z])}
regexp "${exp1}(\[0-9\])${exp1}+" $string
Javascript:
var exp1 = '([a-z])';
var regexp = new RegExp(exp1 + '([0-9])' + exp1 + '+');
string.match(regexp);
Perl:
my $exp1 = '([a-z])';
$string =~ /${exp1}([0-9])${exp1}+/;
You don't need back references if the second letter is independent of the first, right?
([a-z])([0-9])([a-z])+
EDIT
If you just don't want to repeat the last part over and over again, then:
([a-z])([0-9])([a-z])
Just taking away the '+'.
The whole point of a back-reference in a regular expression is to match the same thing as the indicated sub-expression, so there's no way to disable that behavior.
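A short illustration of that point: \1 repeats the text captured by group 1, whereas writing the sub-expression again matches any letter. The variable names are assumptions.

```javascript
var withBackref = /^([a-z])([0-9])\1$/;     // third character must equal the first
var withRepeat = /^([a-z])([0-9])([a-z])$/; // third character is any letter

console.log(withBackref.test('a5a')); // true
console.log(withBackref.test('a5b')); // false
console.log(withRepeat.test('a5a'));  // true
console.log(withRepeat.test('a5b'));  // true
```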
To get the behavior you want, of being able to reuse a part of a regular expression later, you could just define the parts of the regular expression you wish to reuse in a separate string, and (depending on the language you're working in) use string interpolation or concatenation to build the regular expression from the pieces.
For instance, in Ruby:
>> letter = '([a-z])'
=> "([a-z])"
>> /#{letter}([0-9])#{letter}+/ =~ "a5b"
=> 0
>> /#{letter}([0-9])#{letter}+/ =~ "a51"
=> nil
Or in JavaScript:
var letter = '([a-z])';
var re = new RegExp(letter + '([0-9])' + letter + '+');
"a5b".match(re)
I suspect you're wanting something similar to the Perl (?PARNO) construct (it's not just for recursion ;).
/([a-z])([0-9])(?1)+/
will match what you want - and any changes to the first capture group will be reflected in what the (?1) matches.
I don't follow your question?
[a-z][0-9][a-z] Exactly 1
[a-z][0-9][a-z]? One or 0
[a-z][0-9][a-z]+ 1 or more
[a-z][0-9][a-z]* 0 or more
Backreferences are for retrieving data from earlier in the regex and using it later on. They aren't for fixing stylistic issues. A regex with backreferences will not function as one without. You might just need to get used to regexes being repetitive and ugly.
Maybe try Python, which makes it easy to build regexes up from smaller blocks. Not clear if you're allowed to change your environment… you're lucky to have backreferences in the first place.
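The building-block approach works in JavaScript as well. As a sketch for the IPv4 case mentioned in the question, the octet sub-pattern is written once and reused by concatenation; the names octet and ipv4 and the exact octet pattern are assumptions, not taken from the asker's expression.

```javascript
// Reusable piece: one IPv4 octet (0-255, no leading zeros), written once.
var octet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';

// Four octets separated by dots, built from the piece instead of repeating it.
var ipv4 = new RegExp('^' + octet + '(?:\\.' + octet + '){3}$');

console.log(ipv4.test('192.168.0.1')); // true
console.log(ipv4.test('256.1.1.1'));   // false
console.log(ipv4.test('1.2.3'));       // false
```

A change to the octet rule then only has to be made in one place, which is the effect the question was hoping to get from backreferences.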
