JavaScript regex not working for PHP - php

I have a javascript regex
Value.match(/[A-Za-z0-9\-\,\.\(\)/]/)
This gives me 1 if a string contains alphabets, numbers, hyphen, comma, dot or braces; if any other character is found it gives 0.
When I apply same regex in PHP it is not working. Why?

You don't need to escape characters inside [] so you can try this /[A-Za-z0-9,.()]/ or even this one /[\w,.()]/ but if you want to check that the string contains only those characters that regex won't do, try:
/^[\w,.()]+$/
I noticed that you also have /. Is that intentional or a mistake, because you don't mention it in the question...

Related

php preg_match mismatch

I would like to know why preg_match('/(?<=\s)[^,]+(?=\s)/',$data,$matches);
matches "List Processes 8989" in the string "20180513 List Processes 8989". The regex I am using should not match numeric characters. What is wrong?
The [^,] basically means any character except ,. If you want to exclude numeric characters as well, you can replace it with [^,0-9], or better [^,\d], so your regex would look like this:
(?<=\s)[^,\d]+(?=\s)
Try it online.
I'm assuming the input string in your question is only part of the actual input string you're using because the regex you provided won't match the numbers at the end unless they're followed by a whitespace.
References:
Negated Character Classes.
Difference between [0-9] and \d.

RegEx: Look-behind to avoid odd number of consecutive backslashes

I have user input where some tags are allowed inside square brackets. I've already wrote the regex pattern to find and validate what's inside the brackets.
In user input field opening-bracket could ([) be escaped with backslash, also backslash could be escaped with another backslash (\). I need look-behind sub-pattern to avoid odd number of consecutive backslashes before opening-bracket.
At the moment I must deal with something like this:
(?<!\\)(?:\\\\)*\[(?<inside brackets>.*?)]
It works fine, but problem is that this code still matches possible pairs of consecutive backslashes in front of brackets (even they are hidden) and look-behind just checks out if there's another single backslash appended to pairs (or directly to opening-bracket). I need to avoid them all inside look-behind group if possible.
Example:
my [test] string is ok
my \[test] string is wrong
my \\[test] string is ok
my \\\[test] string is wrong
my \\\\[test] string is ok
my \\\\\[test] string is wrong
...
etc
I work with PHP PCRE
Last time I checked, PHP did not support variable-length lookbehinds. That is why you cannot use the trivial solution (?<![^\\](?:\\\\)*\\).
The simplest workaround would be to simply match the entire thing, not just the brackets part:
(?<!\\)((?:\\\\)*)\[(?<inside_brackets>.*?)]
The difference is that now, if you're using that regex in a preg_replace, you gotta remember to prefix the replacement string by $1, to restore the backslashes being there.
You could do it without any look-behinds at all (the (\\\\|[^\\]) alternation eats anything but a single back-slash):
^(\\\\|[^\\])*\[(?<brackets>.*?)\]

PHP Regex for checking space or certain characters after string

I need a regex which can basically check for space, line break etc after string.
So conditions are,
Allow special characters ., _, -, + inside the string i.e.#hello.world, #hello_world, #helloworld, etc.
Discard anything including special characters where there is no alpha-numeric string after them i.e. #helloworld.<space>, #helloworld-<space>, #helloworld.?, etc. must be parsed as #helloworld
My existing RegEx is /#([A-Za-z0-9+_.-]+)/ which works perfectly Condition #1, but still there seems to be a problem Condition #2
I am using above RegEx in preg_replace()
Solution:
$str = preg_replace('##[\w+.\-]+\b#', '[[$0]]', $str);
This works perfectly.
Tested with
http://gskinner.com/RegExr/
You can use word boundaries to easily find the position between an alphanumeric letter and a non-alphanumeric letter:
$str = preg_replace('##[\w+.\-]+\b#', '[[$0]]', $str);
Working example: http://ideone.com/0ShCm
Here's an idea:
Use strrev to reverse the string
Use strcspn to find the longest prefix of the reversed string that does not contain any alphanumeric characters
Cut the prefix off with substr
Reverse the string again; this is your final result
See it in action.
I 'm not taking into account any requirement that restricts the legal characters in the string to some subset, but you can use your regular expression for that (or even strspn, which might be faster).
The reason is because it's reading the string as a whole. If you want it to parse out everything after the alphanumeric section you might have to do like and end(explode()); and run that through to make sure that it isn't valid and if it isn't valid then remove it from the equation, but then you'd have to check the end for every possible explode point i.e. .,-,~,etc.
Then again another trap that you might run into is that in the case of a item or anything w/ alphanumeric value it might just parse everything from after the last alphanumeric character on.
Sorry that this isn't much help, but I figured thinking aloud does help.

Regex for netbios names

I got this issue figuring out how to build a regexp for verifying a netbios name. According to the ms standard these characters are illegal
\/:*?"<>|
So, thats what I'm trying to detect. My regex is looking like this
^[\\\/:\*\?"\<\>\|]$
But, that wont work.
Can anyone point me in the right direction? (not regexlib.com please...)
And if it matters, I'm using php with preg_match.
Thanks
Your regular expression has two problems:
you insist that the match should span the entire string. As Andrzej says, you are only matching strings of length 1.
you are quoting too many characters. In a character class (i.e. []), you only need to quote characters that are special within character classes, i.e. hyphen, square bracket, backslash.
The following call works for me:
preg_match('/[\\/:*?"<>|]/', "foo"); /* gives 0: does not include invalid characters */
preg_match('/[\\/:*?"<>|]/', "f<oo"); /* gives 1: does include invalid characters */
As it stands at the moment, your regex will match the start of the string (^), then exactly one of the characters in the square brackets (i.e. the illegal characters), then then end of the string ($).
So this likely isn't working because a string of length > 1 will trivially fail to match the regex, and thus be considered OK.
You likely don't need the start and end anchors (the ^ and $). If you remove these, then the regex should match one of the bracketed characters occurring anywhere on the input text, which is what you want.
(Depending on the exact regex dialect, you may canonically need less backslashes within the square brackets, but they are unlikely to do any harm in any case).

RegEx string "preg_replace"

I need to do a "find and replace" on about 45k lines of a CSV file and then put this into a database.
I figured I should be able to do this with PHP and preg_replace but can't seem to figure out the expression...
The lines consist of one field and are all in the following format:
"./1/024/9780310320241/SPSTANDARD.9780310320241.jpg" or "./t/fla/8204909_flat/SPSTANDARD.8204909_flat.jpg"
The first part will always be a period, the second part will always be one alphanumeric character, the third will always be three alphanumeric characters and the fourth should always be between 1 and 13 alphanumeric characters.
I came up with the following which seems to be right however I will openly profess to not knowing very much at all about regular expressions, it's a little new to me! I'm probably making a whole load of silly mistakes here...
$pattern = "/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z]{1,13}\/)$/";
$new = preg_replace($pattern, " ", $i);
Anyway any and all help appreciated!
Thanks,
Phil
The only mistake I encouter is the anchor for the string end $ that should be removed. And your expression is also missing the _ character:
/^(\.\/[0-9a-zA-Z]{1}\/[0-9a-zA-Z]{3}\/[0-9a-zA-Z_]{1,13}\/)/
A more general pattern would be to just exclude the /:
/^(\.\/[^\/]{1}\/[^\/]{3}\/[^\/]{1,13}\/)/
You should use PHP's builtin parser for extracting the values out of the csv before matching any patterns.
I'm not sure I understand what you're asking. Do you mean every line in the file looks like that, and you want to process all of them? If so, this regex would do the trick:
'#^.*/#'
That simply matches everything up to and including the last slash, which is what your regex would do if it weren't for that rogue '$' everyone's talking about. If there are other lines in other formats that you want to leave alone, this regex will probably suit your needs:
'#^\./\w/\w{3}/\w{1,13}/#"
Notice how I changed the regex delimiter from '/' to '#' so I don't have to escape the slashes inside. You can use almost any punctuation character for the delimiters (but of course they both have to be the same).
The $ means the end of the string. So your pattern would match ./1/024/9780310320241/ and ./t/fla/8204909_flat/ if they were alone on their line. Remove the $ and it will match the first four parts of your string, replacing them with a space.
$pattern = "/(\.\/[0-9a-z]{1}\/[0-9a-z]{3}\/[0-9a-z\_]+\.(jpg|bmp|jpeg|png))\n/is";
I just saw, that your example string doesn't end with /, so may be you should remove it from your pattern at the end. Also underscore is used in the filename and should be in the character class.

Categories