How do you write and test your regular expressions? - php

Are there any wizards or tools to create and test regular expressions for PHP? It is so difficult :(
Thanks :)

RegexBuddy is a widely popular app for this purpose. It also costs $40 and only runs on Windows.
For powerful free alternatives, see this answer.

reAnimator is a nice tool to visualize your regex as a state machine; I find it useful sometimes.
Python also allows you to view a regex parse tree, which can be helpful if you learn to read it.
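For anyone who wants to see that parse tree, here is a minimal sketch. It relies on the standard re.DEBUG flag, which prints the tree to stdout when a pattern is compiled; the helper name parse_tree is my own, and the exact layout of the dump differs between Python versions.

```python
import contextlib
import io
import re


def parse_tree(pattern: str) -> str:
    """Return the debug dump that re.DEBUG prints while compiling `pattern`."""
    buffer = io.StringIO()
    # re.DEBUG writes with print(), so redirecting stdout captures the dump.
    with contextlib.redirect_stdout(buffer):
        re.compile(pattern, re.DEBUG)
    return buffer.getvalue()


if __name__ == "__main__":
    # Each line of the dump is one node: literals, repeats, groups, and so on.
    print(parse_tree(r"(foo).*?(bar)"))
```

Literals show up as character codes (for example, 102 for "f"), which takes a little getting used to.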

Unit testing with example data. Create two arrays, one with matching data, and one with non-matching data if necessary to test edge cases.
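A minimal sketch of the two-array idea, written in Python rather than PHP only to keep it short. The helper check_pattern and the phone-number pattern are illustrative assumptions, not part of any library; the same structure carries over to preg_match() in a PHPUnit test.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return a list of failure messages; an empty list means every case passed."""
    regex = re.compile(pattern)
    failures = []
    for text in should_match:
        if regex.search(text) is None:
            failures.append(f"expected a match: {text!r}")
    for text in should_not_match:
        if regex.search(text) is not None:
            failures.append(f"expected no match: {text!r}")
    return failures


# Illustrative pattern: a US-style phone number with optional separators.
PHONE = r"^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$"
MATCHING = ["555-123-4567", "(555) 123-4567", "5551234567"]
NON_MATCHING = ["555-1234", "555-123-45678", "phone: 555-123-4567"]

if __name__ == "__main__":
    for failure in check_pattern(PHONE, MATCHING, NON_MATCHING):
        print(failure)
```

Whenever the regex changes, add the new edge case to one of the two lists first and rerun, so earlier cases keep being checked.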

Trial and error success.
Because I've spent the time to actually learn it, instead of relying on something else to do it for me.
Same applies to any language/tool - take a bit of time to learn the syntax and general ethos, and you'll be far more productive than relying on intellisense, code hinting, and so on.

There are powerful online tools. Offline, The Regex Coach is a great free regex tool that I use fairly regularly.
I like RegexBuddy also, but it costs $40 and I'm cheap.

Expresso is a free Windows program that gives a nice breakdown and explanation of the regex under analysis.
For online tools that you can run right away from a browser, see this answer.

I always use this: http://gskinner.com/RegExr/

Trial and error.
And print_r.

I really like RegexPal, which is simple, clear, requires no installation and freely available online.

Online, there's an Ajax regex checker with JS/PCRE/POSIX implementations that checks as you type. Way cool.
http://www.rexv.org

Regex Buddy is overkill ($40) and works only on Windows. It was a good choice back in 2009 maybe.
Now we have free powerful online tools to build and test regular expressions. Regex101 is one of them:
lets you select the RE engine (PCRE, JavaScript, Python)
colorizes the matches
explains the regexp on the fly
has a debugger
can create permalinks to the regexp playground.
More regexp testing tools in my other answer.

I generally use Rubular when I'm testing a regular expression. You could also try txt2re.com; it can be handy for helping you figure out an expression and can even generate relevant PHP code.

I used to use The Regex Coach. But because it's Perl based and most of the time I'm testing .NET regular expressions, I now use this online .NET regular expression tester.

I liked the emacs re-builder.

I've written my own tool: Regular Expression Tester. Unlike many other web-based tools, this one can break a regex down into tokens and describe what each token is doing. It's great for examining new expressions, or expressions that you wrote a long time ago and don't quite remember.

Since you're talking about PHP, you may be interested in Codebench. It is a tool, not specifically to break down regexes (you've got a lot of those listed already), but to benchmark them. Since it is rather generic, you can also compare non-regex solutions as often native string functions are faster. Moreover, it allows you to benchmark against multiple subjects (targets) as well. Hope you find it useful.

I'm using unit-testing. That way, I can grow my regex incrementally, being certain that the first cases I tested still pass. And if ever I have to modify it, I have all my tests to back me up.

Here is another online regular expression tester for Java:
http://www.fileformat.info/tool/regex.htm

To test a regex online, use http://www.regexr.com/. If your regex works there, you can then check it for PHP on writecodeonline.com with the preg_match() function.

I wrote a Python library to accomplish this; it is under cloudtb.re:
text = 'so foo is the opposite of bar but without foo there is no bar?'
exp = '(foo).*?(bar)'
searched = cre.research(exp, text)
print(searched)

Convert JavaScript regular expression to PHP [duplicate]

I know this question has been asked about a dozen times, but this one is not technically a dupe (check the others if you like) ;)
Basically, I have a Javascript regex that checks email addresses which I use for front-end validation, and I use CodeIgniter to double check on the back end, in case the validation on the front end fails to run properly (browser issues, for instance.) It's QUITE a long regular expression, and I have no idea where to begin converting it by hand.
I'm pretty much looking for a tool that converts JS regexes to PHP regexes - I haven't found one in any of the answers to similar questions (of course, it's possible that such a tool doesn't exist.) Okay, I lied - one of them suggested a tool that costs $39.95, but I really don't want to spend that much to convert a single expression (and no, there isn't a free trial as suggested by the answer to the aforementioned question.)
Here's the Javascript expression, graciously provided by aSeptik:
/^((([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+(\.([a-z]|\d|[!#\$%&'\*\+\-\/=\?\^_`{\|}~]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])+)*)|((\x22)((((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(([\x01-\x08\x0b\x0c\x0e-\x1f\x7f]|\x21|[\x23-\x5b]|[\x5d-\x7e]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(\\([\x01-\x09\x0b\x0c\x0d-\x7f]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF]))))*(((\x20|\x09)*(\x0d\x0a))?(\x20|\x09)+)?(\x22)))@((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?$/i
And the one used by CodeIgniter, which I don't want to use because it doesn't follow the same rules (disallows some valid addresses):
/^([a-z0-9\+_\-]+)(\.[a-z0-9\+_\-]+)*@([a-z0-9\-]+\.)+[a-z]{2,6}$/ix
I want to use the same rules set by the Javascript regex in PHP.
Having this sort of inconsistency where my front-end code is saying that the email address is okay, and then Codeigniter says it isn't, is of course the behavior I'm trying to fix in my application.
Thanks for any and all tips! :D
There are some differences between the regex engines in JavaScript and PHP. See the Comparison of regular-expression engines article for the theory and the Difference between PHP regex and JavaScript regex answer for practical information.
Most of the time, you can use Javascript regex patterns in PHP with small modifications. As a fundamental difference, PHP regex is defined as a string (or in a string) like this:
preg_match('/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/',$telephone);
A JavaScript regex is not a string; it has its own literal syntax:
var ptr = new RegExp(/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/);
// or
var ptr = /^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/;
You can give it a try by running the regex in PHP. As a recommendation, do not edit the CodeIgniter core files; you can simply extend or replace the native library. See Creating Libraries for more information.
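One of those "small modifications" matters for the expression in the question: PCRE does not understand JavaScript's \uXXXX escapes and expects \x{XXXX} together with the u modifier instead. Below is a rough sketch of that single rewrite, done in Python for convenience. The function name js_literal_to_pcre is hypothetical, and the sketch assumes the input is a /pattern/flags literal with no escaped backslash directly before a "u"; it is not a general converter.

```python
import re

# Matches a JavaScript-style \uXXXX escape (four hex digits).
JS_UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def js_literal_to_pcre(js_literal: str) -> str:
    """Rewrite a JS /pattern/flags literal into a PCRE pattern string."""
    # Drop the leading slash, then split at the last slash into body and flags.
    body, _, js_flags = js_literal[1:].rpartition("/")
    # \u00A0 (JavaScript) becomes \x{00A0} (PCRE).
    converted = JS_UNICODE_ESCAPE.sub(r"\\x{\1}", body)
    # Keep only the flags that are also PCRE modifiers; "g" has no PCRE modifier.
    flags = "".join(flag for flag in js_flags if flag in "ims")
    if converted != body:
        flags += "u"  # \x{...} above 0xFF needs UTF-8 mode in PCRE
    return f"/{converted}/{flags}"


if __name__ == "__main__":
    print(js_literal_to_pcre(r"/^[a-z\u00A0-\uD7FF]+$/gi"))
```

The result still has to be placed in a PHP string and tested with preg_match(); other flavor differences are not handled here.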
I was able to solve this in a better-than-expected manner. I was unable to convert the Javascript regex that I wanted to use (even after purchasing RegexBuddy - it'll come in handy, but it was not able to produce a proper conversion), so I decided to go looking on the Regex Validate Email Address site to see if they had any recommendations anywhere for good regexes. That's when I found this:
"The expression with the best score is currently the one used by PHP's filter_var()":
/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD
It matches with only 4/86 errors, while the Javascript one I was using matches with 8/86 errors, so the PHP one is a little more accurate. So, I extended the CodeIgniter Form_validation library to instead use return filter_var($str, FILTER_VALIDATE_EMAIL);.
...But does it work in Javascript?
var pattern = new RegExp(/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/i);
Zing! Works like a charm! Not only did I get the consistency I was looking for between front and back end validation, but I also got a more accurate regex in the process. Double win!
Thank you to all those who provided suggestions!
Today there is the site https://regex101.com/, where you can convert a JS regex to PHP or some other languages.

What is a good strategy for getting the title of a page?

... server side and using PHP.
I read this SO article on when to use regexes and it basically states that you can use regexes to parse HTML in certain cases.
<title></title>
should be easy to match.
I see no problem with this. I think the popular answer got so many votes not because of correctness but because of entertainment value.
Is this O.K?
Yes, it is
/<title[^>]*>(.*?)<\/title>/is
Different people have different opinions, though. And you should only use regex if you know what you're doing.
This might be a very interesting read: When you should NOT use Regular Expressions?
Your best bet is to use an HTML parsing library (like this one), not regex. You may get away with using regex in this case, but it's like using a hammer to pound in a screw.
If you are looking for anything non-trivial in the HTML, regex is going to be very confusing and hard to read, and in many cases, regex cannot do the job without making many assumptions about the content of the HTML.
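To make the two positions concrete, here is a small side-by-side sketch in Python (the PHP equivalents would be preg_match() and an HTML parsing library). The function names are my own; the regex is the one quoted above, and the parser variant uses the standard library's html.parser module.

```python
import re
from html.parser import HTMLParser

# Same pattern as above: case-insensitive, and "." also matches newlines.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_by_regex(html):
    """Return the text of the first <title> element found by regex, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


class _TitleParser(HTMLParser):
    """Collects the text content of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; only the first <title> is recorded.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_by_parser(html):
    """Return the text of the first <title> element found by a parser, or None."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
```

For a well-formed page both give the same answer. They start to diverge on input such as a commented-out title, which is where the parser earns its keep.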

identify tense in php

I'm looking for a way to analyze a string of text and find out in which tense it was written, for example: "I'm going to the store" == current, "I bought a car" == past, etc.
Any tips on how I could get this done?
Yes, this is going to be extremely difficult... I had started to do something similar for what was going to be a quick weekend project until I realized this... nonetheless here is a resource I found to be helpful.
Download the source code of WordNet 3.0 from Princeton, which has a database of English words. The file /dict/index.verb is a list of present-tense English verbs that you should be able to import into your database as a CSV without too much trouble. From there, you're on your own, and will need to figure out how to handle the weirdness that is the English language.
This could be a rather tasking process. How detailed do you want to get? Do you want to consider only past, present, and future? Or do you want to consider Simple Present, Present Progressive, Simple Past, etc?
In any case, you'll also have to evaluate the Affirmative forms, Negative forms, and Question forms. A great chart online that can help can be found at http://www.ego4u.com/en/cram-up/grammar/tenses
Note the rules and signal words.
Tokenize / find action words from db/file (or at least, guess - *th=past, for example) / count tense hits?
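A very rough sketch of that tokenize / look up / count idea, in Python for brevity. The word lists and the suffix guesses ("-ed" for past, "-ing" for present) are made-up placeholders standing in for a real verb database, and the function name guess_tense is hypothetical. Expect it to be fooled easily: "bed" and "red" both count as past-tense hits.

```python
import re
from collections import Counter

# Hypothetical, tiny lookup tables; a real system would load a verb database.
IRREGULAR_PAST = {"bought", "went", "was", "were", "had", "did", "saw"}
FUTURE_MARKERS = {"will", "shall"}
PRESENT_MARKERS = {"am", "is", "are", "'m", "'re", "'s"}


def guess_tense(sentence):
    """Count crude per-token tense hits and return the most frequent tense."""
    # Words, plus contraction tails such as 'm kept as their own tokens.
    tokens = re.findall(r"[a-z]+|'[a-z]+", sentence.lower())
    hits = Counter()
    for token in tokens:
        if token in FUTURE_MARKERS:
            hits["future"] += 1
        elif token in IRREGULAR_PAST or token.endswith("ed"):
            hits["past"] += 1
        elif token in PRESENT_MARKERS or token.endswith("ing"):
            hits["present"] += 1
    if not hits:
        return "unknown"
    return hits.most_common(1)[0][0]


if __name__ == "__main__":
    for example in ("I'm going to the store", "I bought a car"):
        print(example, "->", guess_tense(example))
```

It gets the two examples from the question right, which says more about the examples than about the method.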
For such a task, I believe regular expressions won't be enough: it's a pretty difficult task...
Either you won't get anything good at all from regex, or you'll end with some kind of super-monster-regex that not even you will understand and be able to maintain...
This probably requires more than regex... Something like some kind of "linguistic-engine", I suppose...
If you actually need it and aren't just playing around, you might take a look at nltk. Parsing is a complex matter. Parsing natural languages is even more complex. And parsing a highly irregular language, such as English, is even worse. If you can narrow the problem scope down, you stand a better chance at a solution.
What do you need it for?
You can find a basic Brill Parser implementation for PHP at Ian Barber's PHP/ir site. The algorithm will tag your words.
If you enter the words "I think", the result will be:
I/NN think/VBP
NN= Noun,
VBP= Verb Present

Making a JavaScript regex equivalent to a PHP regex

After my web form is submitted, a regex will be applied to user input on the server side (via PHP). I'd like to have the identical regex running in real-time on the client side to show the user what the real input will be. This will be pretty much the same as the Preview section on the Ask Question pages on Stack Overflow except with PHP on the back-end instead of .NET.
What do I need to keep in mind in order to have my PHP and JavaScript regular expressions act exactly the same as each other?
Hehe this was sort of asked moments ago and Jeff pointed out:
http://www.regular-expressions.info/refflavors.html.
There is a comparison of regular expression capabilities across tools and languages.
If the regular expressions are simple then there should be no issue, as the basics of regular expressions are common across most implementations.
For particulars then it would be best to study both implementations:
http://www.regular-expressions.info/php.html
http://www.regular-expressions.info/javascript.html
JavaScript's implementation is probably the more basic, so if you are going for a lowest-common-denominator approach then aim for that one.
I've found that different implementations of regular expressions often have subtle differences in what exactly they support. If you want to be entirely sure that the result will be the same in both frontend and backend, the safest choice would be to make an Ajax call to your PHP backend and use the same piece of PHP code for both regex evaluations.
@LKM AJAX is the clear winner here. This will also allow you to follow the DRY principle. Why would you want to write your parsing code in JavaScript and PHP?
Both JavaScript's regex and PHP's preg_match are based on Perl, so there shouldn't be any porting problems. Do note, however, that Javascript only supports a subset of modifiers that Perl supports.
For more info for comparing the two:
Javascript Regular Expressions
PHP Regular Expressions
As for the delivery method, I'd suggest you use JSON, the slimmest data interchange format to date (AFAIK), which translates directly to a JavaScript object through JSON.parse() (eval() also works, but it executes whatever it is given, so avoid it for untrusted data). Just put that bad boy through an AJAX session and you should be set to go.
I hope this helps :)