I wouldn't call myself a master regarding regex, i pretty much just know the basics. I've been playing around with it, but i can't seem to get the desired result. So if someone would help me, i would really appreciate it!
I'm trying to check wether unwanted words exist in a string. I'm working on a math project, and i'm gonna be using eval() to calculate the string, so i need to make sure it's safe.
The string may contain (just for example now, i'll add more functions later) the following words: (read the comments)
floor() // spaces or numbers are allowed between the () chars. If possible, i'd also like to allow other math functions inside, so it'd look like: floor( floor(8)*1 ).
It may contain any digit, any math sign (+ - * /) and dots/commas (,.) anywhere in the string
Just to be clear, here's another example: If a string like this is passed, i do not want it to pass:
9*9 + include('somefile') / floor(2) // Just a random example on something that's not allowed
Now that i think about it, it looks kind of complicated. I hope you can at least give me some hints.
Thanks in advance,
-Anthony
Edit: This is a bit off-topic, but if you know a better way of calculating math functions, please suggest it. I've been looking for a safe math class/function that calculates an input string, but i haven't found one yet.
Please do not use eval() for this.
My standard answer to this question whenever it crops up:
Don't use eval (especially if the formula contains user input) or reinvent the wheel by writing your own formula parser.
Take a look at the evalMath class on PHPClasses. It should do everything that you want in a nice safe sandbox.
To rephrase your problem, you want to allow only a specific set of characters, plus certain predefined words. The alternation operator (pipe symbol) is your friend in this case:
([0-9\+\-\*\/\.\,\(\) ]|floor|ceiling|other|functions)*
Of course, using eval is inherently dangerous, and it is difficult to guarantee that this regex will offer full protection in a language with syntax as expansive as PHP.
Related
I must detect the presence of some words (even polyrematic, like in "bag of words") in a user-submitted string.
I need to find the exact word, not part of it, so the strstr/strpos/stripos family is not an option for me.
My current approach (PHP/PCRE regex) is the following:
\b(first word|second word|many other words)\b
Is there any other better approach? Am I missing something important?
Words are about 1500.
Any help is appreciated
A regular expression the way you're demonstrating will work. It may be challenging to maintain if the list of words grows long or changes.
The method you're using will work in the event that you need to look for phrases with spaces and the list doesn't grow much.
If there are no spaces in the words you're looking for, you could split the input string on space characters (\s+, see https://www.php.net/manual/en/function.preg-split.php ), then check to see if any of those words are in a Set (https://www.php.net/manual/en/class.ds-set.php) made up of the words you're looking for. This will be a bit more code, but less regex maintenance, so ymmv based on your application.
If the set has spaces, consider instead using Trie. Wiktor Stribiżew suggests: https://github.com/sters/php-regexp-trie
I have two words nice and niece. How do i figure out e is missing to make both words the same.
Also say i have from and form. How do i return/figure out the letter r o has to be swap to make the words the same.
What i really want is a php that does whats in the image below.
I tried using built-in PHP functions for string manipulation, but none seem to be able to accomplish what i want.
Any help will be greatly appreciated.
what you're looking for is optimal alignment. simplest way to compute it is Needleman–Wunsch algorithm. for bigger inputs you will need something better but more complicated: Hirschberg's algorithm (which has better memory complexity). or simply get some diff library written in your language
however you won't get 'impossible' result as each word can be changed to other word using deletions, insertions and changes. if you need different constraints you would have to modify above algorithms and use your own metrics and/or operations
I am working on a regular expression that would grab the price in different format as I don't know in which format I am going to get the string so I am trying to cover as many variation as possible
Here is what I came up with
\$\s*?(\d+\.?\d*?)+|usd\s*?(\d+\.?\d*?)+|(\d+\.?\d*?)\s*?usd+|(\d+\.?\d*?)\s*?dollars?+|dollars?\s*?(\d+\.?\d*?)+|(\d+\.?\d*?)\s*?bucks?+|bucks?\s*?(\d+\.?\d*?)+
I've tried the above with several examples and it didn't fail so far.
anyone can think of a better way to achieve that ?
The real answer here is going to be achieved through normalization of the data. Start by removing every character except digits, the dot, and (if you expect negative values) the hyphen. Then you will have a character string that can be used as a number. When you have some test data available, try normalization first before you try to write regular expressions. Not only will the code be easier to write, but it will run faster, too!
I would advise using seperate expressions for each variation, and testing them in sequence (most likely ones first), applying the chain of responibility pattern.
The advantage is maintainability. When you need to support a new variation (considering you don't know all possible cariations beforehand) it'll simply be a matter of adding another member to the chain, rather than fiddling with the arcane complexities of what you have built now.
Is it possible to display the strings that match a regular expression?
Example:
Take the expression /^AD\d{3}/
and display AD999
What I'm doing is validating a string that is pretty simple either containing all numbers, a few characters maybe, and maybe a '-'. I am validating a postal code on form submit against a database of all countries that use a postal code.
I could perform it in Javascript or PHP, if that makes any difference.
No. That sort of feature is not available.
You can try to implement it yourself, but I don't think that's the solution for you. Simply write the messages normally. Not everything must always be dynamic.
I like your way of thinking though.
It is possible. The developers of PEX figured it out.
Don't get your hopes up, I don't know of any javascript implementation.
There is one for javascript now: http://fent.github.io/randexp.js/.
I have understood your problem a little better from your additional comments.
Since your data is only postal codes, I suggest that it would possible to work in the other direction and store a picture in the database and automatically generate a regex from that.
For instance, UK postcodes look like AA?99? 9AA | AA?9A 9AA which is easily converted to a regex (using a regex!).
I am building a string to detect whether filename makes sense or if they are completely random with PHP. I'm using regular expressions.
A valid filename = sample-image-25.jpg
A random filename = 46347sdga467234626.jpg
I want to check if the filename makes sense or not, if not, I want to alert the user to fix the filename before continuing.
Any help?
I'm not really sure that's possible because I'm not sure it's possible to define "random" in a way the computer will understand sufficiently well.
"umiarkowany" looks random, but it's a perfectly valid word I pulled off the Polish Wikipedia page for South Korea.
My advice is to think more deeply about why this design detail is important, and look for a more feasible solution to the underlying problem.
You need way to much work on that. You should make an huge array of most-used-word (like a dictionary) and check if most of the work inside the file (maybe separated by - or _) are there and it will have huge bugs.
Basically you will need of
explode()
implode()
array_search() or in_array()
Take the string and look for a piece glue like "_" or "-" with preg_match(); if there are some, explode the string into an array and compare that array with the dictionary array.
Or, since almost every words has alternate vowel and consonants you could make an huge script that checks whatever most of the words inside the file name are considered "not-random" generated. But the problem will be the same: why do you need of that? Check for a more flexible solution.
Notice:
Consider that even a simple-and-friendly-file.png could be the result of a string generator.
Good luck with that.