Get String That Matches Wildcard in PHP - php

I need a very specific function in PHP. Basically, I have two strings as arguments, one of which is a pattern that contains wildcards of variable length (*), and one of which is a string that matches that pattern. I need to get an array of the strings from the latter string that fill in the wildcards in the pattern.
For example:
Argument 1: "This is * string that I *"
Argument 2: "This is my awesome string that I created myself"
Return: array("my awesome","created myself")
What's the cleanest way to do this? Bear in mind that these are not always strings of English words as in the example; they could be any random characters.

You could just replace the wildcards with a regex equivalent, and run it through preg_match.
Because it really does sound like homework, I won't give any specific code either, but I'd make good use of preg_replace (to replace the wildcards with regex equivalents) and preg_match (to build the array of matching substrings).
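For readers who land here later, one way the regex idea could look is sketched below. The function name wildcardCaptures is made up for illustration, and the sketch assumes that * is the only wildcard and that every other character in the pattern is literal. It splits the pattern on *, escapes each literal piece with preg_quote, and joins the pieces with a lazy capturing group.

```php
<?php
// Hypothetical helper (name not from the thread): returns the substrings
// that fill each '*' in $pattern, or false when $subject does not match.
function wildcardCaptures(string $pattern, string $subject)
{
    // Escape every literal chunk so characters such as '.' or '?' stay literal.
    $literals = array_map(function ($chunk) {
        return preg_quote($chunk, '/');
    }, explode('*', $pattern));

    // One lazy capturing group per wildcard; the 's' flag lets '.' match newlines.
    $regex = '/^' . implode('(.*?)', $literals) . '$/s';

    if (preg_match($regex, $subject, $matches) !== 1) {
        return false;
    }
    array_shift($matches); // drop the full-match entry, keep only the captures
    return $matches;
}

print_r(wildcardCaptures(
    'This is * string that I *',
    'This is my awesome string that I created myself'
));
```

With the example from the question this yields "my awesome" and "created myself". Lazy groups were chosen so that earlier wildcards take as little as possible; whether that is the desired tie-break for ambiguous inputs is a judgment call.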

Sounds like homework. Here is a walk-through without any actual code:
Tokenize the pattern into strings and wildcards. Iterate over the tokens; each time the target string starts with a regular (string, non-wildcard) token, trim that string off. Each time you encounter a wild card token, find the index of the next token in the string and trim off up to that index. Store that. If at any time there is no match, return false. If you encounter the end of the string before the pattern is complete, return false. If the final token is a wildcard, save the remainder of the string.
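A rough sketch of that walk-through without regular expressions follows. The function name wildcardCapturesManual is hypothetical, and the sketch assumes * is the only wildcard. It also adds one detail the description leaves open: when the pattern ends in a literal, that literal must sit at the very end of the string.

```php
<?php
// Hypothetical helper: tokenizes the pattern on '*' and walks the subject.
function wildcardCapturesManual(string $pattern, string $subject)
{
    $tokens = explode('*', $pattern);   // literal chunks between wildcards
    $last = count($tokens) - 1;         // also the number of wildcards

    // The subject has to start with the first literal chunk.
    if (strncmp($subject, $tokens[0], strlen($tokens[0])) !== 0) {
        return false;
    }
    if ($last === 0) {                  // no wildcard at all: exact match only
        return $subject === $pattern ? [] : false;
    }

    $rest = substr($subject, strlen($tokens[0]));
    $captures = [];
    for ($i = 1; $i <= $last; $i++) {
        $literal = $tokens[$i];
        if ($i === $last) {
            if ($literal === '') {      // pattern ends with '*': keep the remainder
                $captures[] = $rest;
                return $captures;
            }
            // Otherwise the final literal must close the string.
            if (strlen($rest) < strlen($literal)
                || substr($rest, -strlen($literal)) !== $literal) {
                return false;
            }
            $captures[] = substr($rest, 0, strlen($rest) - strlen($literal));
            return $captures;
        }
        // Find the next literal; an empty literal (from '**') matches at offset 0.
        $pos = ($literal === '') ? 0 : strpos($rest, $literal);
        if ($pos === false) {
            return false;
        }
        $captures[] = substr($rest, 0, $pos);
        $rest = substr($rest, $pos + strlen($literal));
    }
}

print_r(wildcardCapturesManual(
    'This is * string that I *',
    'This is my awesome string that I created myself'
));
```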

How to remove repeated chars from string but with exceptions like 'good' or 'cool'?

I'm trying to remove repeated chars from strings like
I looooovvee this. It's awesomee. Very gooood.
to an output like:
I love this. It's awesome, Very good.
I'm already using this instruction in PHP:
$str=preg_replace("/(.)\1+/", "$1", $str);
But it outputs
I love this. It's awesome. Very god.
The problem is in words that already should have repeated chars like 'good' or 'cool'
I suppose you could store your allowed words (like "good" and "cool") in a Trie Dictionary.
Whenever you check a word for duplicate chars, you should allow duplicate characters up to the point where the dictionary still has some valid words with that prefix.
When the dictionary has no valid words for the prefix, you can remove the duplicate chars from that point on.
E.g., if the word you are checking is "Goooood":
check "Go" in the trie: it will return "God" and "Good" as valid words
check "Goo" in the trie: it will return "Good" as the valid word
check "Gooo" in the trie: it will say there are no valid words
Therefore you keep up to "Goo" and remove the rest of the o's.
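A minimal sketch of that idea, assuming a tiny hand-written word list and case-insensitive matching. The function names buildTrie, trieHasPrefix and collapseRepeats are invented for this example, and the trie is just nested PHP arrays keyed by character.

```php
<?php
// Build a trie as nested arrays: each key is one lowercase character.
function buildTrie(array $words): array
{
    $root = [];
    foreach ($words as $word) {
        $node = &$root;
        foreach (str_split(strtolower($word)) as $ch) {
            if (!isset($node[$ch])) {
                $node[$ch] = [];
            }
            $node = &$node[$ch];
        }
        unset($node); // break the reference before the next word
    }
    return $root;
}

// True when at least one dictionary word starts with $prefix.
function trieHasPrefix(array $trie, string $prefix): bool
{
    $node = $trie;
    foreach (str_split(strtolower($prefix)) as $ch) {
        if (!isset($node[$ch])) {
            return false;
        }
        $node = $node[$ch];
    }
    return true;
}

// Keep a repeated character only while the result is still a valid prefix.
function collapseRepeats(string $word, array $trie): string
{
    $out = '';
    foreach (str_split($word) as $ch) {
        $isRepeat = $out !== '' && strcasecmp(substr($out, -1), $ch) === 0;
        if ($isRepeat && !trieHasPrefix($trie, $out . $ch)) {
            continue; // no dictionary word continues this way: drop the duplicate
        }
        $out .= $ch;
    }
    return $out;
}

$trie = buildTrie(['god', 'good', 'love', 'awesome']); // assumed word list
echo collapseRepeats('Goooood', $trie), "\n";
echo collapseRepeats('looooovvee', $trie), "\n";
```

This handles one word at a time; splitting a sentence into words and handling punctuation would still have to be added around it.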
Since you started with regex you could continue with regex.
Use negative look-behind & negative look-ahead to check if before or after the repeating letter there's anything that you wouldn't want to match.
$str = preg_replace('/(?<!g|c)(\w)\1+(?!d|l)/', '$1', $str);
Unfortunately this implies writing a list of possible prefixes and suffixes.

Get specific string content inside big string

I have a big string like this:
[/az_column_text][/vc_column_inner][vc_column_inner width="3/4"]
[az_latest_posts post_layout="listed-layout" post_columns_count="2clm" post_categories="assemblea-soci-2015"]
[/vc_column_inner][/vc_row_inner][/vc_column]
What I need to extract:
assemblea-soci-2015
Of course this value can change, and also the big string can change too. I need a regex or something else to extract this value (it will be always from post_categories="my-value-to-extract") from this big string.
I was thinking of taking post_categories=" as the beginning of the substring and the next " character as its end, but I have no idea how to do this.
Is there an elegant way to do this also for future values with, of course, different length?
You can use this regex in PHP:
post_categories="\K[^"]+
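A short usage sketch for this pattern, assuming the shortcode text is held in a variable named $text (a name chosen here for illustration). \K resets the start of the reported match, so $matches[0] holds only the value after the opening quote.

```php
<?php
$text = '[az_latest_posts post_layout="listed-layout" post_columns_count="2clm" post_categories="assemblea-soci-2015"]';

// \K drops 'post_categories="' from the reported match;
// [^"]+ then consumes everything up to the closing quote.
if (preg_match('/post_categories="\K[^"]+/', $text, $matches)) {
    echo $matches[0], "\n"; // assemblea-soci-2015
}
```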
You can use this regex:
(?<=post_categories=")[^"]+(?=")
(?<=...) (lookbehind) looks for post_categories=" before the desired match, and (?=...) (lookahead) looks for " after the desired match.
[^"]+ gets the match (which is assumed not to contain any ").
Example PHP code:
$text='[/az_column_text][/vc_column_inner][vc_column_inner width="3/4"]
[az_latest_posts post_layout="listed-layout" post_columns_count="2clm" post_categories="assemblea-soci-2015"]
[/vc_column_inner][/vc_row_inner][/vc_column]';
preg_match ("/(?<=post_categories=\")[^\"]+(?=\")/", $text,$matches);
echo $matches[0];
Output:
assemblea-soci-2015
This should extract what you want.
preg_match('/post_categories="([^"]*)"/', $text_you_want_to_use, $matches);
// The extracted value is in $matches[1].

Simple Regex NOT on multidimensional JSON string

Here is a simple example of a JSON string that covers most of my actual cases:
"time":1430702635,\"id\":\"45.33\",\"state\":2,"stamp":14.30702635,
I'm trying to do a preg_replace on the numbers in the string to enclose them in quotes, except for numbers whose key is already escape-quoted, like \"state\":2 in my string.
My regex so far is
preg_replace('/(?!(\\\"))(\:)([0-9\.]+)(\,)/', '$2"$3"$4',$string);
The resulting string I'm trying to obtain in this case leaves the \"state\" value unquoted, skipped by the regex because \" comes right before the colon and digit:
"time":"1430702635",\"id\":\"45.33\",\"state\":2,"stamp":"14.30702635",
Why is the \"state\" number also replaced?
Tried on https://regex101.com/r/xI1zI4/1 also ..
New edit:
So from what I tried,
(?!\\")
is not working !!
If I'm allowed, I will leave this unanswered in case someone else does know why.
My solution was to use this regex; instead of a negative condition, I went for a positive one:
$string2 = preg_replace('/(\w":)([0-9\.]+)(,)/', '$1"$2"$3',$string);
Thank you.
(?!\\") is a negative lookahead, which generally isn't useful at the very beginning of a regular expression. In your particular regex, it has no effect at all: the expression (?!(\\\"))(\:) means "empty string not followed by backslash-quote, then a colon", which is equivalent to just trying to match a colon by itself.
I think what you were trying to accomplish is a negative lookbehind, which has a slightly different syntax in PCRE: (?<!\\"). Making this change seems to match what you want: https://regex101.com/r/xI1zI4/2
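A sketch of how the lookbehind version might be written in PHP, using the sample string from the question. The variable names are chosen here for illustration, and the capture groups are simplified compared with the original pattern. Inside a single-quoted PHP string, four backslashes are needed to produce the two that PCRE reads as one literal backslash.

```php
<?php
// Single-quoted, so \" stays as a literal backslash followed by a quote.
$string = '"time":1430702635,\"id\":\"45.33\",\"state\":2,"stamp":14.30702635,';

// (?<!\\") : the colon must not be preceded by backslash-quote.
$quoted = preg_replace('/(?<!\\\\"):([0-9.]+),/', ':"$1",', $string);

echo $quoted, "\n";
// "time":"1430702635",\"id\":\"45.33\",\"state\":2,"stamp":"14.30702635",
```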

Using REGEX to get a sanitized string (not check whether it fits the regex)

I have used preg_match to check whether a string matches a given regular expression, but not to parse out the invalid characters. This is what I want to do, in a function named toSearchableString($string):
Passed original string: Where'd You Go (feat. Holly Brook and Jonah Matranga)
Returned sanitized string: Whered You Go feat Holly Brook and Jonah Matranga
If we have to replace the characters that are invalid with anything, a space is preferred (or an empty string to simply remove them).
This is the REGEX I want it matched against: a-zA-Z0-9 and a space
Thanks!
Use preg_replace() with a pattern that matches invalid characters as the first argument, an empty string as the second argument, and your original string as the third argument.
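A minimal sketch of what that could look like, using the function name from the question. The character class is negated, so it matches everything outside a-z, A-Z, 0-9 and space; the empty replacement simply removes those characters.

```php
<?php
function toSearchableString(string $string): string
{
    // [^a-zA-Z0-9 ] matches any character that is NOT a letter, digit, or space.
    return preg_replace('/[^a-zA-Z0-9 ]/', '', $string);
}

echo toSearchableString("Where'd You Go (feat. Holly Brook and Jonah Matranga)"), "\n";
// Whered You Go feat Holly Brook and Jonah Matranga
```

Passing ' ' instead of '' as the second argument would replace invalid characters with spaces, which can leave runs of consecutive spaces behind.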
Have fun with this :
http://www.txt2re.com/index-php.php3
I've been using it for a long time; out of laziness, I still don't know how to write regexes by myself. It helps a lot, and you can tweak the generated code a bit afterwards to make it do what you want.
Just saying, this helps a lot sometimes.

Regular expression doesn't quite work

I have created a regular expression (using PHP), shown below, which must match ALL terms within the given string that contain only a-z0-9, ., _ and -.
My expression is: '~(?:\(|\s{0,},\s{0,})([a-z0-9._-]+)(?:\s{0,},\s{0,}|\))$~i'.
My target string is: ('word', word.2, a_word, another-word).
Expected terms in the results are: word.2, a_word, another-word.
I am currently getting: another-word.
My Goal
I am detecting a MySQL function from my target string, this works fine. I then want all of the fields from within that target string. It's for my own ORM.
I suppose there could be a situation whereby further parentheses are included inside this expression.
From what I can tell, you have a list of comma-separated terms and wish to find only the ones which satisfy [a-z0-9._\-]+. If so, this should be correct (it returns the correct results for your example at least):
'~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i'
The main issues were:
$ at the end, which was anchoring the query to the end of the string
When matching all, you continue from the end of the previous match. This means that if you consume a comma or closing parenthesis at the end of one match, it is no longer there to match at the beginning of the next one. I've solved this with a lookbehind ((?<=...)) and a lookahead ((?=...)).
Your backslashes need to be double escaped since the first one may be stripped by PHP when parsing the string.
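For reference, a small sketch of running that pattern with preg_match_all against the target string from the question ($target is a name picked for this example):

```php
<?php
$target = "('word', word.2, a_word, another-word)";

// Lookbehind/lookahead assert the delimiters without consuming them,
// so a comma can close one term and open the next.
preg_match_all('~(?<=[,(])\\s*([a-z0-9._-]+)\\s*(?=[,)])~i', $target, $matches);

print_r($matches[1]); // word.2, a_word, another-word
```

The quoted 'word' is skipped because the apostrophe is not in the allowed character class.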
EDIT: Since you said in a comment that some of the terms may be strings that contain commas you will first want to run your input through this:
$input = preg_replace('~(\'([^\']+|(?<=\\\\)\')+\'|"([^"]+|(?<=\\\\)")+")~', '"STRING"', $input);
which should replace all strings with '"STRING"', which will work fine for matching the other regex.
Maybe using a regex is overkill. With this kind of text you can just remove the parentheses and explode the string by comma.
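A rough sketch of that simpler route, assuming the input always has the outer parentheses and that no term contains a comma. Quoted strings are kept as-is, quotes included, so they would still need filtering afterwards.

```php
<?php
$input = "('word', word.2, a_word, another-word)";

// Strip the outer parentheses, split on commas, trim whitespace around each term.
$terms = array_map('trim', explode(',', trim($input, '()')));

print_r($terms); // 'word', word.2, a_word, another-word
```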