I need to find an arbitrary string within a string.
My string looks like this:
{theme}pink{/theme} or {theme}red{/theme}
I need to get the text between the tags; the text may differ after each refresh.
My code looks like this:
$str = '{theme}pink{/theme}';
preg_match('/{theme}*{\/theme}/',$str,$matches);
But no luck with this.
* is only a quantifier; you need to specify what it applies to. You've applied it to }, meaning there can be 0 or more '}' characters. You probably want "any character", represented by a dot.
And maybe you want to capture only the part between the {..} tags, with (.*).
$str = '{theme}pink{/theme}';
preg_match('/{theme}(.*){\/theme}/',$str,$matches);
var_dump($matches);
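A minimal sketch comparing the two patterns on the same input (the helper name extractTheme is hypothetical): the original pattern returns 0 because }* only allows repeated } characters, while the capturing version puts the text between the tags in $matches[1].

```php
<?php
// Hypothetical helper: returns the text between {theme} and {/theme}, or null.
function extractTheme(string $str): ?string
{
    // (.*) captures everything between the two tags into group 1.
    if (preg_match('/{theme}(.*){\/theme}/', $str, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

$str = '{theme}pink{/theme}';

// Original pattern: "}*" means "zero or more }", so it can never get past "pink".
var_dump(preg_match('/{theme}*{\/theme}/', $str)); // int(0)

var_dump(extractTheme($str)); // string(4) "pink"
```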
'/{theme}(.*?){\/theme}/' or even more restrictive '/{theme}(\w*){\/theme}/' should do the job
preg_match_all('/{theme}(.*?){\/theme}/', $str, $matches);
You should use ungreedy matching here. $matches[1] will contain the contents of all matched tags as an array.
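To illustrate why the lazy form matters, here is a small sketch using the two-tag string from the question: greedy (.*) runs to the last {/theme}, while (.*?) with preg_match_all yields one entry per tag.

```php
<?php
$str = '{theme}pink{/theme} or {theme}red{/theme}';

// Greedy: (.*) runs to the LAST {/theme}, swallowing the middle of the string.
preg_match('/{theme}(.*){\/theme}/', $str, $greedy);
echo $greedy[1], "\n"; // pink{/theme} or {theme}red

// Lazy: (.*?) stops at the FIRST {/theme} after each opening tag.
// preg_match_all collects every match; $matches[1] holds all group-1 values.
preg_match_all('/{theme}(.*?){\/theme}/', $str, $matches);
// $matches[1] is ['pink', 'red']
```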
$matches = array();
$str = '{theme}pink{/theme}';
preg_match('/{([^}]+)}([^{]+){\/([^}]+)}/', $str, $matches);
var_dump($matches);
That will dump out the parts of the first "tag" it finds, whatever that tag is called (use preg_match_all if you want every tag in the string). Try it out and look at $matches and you'll see what I mean. I'm assuming you're trying to build your own rudimentary template language, so this code snippet may be useful to you. If you are, I'd suggest looking at something like Smarty.
In any case, you need parentheses to capture values in regular expressions. There are three captured values above:
([^}]+)
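will capture the value of the opening "tag", which is theme. [^}]+ means "one or more of any character except }". The quantifier is still greedy, but because it cannot consume a }, the match can never run past the end of the tag, so here it behaves like a non-greedy match.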
([^{]+)
Will capture the value between the tags. In this case we want to match all characters BUT the { character.
([^}]+)
Will capture the value of the closing tag.
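A quick sketch showing what the three capture groups hold for the sample string; the destructuring into $openTag, $content and $closeTag is just for readability.

```php
<?php
$str = '{theme}pink{/theme}';

// Group 1: opening tag name, group 2: content, group 3: closing tag name.
preg_match('/{([^}]+)}([^{]+){\/([^}]+)}/', $str, $matches);

[$whole, $openTag, $content, $closeTag] = $matches;
echo "$openTag / $content / $closeTag\n"; // theme / pink / theme
```

Comparing $openTag with $closeTag is one simple way to reject mismatched pairs such as {theme}...{/color}.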
preg_match('/{theme}([^{]*){\/theme}/',$str,$matches);
[^{] matches any character except an opening brace, so the capture cannot run on into the next tag. That has the same effect as non-greedy matching, which is important if you have more than one tag per string/line.
Related
I'm trying to parse through a file, find a particular match, filter it in some way, and then print that data back into the file with some of the characters removed. I've been trying different things for a couple of hours with preg_split and preg_replace, but my regular expression knowledge is limited, so I haven't made much progress.
I have a large file that has many instances like this: [something]{title:value}. I want to find everything between "[" and "}" and remove everything except the "something" bit.
After that part is done, I want to find everything between "{" and "}" in what's left, like {title:value}, and then remove everything except the "value" part. I'm sure there is some simple way to do this, so even just a resource on how to get started would be helpful.
I'm not sure I understand you correctly (and I haven't touched PHP for months), but what about this?
$matches = array();
preg_match_all("/\[(.*?)\]\{.*?:(.*?)\}/", $str, $matches);
$something = $matches[1]; // $something stores all texts in the "something" part
$value = $matches[2]; // $value stores all texts in the "value" part
See the PHP documentation for preg_match_all.
For the regex pattern \[(.*?)\]\{.*?:(.*?)\}:
We escape all the [, ], { and } characters with a backslash, because these characters have a special meaning in regex and need escaping to match the literal character.
.*? is a lazy "match anything": it matches as few characters as possible, stopping as soon as the rest of the pattern can match. It is used instead of .* so that each part stops at the first following ], : or } instead of running on to the last one in the string.
(.*?) is a capturing group, getting what we need; PHP will put those matches in the $matches array.
So the entire pattern means: match the [ character, then any string up to the ] character and put it in capturing group 1, then the ]{ characters, then any string up to the : character (no capturing group, because we don't care about it), then the : character, then any string up to the } character and put it in capturing group 2.
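A small sketch running that pattern on a made-up input containing two entries, to show how $matches[1] and $matches[2] line up.

```php
<?php
// Hypothetical sample input in the [something]{title:value} format.
$str = 'a [first]{title:one} b [second]{title:two}';

preg_match_all('/\[(.*?)\]\{.*?:(.*?)\}/', $str, $matches);

$something = $matches[1]; // ['first', 'second']
$value     = $matches[2]; // ['one', 'two']
```

Entries at the same index belong together, so array_combine($something, $value) would give a something => value map if that is more convenient.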
You can do it in one shot:
$txt = preg_replace('~\[\K[^]]*(?=])|{[^:}]+:\K[^}]+(?=})~', '', $txt);
\K removes everything matched to its left from the match result.
The lookahead (?=...) ("followed by") performs a check but adds nothing to the match result.
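To see what \K and the lookahead leave in the match, here is a sketch on a single sample entry: the reported match is only the text between the delimiters, so preg_replace with an empty replacement deletes that text and keeps the delimiters in place.

```php
<?php
$txt = '[something]{title:value}';

// \K drops "[" from the reported match; (?=]) checks for "]" without consuming it.
preg_match('~\[\K[^]]*(?=])~', $txt, $m);
echo $m[0], "\n"; // something

// The one-shot pattern: each reported match (the "something" and "value" parts)
// is replaced by the empty string.
$result = preg_replace('~\[\K[^]]*(?=])|{[^:}]+:\K[^}]+(?=})~', '', $txt);
echo $result, "\n"; // []{title:}
```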
I have two strings in PHP:
$string = '<a href="http://localhost/image1.jpeg" /></a>';
and
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';
I'm trying to match strings of the first type. That is strings that are not surrounded by '[caption ... ]' and '[/caption]'. So far, I would like to use something like this:
$pattern = '/(?<!\[caption.*\])(?!\[\/caption\])(<a.*><img.*><\/a>)/';
but PHP matches the first string as well with this pattern, even though it is NOT preceded by '[caption' and zero or more characters followed by ']'. What gives? Why is this, and what's the correct pattern?
Thanks.
Variable-length look-behind is not supported in PHP, so this part of your pattern is not valid:
(?<!\[caption.*\])
PHP should be emitting a warning about this, and preg_match returns false instead of a match count when the pattern fails to compile.
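A minimal check of that behaviour, using a made-up subject string: an unbounded look-behind such as (?<!\[caption.*\]) fails to compile, so preg_match returns false (the @ only hides the warning for this demo), while a fixed-length look-behind compiles normally.

```php
<?php
$subject = '<a href="x"></a>';

// Unbounded look-behind: compilation fails and preg_match returns false.
$bad = @preg_match('/(?<!\[caption.*\])(<a[^>]*>)/', $subject);
var_dump($bad); // bool(false)

// Fixed-length look-behind: compiles and matches.
$ok = preg_match('/(?<!\[caption\])(<a[^>]*>)/', $subject);
var_dump($ok); // int(1)
```

Checking for === false (or calling preg_last_error()) after each preg_* call makes this kind of silent failure visible.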
In addition, .* always matches the largest possible amount, so your pattern may produce a match that spans multiple tags. Instead, use [^>] (anything that is not a closing angle bracket), because a closing angle bracket should not occur inside the img tag.
To solve the look-behind problem, why not just check for the closing tag only? This should be sufficient (assuming the caption tags are only used in a way similar to what you have shown).
$pattern = '|(<a[^>]*><img[^>]*></a>)(?!\[/caption\])|';
When matching patterns that contain /, use another character as the pattern delimiter to avoid leaning toothpick syndrome. You can use nearly any non-alphanumeric character around the pattern.
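A sketch of that pattern on two hypothetical links that do contain an <img> tag (the question's own examples have none): the plain link matches, while the captioned one is rejected by the negative lookahead.

```php
<?php
// '|' is the delimiter, so '/' needs no escaping inside the pattern.
$pattern = '|(<a[^>]*><img[^>]*></a>)(?!\[/caption\])|';

// Hypothetical sample strings.
$plain     = '<a href="http://localhost/image1.jpeg"><img src="thumb.jpg"></a>';
$captioned = '[caption id="attachment_5"]' . $plain . '[/caption]';

var_dump(preg_match($pattern, $plain));     // int(1)
var_dump(preg_match($pattern, $captioned)); // int(0)
```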
Update: the previous regex is based on the example regex you gave, rather than the example data. If you want to match links that don't contain images, do this:
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';
Note that this doesn't allow any tags in the middle of the link. If you allow tags (such as by using .*?), a regex could match something starting within the [caption] and ending elsewhere.
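Run against the two strings from the question, the updated pattern behaves as intended; a quick sketch:

```php
<?php
$pattern = '|(<a[^>]*>[^<]*</a>)(?!\[/caption\])|';

// The two strings from the question.
$string  = '<a href="http://localhost/image1.jpeg" /></a>';
$string2 = '[caption id="attachment_5" align="alignnone" width="483"]<a href="http://localhost/image1.jpeg" /></a>[/caption]';

var_dump(preg_match($pattern, $string, $m)); // int(1), $m[1] is the whole link
var_dump(preg_match($pattern, $string2));    // int(0), followed by [/caption]
```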
I don't see how your regexp could match either string, since you're looking for <a.*><img.*><\/a>, and neither anchor contains an <img... tag. Also, the two subexpressions that look for and prohibit the caption parts look oddly positioned to me. Finally, you need to make sure your tag-matching parts aren't greedy, i.e. don't use .* but [^>]*.
Do you mean something like this?
$pattern = '/(<a[^>]*>(<img[^>]*>)?<\/a>)(?!\[\/caption\])/';
Test it on regex101.
Edit: Removed useless lookahead as per dan1111's suggestion and updated regex101 link.
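A short sketch of the optional (<img[^>]*>)? group: links with and without an image both match, and a captioned link does not. Only the first sample comes from the question; the other two are made up.

```php
<?php
$pattern = '/(<a[^>]*>(<img[^>]*>)?<\/a>)(?!\[\/caption\])/';

$samples = [
    '<a href="http://localhost/image1.jpeg" /></a>',                 // from the question, no <img>
    '<a href="page.html"><img src="thumb.jpg"></a>',                  // hypothetical, with <img>
    '[caption id="attachment_5"]<a href="page.html"></a>[/caption]',  // hypothetical, captioned
];

foreach ($samples as $s) {
    echo preg_match($pattern, $s), "\n"; // 1, then 1, then 0
}
```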
Lookbehind doesn't allow non-fixed-length patterns, i.e. ones using *, + or ?. I think /<a.*><\/a>(?!\[\/caption\])/ is enough for your requirement.
What is wrong with my preg_match?
preg_match('numVar("XYZ-(.*)");',$var,$results);
I want to get the CONTENT part out of this:
numVar("XYZ-CONTENT");
Thank you for any help!
I assume this is PHP? If so there are three problems with your code.
PHP's PCRE functions require that the regular expression be wrapped in delimiters. The usual delimiter is /, but you can use any non-alphanumeric, non-backslash, non-whitespace character, or a matching bracket pair such as ( ), [ ], { } or < >.
You did not escape the parentheses in your regular expression, so you're not matching a literal ( character but creating a capturing group.
You should use non-greedy matching in your RE. Otherwise a string like numVar("XYZ-CONTENT1");numVar("XYZ-CONTENT2"); will match both, and your "content" group will be CONTENT1");numVar("XYZ-CONTENT2.
Try this:
$var = 'numVar("XYZ-CONTENT");';
preg_match('/numVar\("XYZ-(.*?)"\);/',$var,$results);
var_dump($results);
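The greedy problem described in the third point can be reproduced with a short sketch on the two-call string from that point:

```php
<?php
$var = 'numVar("XYZ-CONTENT1");numVar("XYZ-CONTENT2");';

// Greedy: (.*) runs to the last '");' and spans both calls.
preg_match('/numVar\("XYZ-(.*)"\);/', $var, $greedy);
echo $greedy[1], "\n"; // CONTENT1");numVar("XYZ-CONTENT2

// Non-greedy: (.*?) stops at the first '");'; preg_match_all finds each call.
preg_match_all('/numVar\("XYZ-(.*?)"\);/', $var, $lazy);
// $lazy[1] is ['CONTENT1', 'CONTENT2']
```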
Paste your example string into http://txt2re.com and look at the PHP result.
It will show that you need to escape characters that have special meaning to the regex engine (such as the parentheses).
You should escape some chars and wrap the pattern in delimiters:
preg_match('/numVar\("XYZ-(.*)"\);/',$var,$results);
preg_match("/XYZ\-(.+)\b/", $string, $result);
print_r($result[0]); // full match, i.e. XYZ-CONTENT
print_r($result[1]); // first capture group (.+), i.e. CONTENT
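A sketch of why the \b version stops in the right place for the sample string: (.+) is greedy, but the match has to end at a word boundary, so it backs off to the boundary between the last letter and the closing quote. This relies on CONTENT ending in a word character (letter, digit or underscore); content ending in punctuation would be cut short.

```php
<?php
$string = 'numVar("XYZ-CONTENT");';

// (.+) grabs the rest of the line, then backtracks until \b can match.
preg_match('/XYZ\-(.+)\b/', $string, $result);

echo $result[0], "\n"; // XYZ-CONTENT
echo $result[1], "\n"; // CONTENT
```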
I am trying to extract a word that matches a specific pattern from various strings.
The strings vary in length and content.
For example:
I want to extract any word that begins with jac from the following strings and populate an array with the full words:
I bought a jacket yesterday.
Jack is going home.
I want to go to Jacksonville.
The resulting array should be [jacket,Jack,Jacksonville]
I have been trying to use preg_match() but for some reason it won't work. Any suggestions?
$q = "jac";
$str = "jacket";
preg_match($q,$str,$matches);
print $matches[1];
This returns null :S. I dunno what the problem is.
You can use preg_match as:
preg_match("/\b(jac.+?)\b/i", $string, $matches);
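preg_match stops after the first hit, so to build the array from the question you need preg_match_all. A sketch using the three sample sentences:

```php
<?php
$string = "I bought a jacket yesterday.\nJack is going home.\nI want to go to Jacksonville.";

// The i modifier makes "jac" match "Jac" too; $matches[1] collects group 1 of every match.
preg_match_all('/\b(jac.+?)\b/i', $string, $matches);

print_r($matches[1]); // jacket, Jack, Jacksonville
```

Note that .+? needs at least one character after "jac", so the bare word "jac" on its own would not be matched cleanly; \w* (as in the next answer) avoids that.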
You've got to read the manual a few hundred times and it will eventually come to you.
Otherwise, what you're trying to capture can be expressed as "look for 'jac' followed by 0 or more letters* and make sure it's not preceded by a letter", which gives you /(?<!\w)(jac\w*)/i (inside a double-quoted PHP string, as below, the backslashes are written doubled).
Here's an example with preg_match_all() so that you can capture all the occurences of the pattern, not just the first:
$q = "/(?<!\\w)(jac\\w*)/i";
$str = "I bought a jacket yesterday.
Jack is going home.
I want to go to Jacksonville.";
preg_match_all($q,$str,$matches);
print_r($matches[1]);
Note: by "letter" I mean any "word character" (\w). Officially, that also includes digits and the underscore. Depending on the exact circumstances, one may prefer \w (word character) or \b (word boundary).
You can include extra characters by using a character class. For instance, in order to match any word character as well as single quotes, you can use [\w'] and your regexp becomes:
$q = "/(?<!\\w)(jac[\\w']*)/i";
Alternatively, you can add an optional 's to your existing pattern, so that you capture "jac" followed by any number of word characters optionally followed by "'s"
$q = "/(?<!\\w)(jac\\w*(?:'s)?)/i";
Here, the ?: inside the parentheses means that you don't actually need to capture their content (because they're already inside a pair of parentheses, it's unnecessary), and the ? after the parentheses means that the match is optional.
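The two variants differ when a quote appears somewhere other than in a trailing 's. A sketch on a made-up sentence:

```php
<?php
$str = "Jack's jacket and the jacks' bags";

// [\w'] lets a quote appear anywhere in the word.
preg_match_all("/(?<!\\w)(jac[\\w']*)/i", $str, $a);

// (?:'s)? only allows an optional trailing 's.
preg_match_all("/(?<!\\w)(jac\\w*(?:'s)?)/i", $str, $b);

// $a[1] is ["Jack's", "jacket", "jacks'"]
// $b[1] is ["Jack's", "jacket", "jacks"]
```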
I have a script where I need to get three parts out of a text string, and return them in an array. After a couple of trying and failing I couldn't get it to work.
The text strings can look like this:
Some place
Some place (often text in parenthesis)
Some place (often text in parenthesis) [even text in brackets sometimes]
I need to split these strings into three:
{Some place} ({often text in parenthesis}) [{even text in brackets sometimes}]
Which should return:
1: Some place
2: often text in parenthesis
3: even text in brackets sometimes
I know this should be an easy task, but I couldn't solve the correct regular expression. This is to be used in PHP.
Thanks in advance!
Try something like this:
$result = preg_match('/
^ ([^(]+?)
(\s* \( ([^)]++) \))?
(\s* \[ ([^\]]++) \])?
\s*
$/x', $mystring, $matches);
print_r($matches);
Note that in this example, you will probably be most interested in $matches[1], $matches[3], and $matches[5].
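A minimal sketch wrapping that pattern in a helper (the name splitPlace is hypothetical) that always returns three parts, using '' for any part missing from the input. The ?? fallback is needed because PHP omits trailing groups that did not participate in the match.

```php
<?php
// Hypothetical helper: returns [place, text in parentheses, text in brackets].
function splitPlace(string $s): array
{
    preg_match('/
        ^ ([^(]+?)
        (\s* \( ([^)]++) \))?
        (\s* \[ ([^\]]++) \])?
        \s*
        $/x', $s, $m);

    // Groups 1, 3 and 5 hold the three parts.
    return [$m[1] ?? '', $m[3] ?? '', $m[5] ?? ''];
}

print_r(splitPlace('Some place (often text in parenthesis) [even text in brackets sometimes]'));
```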
Split the problem into three regular expressions. After the first one, where you get every character before the first parenthesis, save your position - the same as the length of the string you just extracted.
Then in step two, do the same, but grab everything up to the closing parenthesis. (Nested parentheses make this a little more complicated but not too much.) Again, save a pointer to the end of the second string.
Getting the third string is then trivial.
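One way to sketch the step-by-step idea in PHP, under the assumption that the parts always appear in the order place, (parentheses), [brackets]: each step passes the saved position as preg_match's offset argument, and \G anchors the pattern at that offset. The function name is hypothetical, and nested parentheses are not handled.

```php
<?php
// Hypothetical helper: three regex steps, each starting where the last one ended.
function splitStepwise(string $s): array
{
    $parts = ['', '', ''];

    // Step 1: everything before the first "(".
    preg_match('/^[^(]*/', $s, $m);
    $parts[0] = trim($m[0]);
    $pos = strlen($m[0]); // byte offset, which is what preg_match expects

    // Step 2: "( ... )" starting exactly at $pos (\G anchors to the offset).
    if (preg_match('/\G\s*\(([^)]*)\)/', $s, $m, 0, $pos)) {
        $parts[1] = $m[1];
        $pos += strlen($m[0]);
    }

    // Step 3: "[ ... ]" starting exactly at $pos.
    if (preg_match('/\G\s*\[([^\]]*)\]/', $s, $m, 0, $pos)) {
        $parts[2] = $m[1];
    }

    return $parts;
}
```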
I'd probably do it as three regular expressions, starting with both parentheses and brackets, and falling back to fewer items if that fails.
^(.*?)\s+\((.*?)\)\s+\[(.*?)\]\s*$
if it fails then try:
^(.*?)\s+\((.*?)\)\s*$
if that also fails try:
^\s*(.*?)\s*$
I'm sure they can be combined into one regular expression, but I wouldn't try.
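A sketch of the fallback approach in PHP (the function name is hypothetical). It uses \s* rather than \s+ at the start and end of the patterns, so strings without leading or trailing whitespace still match.

```php
<?php
// Hypothetical helper: try the most specific pattern first, then simpler ones.
function splitWithFallback(string $s): array
{
    $patterns = [
        '/^(.*?)\s+\((.*?)\)\s+\[(.*?)\]\s*$/', // place (parens) [brackets]
        '/^(.*?)\s+\((.*?)\)\s*$/',              // place (parens)
        '/^\s*(.*?)\s*$/',                       // place only
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $s, $m)) {
            // Missing groups are absent from $m, so default them to ''.
            return [$m[1], $m[2] ?? '', $m[3] ?? ''];
        }
    }
    return ['', '', ''];
}
```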
Something like this?
^([^(]+?)(?: \(([^)]++)\))?(?: \[([^\]]++)\])?$
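A quick check of why the first group should be lazy and the pattern anchored: a possessive [^(]++ also swallows the space in front of "(" and never gives it back, so the optional " (" group can no longer match.

```php
<?php
$s = 'Some place (often text in parenthesis)';

// Possessive, unanchored first group: the space before "(" is consumed,
// so both optional groups are skipped.
preg_match('/([^(]++)(?: \(([^)]++)\))?(?: \[([^\]]++)\])?/', $s, $possessive);
// $possessive is ['Some place ', 'Some place ']

// Lazy first group, anchored with ^...$: the space is left for " (".
preg_match('/^([^(]+?)(?: \(([^)]++)\))?(?: \[([^\]]++)\])?$/', $s, $lazy);
// $lazy[1] is 'Some place', $lazy[2] is 'often text in parenthesis'
```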