Matching html input values with regex - php

I am trying to match a string like the following:
<input type="text" value="cbyEOS56RK3lOxtiCrhmWSkDuNWwrFN4" name="iden">
This is my code:
$pattern = '~value="(.+?)" name="iden"~';
preg_match($pattern, $page, $match);
print_r($match);
As you can probably see, I am trying to match the value in this HTML input. By what I know of regular expressions, .+? will match as few characters as possible until the next token (in this case ") can match.
I have the name="iden" part in my regex because there are other HTML inputs on the page and I only want to match this one.
Problem is, I'm not getting any matches at all. $match is an empty array. And I know that $page has the right content because I can see it when I echo it.
Help fixing my regex is appreciated, thanks.

Since you used the phrase "there are other inputs on the page", I assume you're trying to parse out this particular tag from a full HTML document. In that case, I recommend using a DOM parser rather than regular expressions. DOM parsers are designed specifically for this purpose and will be a lot easier in the end.
If you want to try regex anyway, I would personally use ([^"]+) instead of (.+?):
$pattern = '~value="([^"]+)" name="iden"~';
Though this still doesn't address whatever is causing your problem, as your regex should match on that line.
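For reference, here is a minimal sketch of the DOM parser route using PHP's bundled DOMDocument and DOMXPath classes. The $page content below is a made-up stand-in for the real page, and the second input is only there to show that the name="iden" filter picks the right element.

```php
<?php
// Hypothetical page content; in the question this would be the real $page.
$page = '<form><input type="text" value="other" name="user">'
      . '<input type="text" value="cbyEOS56RK3lOxtiCrhmWSkDuNWwrFN4" name="iden"></form>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is rarely strictly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($page);
libxml_clear_errors();

// Select every <input> whose name attribute is exactly "iden".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//input[@name="iden"]');

$value = null;
if ($nodes !== false && $nodes->length > 0) {
    $value = $nodes->item(0)->getAttribute('value');
}

echo $value, "\n";
```

Unlike the regex, this does not depend on the order of the attributes inside the tag.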

Related

preg_match link text with less-than sign in it

I'm trying to load information from HTML files into a database, and I just found that link text can look like this:
channel crosstalk: <60dB
Therefore my regular expression doesn't find that link:
preg_match_all('|<a href="/blabla/([0-9]+)"[^>]*>([^<]*)</a>|Uis',$html,$matches);
This is a part of big regular expression, I just simplified it for example.
It's hard to tell what you are trying to pull. Are you looking for the entire link? Or are you looking to grab parts of the link (hence the parentheses)? Here is a solution for getting the individual contents of the link:
preg_match_all( '#(.*?)#i', $html, $matches);
The first element of matches will be the entire link, while the other elements will be the sub parts.
Or here is one for just the entire link:
preg_match_all( "#(<a.*>.*</a>)#i", $html, $matches );
Or here is a slightly modified version of yours. Your original doesn't match because [^<]* only allows characters other than an angle bracket between the opening and closing A tags, and this link's text contains one. Since the U modifier already makes quantifiers ungreedy, the capture is written as (.*) here; adding ? under U would make it greedy again:
preg_match_all( '|<a href="/blabla/([0-9]+)"[^>]*>(.*)</a>|Uis', $html, $matches );
Again, I'm not 100% sure of the exact results you are looking for, but maybe this will get you going and you can make modifications as needed.
You can use this regex to extract href and link text.
<a[^>]+?href="(.*?)"[^>]+?>(.*?)</a>
Group 1: href
Group 2: link text
This is the fundamental issue with trying to regex HTML. This is not really good HTML, because content that is not meant to be interpreted as HTML should be written as HTML entities (i.e. &lt; instead of <). You won't always be able to rely on that, though.
In your case, something like this works for regex:
|.*?|Uis
The matching group gets shifted. This also allows nested tags (like <a><b><i></i></b></a>).
Keep in mind that the ungreedy (U) modifier you used means that you can be a little more lax in your regex matching. If you wanted to do this without the U modifier, you'd probably need a negative lookahead.
|(?:(?!).)*</a>|is
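To make the two approaches concrete, here is a small sketch comparing a lazy capture with a negative lookahead. The $html sample is invented for illustration (the real markup from the question is not shown here), and both patterns deliberately drop the U modifier so that ? keeps its usual "lazy" meaning.

```php
<?php
// Hypothetical input modelled on the question: the link text contains a bare "<".
$html = '<a href="/blabla/42" class="x">channel crosstalk: <60dB</a>';

// Lazy version: (.*?) expands one character at a time until </a> can match.
preg_match_all('|<a href="/blabla/([0-9]+)"[^>]*>(.*?)</a>|is', $html, $lazy);

// Lookahead version: consume any character, as long as </a> does not start at it.
preg_match_all('|<a href="/blabla/([0-9]+)"[^>]*>((?:(?!</a>).)*)</a>|is', $html, $guarded);

print_r($lazy[2]);     // link text captured by the lazy pattern
print_r($guarded[2]);  // link text captured by the lookahead pattern
```

Both variants accept a stray < inside the link text because neither one excludes angle brackets from the captured group.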

Add another regular expression to an existing expression

I'm not familiar with regular expressions. I'm trying to understand them, but it's difficult.
I've got a regular expression which will wrap any URL in an anchor tag. However, it's also wrapping URLs which are already in an anchor tag. I would like to prevent that, so I found a regular expression which does this for me.
?![^<]*</a>
However, I have no idea how I would add this to my existing regular expression. This is my current regular expression:
preg_replace('!(((ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text); ?>
So, how can I skip a URL that is already wrapped in an anchor tag?
I'm gonna join the choir and say: Don't use regex for this - use a html parser.
This said - the regex you found isn't really a regex in itself. It's part of a negative look-ahead that roughly checks that you aren't inside an anchor. (It should really be (?![^<]*</a>).) It checks that the text following the URL, up to the next <, isn't immediately followed by </a>.
Appending this to the end of your original RE will sometimes do the trick. I won't spend time thinking of situations where it'll fail - but it probably will.
Along with some simplifications your regex should look like this:
(https?:\/\/[-\wа-яА-Я()#:%+.~#?&;\/=]+)(?![^<]*<\/a>)
This probably will work for you mostly, but probably will fail at times as well.
Regards
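Here is a rough sketch of how the combined pattern could be used with preg_replace. The replacement string and the sample $text are assumptions made for this example, and the u modifier is added so the Cyrillic range is read as UTF-8 characters. As said above, a lookahead like this is a heuristic and can be fooled by other markup.

```php
<?php
// Hypothetical input: one bare URL and one URL that is already linked.
$text = 'See http://example.com/a and '
      . '<a href="http://example.com/b">http://example.com/b</a>.';

// URL pattern followed by the negative lookahead: skip the URL when the
// text after it runs straight into a closing </a> without another "<".
$pattern = '/(https?:\/\/[-\wа-яА-Я()#:%+.~?&;\/=]+)(?![^<]*<\/a>)/iu';

// Assumed replacement: wrap the matched URL in a plain anchor tag.
$result = preg_replace($pattern, '<a href="$1">$1</a>', $text);

echo $result, "\n";
```

The URL inside the href attribute is skipped as well, because the text after it also reaches </a> before any other <.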

regular expressions checking two strings

Hi, I wonder if anyone can help. I'm trying to check for the occurrence of one of two possible strings using regex, but my knowledge of regex is very limited, so I'm not having much success.
I'm trying to look for 'Email' and 'eMailConfirm'. This is what I have so far, and it works for Email.
$subject is the id of an input field, so it could be 'name', 'Email' or 'eMailConfirm'.
$subject = $getPromoOuter['label'];
$pattern = '/^Email/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 0);
I tried a number of potential expressions to incorporate the second string (plus a few guesswork ones based on others), but I can't seem to get it to work.
Any idea how I can combine those two strings and check for an occurrence of either?
Thanks for looking
I'll just place an answer here, as I do think I have a good idea what your requirement is.
Your current regex is /^Email/ which matches any string which starts with 'Email'. (whether or not it has to start with it is unclear to me).
In case you need to match either Email or eMailConfirm, not at the start of the string, you should go for
/Email|eMailConfirm/
If the matches do need to be at the start of the string, just prepend both alternatives with a '^' character: /^Email|^eMailConfirm/
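An equivalent way to write the anchored version is to put the alternatives in a non-capturing group so that a single ^ applies to both. The sketch below tries it against a few field ids; 'myEmail' is a made-up value added only to show that the anchor matters.

```php
<?php
// Field ids from the question, plus one hypothetical id ('myEmail').
$subjects = ['name', 'Email', 'eMailConfirm', 'myEmail'];

// One anchor, two alternatives: (?:...) groups without capturing.
$pattern = '/^(?:Email|eMailConfirm)/';

$matched = [];
foreach ($subjects as $subject) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $subject) === 1) {
        $matched[] = $subject;
    }
}

print_r($matched);
```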

Regex - Grab a specific word within specific tags

I don't consider myself a PHP "noob", but regular expressions are still new to me.
I'm doing a CURL where I receive a list of comments. Every comment has this HTML structure:
<div class="comment-text">the comment</div>
What I want is simple: I want to get, from a preg_match_all, the comments that have the word "cool" in this specific DIV tag.
What I have so far:
preg_match_all("#<div class=\"comment-text\">\bcool\b</div>#Uis", $getcommentlist, $matchescomment);
Sadly, this doesn't work. But if the REGEX is simply #\bcool\b#Uis, it will work. But I really want to capture the word "cool" in those tags.
I know I could do 2 regular expressions (one that gets all the comments, the other that filters each of them to capture the word "cool"), but I was wondering how could I do this in one preg_match_all?
I don't think I'm far from the solution, but somehow I just can't find it. Something's definitely missing.
Thank you for your time.
This should give you what you're looking for, and provide some flexibility if you want to change things a bit:
$input = '<div class="comment-text">the comment</div><div class="comment-text">cool</div><div class="comment-text">this one is cool too</div><div class="comment-text">ool</div>';
$class="comment-text";
$text="cool";
$pattern = '#<div class="'.$class.'">([^<]*'.$text.'[^<]*)</div>#s';
preg_match_all($pattern, $input, $matches);
Obviously, you need to set your input as the value for $input. After this runs, an array of the <div>s that matched will be in $matches[0] and an array of the text that matched will be in $matches[1].
You can change the class of div to match or the within-div text to require by changing the $class and $text values, respectively.
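If $class or $text might contain regex metacharacters, or if only the whole word should count (as the \b in the question suggests), a variation along these lines may help. It is a sketch under those assumptions, not part of the answer above; the 'cooler' comment is invented to show the effect of the word boundaries.

```php
<?php
$input = '<div class="comment-text">the comment</div>'
       . '<div class="comment-text">cool</div>'
       . '<div class="comment-text">this one is cool too</div>'
       . '<div class="comment-text">cooler</div>';

$class = 'comment-text';
$text  = 'cool';

// preg_quote() escapes regex metacharacters and the "#" delimiter;
// \b on both sides requires $text to appear as a whole word.
$pattern = '#<div class="' . preg_quote($class, '#') . '">'
         . '([^<]*\b' . preg_quote($text, '#') . '\b[^<]*)</div>#is';

preg_match_all($pattern, $input, $matches);

print_r($matches[1]);
```

Like the original, this relies on [^<]*, so a comment that contains nested tags would not be matched.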

preg_match returning weird results

I am searching a string for urls...and my preg_match is giving me an incorrect amount of matches for my demo string.
String:
Hey there, come check out my site at www.example.com
Function:
preg_match("#(^|[\n ])([\w]+?://[\w]+[^ \"\n\r\t<]*)#ise", $string, $links);
echo count($links);
The result comes out as 3.
Can anybody help me solve this? I'm new to REGEX.
$links is the array of sub matches:
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
The matches of the two groups plus the match of the full regular expression results in three array items.
Maybe you rather want all matches using preg_match_all.
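The difference can be seen in a short sketch. The $string here is a made-up example with two URLs, and the pattern is the one from the question without the e modifier, which was only ever meant for preg_replace and has since been removed from PHP.

```php
<?php
// Hypothetical demo string containing two URLs.
$string = 'Visit http://www.example.com or https://example.org/page today';

// Pattern from the question, written in single quotes and without "e".
$pattern = '#(^|[\n ])([\w]+?://[\w]+[^ "\n\r\t<]*)#is';

// preg_match stops at the first match: $links holds the full match
// plus one entry per capture group, not one entry per URL.
preg_match($pattern, $string, $links);
$partsInFirstMatch = count($links);

// preg_match_all returns the number of matches; $all[2] lists the URLs.
$urlCount = preg_match_all($pattern, $string, $all);
$urls = $all[2];

echo $partsInFirstMatch, "\n";
echo $urlCount, "\n";
print_r($urls);
```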
If you use preg_match_all (as Gumbo suggested), please note that if you run your regex against this string, it will match both the value of the anchor's href attribute and the link text, which in this case happens to contain a URL. That makes TWO matches.
It would be wise to run an array_unique on your resultset :)
In addition to the advice on how to use preg_match, I believe there is something seriously wrong with the regular expression you are using. You may want to try something like this instead:
preg_match("_([a-zA-Z]+://)?([0-9a-zA-Z$-\_.+!*'(),]+\.)?([0-9a-zA-Z]+)+\.([a-zA-Z]+)_", $string, $links);
This should handle most cases (although it wouldn't work if there was a query string after the top-level domain). In the future, when writing regular expressions, I recommend the following web-sites to help: http://www.regular-expressions.info/ and especially http://regexpal.com/ for testing them as you're writing them.
