help with regex match - php

I want to retrieve the following URLs with a regex:
HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830"
HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830&ptype=PF"
The difference is the ending: the first one omits &ptype=PF and the second one includes it.
At the moment I'm using this pattern:
protected $uriPattern = '/http:\/\/www\.getty\.edu\/vow\/.*?\?find=&place=&nation=&english=Y&subjectid=......./i';
But that only works for the first one.
What would the pattern for preg_match_all need to look like to match both of them? Thanks for the help.

If there is an optional part in the strings you are matching, you can add (optional)?, in your case (&ptype=PF)?.

Try this
protected $uriPattern = '/http:\/\/www\.getty\.edu\/vow\/.*?\?find=&place=&nation=&english=Y&subjectid=.......(&ptype=PF){0,1}/i';
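A minimal sketch of the optional-group idea run through preg_match_all. The $html string is made-up sample input built from the two HREF values in the question, and swapping the seven dots for \d+ and the delimiter for ~ are my own assumptions, not part of the original pattern.

```php
<?php
// Hypothetical sample input: the two HREF values quoted in the question.
$html = 'HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830" '
      . 'HREF="http://www.getty.edu/vow/TGNFullDisplay?find=&place=&nation=&english=Y&subjectid=7009830&ptype=PF"';

// "~" is used as the delimiter so the slashes need no escaping.
// \d+ accepts an ID of any length; (?:&ptype=PF)? makes the suffix optional.
$uriPattern = '~http://www\.getty\.edu/vow/.*?\?find=&place=&nation=&english=Y&subjectid=\d+(?:&ptype=PF)?~i';

preg_match_all($uriPattern, $html, $matches);

// $matches[0] holds the full text of every match.
$urls = $matches[0];
print_r($urls);
```

Both URLs end up in $urls, the second one including its &ptype=PF suffix.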

I was going to suggest the more succinct
'~http://www\.getty\.edu/vow/TGNFullDisplay\?find=&place=&nation=&english=Y&subjectid=\d+(&ptype=PF)?~i'
Forward slashes are not special in regex itself; in PHP they only need escaping when / is also the pattern delimiter, so a different delimiter such as ~ lets you leave them unescaped. The ID could also be a different length, hence \d+ instead of seven dots.

Related

URL routing regex

I'm trying to create a snippet of regex that will match a URL route.
Basically, if I have this route /users/:id I want /users/100 to match, but /users/100/edit not to match.
This is what I'm using now: users/(.*)/ but because of the greedy match it's matching regardless of what's after the user ID. I need some way of "breaking" the match if there's an /edit or something else on the end of the route.
I've looked into the Regex NOT operator but with no luck.
Any advice?
Are you just trying to collect digits?
You could use users/(\d*)/
If you want to collect everything up to the next /, you can use a negated character class (a NOT): ^/users/[^/]*$
You can use negative lookahead:
users/(.*)/(?!edit)
This will always require a trailing slash however. Maybe a better solution would be:
users/(\d+)(?!/edit)
See this post for more information.
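One caveat with an unanchored lookahead placed after \d+: the engine may backtrack to a shorter run of digits, so users/(\d+)(?!/edit) still matches /users/100/edit by capturing 10. A sketch of an anchored alternative follows; matchUserRoute is a hypothetical helper name, and it assumes the route is matched against the path only.

```php
<?php
// Hypothetical helper: returns the user ID when the whole path is /users/<digits>,
// otherwise null.
function matchUserRoute(string $path): ?string
{
    // ^ and $ (with D) pin the match to the entire path, so nothing may follow the ID.
    if (preg_match('~^/users/(\d+)$~D', $path, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(matchUserRoute('/users/100'));      // string(3) "100"
var_dump(matchUserRoute('/users/100/edit')); // NULL

// The unanchored lookahead version backtracks and captures "10".
preg_match('~users/(\d+)(?!/edit)~', '/users/100/edit', $loose);
var_dump($loose[1]);                         // string(2) "10"
```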

help with a regex code

I have this regex code:
/^(https?:\/\/+[\w\-]+\.[\w\-]+)/i
It works, but there is a problem.
You NEED http:// in the URL for it to validate, but in what I am making, users will not want to add http://; they just want to type example.com. If possible, I need it to work whether it has http:// or not.
I don't know how to write my own regex, and I've searched but cannot find one that does what I need, unless I'm just not looking in the right place. (Google :P)
Don't bother with regex. Use parse_url function.
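A rough sketch of the parse_url route. extractHost is a hypothetical helper name, and treating input without a scheme as http and requiring a dot in the host are my assumptions, carried over loosely from what the original pattern checked.

```php
<?php
// Hypothetical helper: returns the host part of a user-supplied address, or null.
function extractHost(string $input): ?string
{
    $input = trim($input);

    // Assumption: input without an http(s) scheme is treated as http.
    if (!preg_match('~^https?://~i', $input)) {
        $input = 'http://' . $input;
    }

    // parse_url returns the host string, null if it is missing, or false on a malformed URL.
    $host = parse_url($input, PHP_URL_HOST);

    // Require at least one dot, as the pattern in the question did.
    if (!is_string($host) || strpos($host, '.') === false) {
        return null;
    }
    return $host;
}

var_dump(extractHost('example.com'));             // string(11) "example.com"
var_dump(extractHost('http://example.com/page')); // string(11) "example.com"
var_dump(extractHost('localhost'));               // NULL
```

Note that parse_url only splits the string into parts; it does not by itself prove the host exists or is well formed.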
You can just make it optional
/^((?:https?:\/\/+)?[\w\-]+\.[\w\-]+)/i
The (?:) around the part you don't want to have is a non capturing group, the ? afterwards makes it optional.
I'm not sure what the + after the second slash is good for; it means "at least one of the preceding character", so it also allows things like http://////////.
I hope you are aware that this regex is far from matching only valid URLs.
For example it will match stuff like
http://////////------------.-
or at least
http://N.O
^ After this position you can write whatever you want and it will still validate.
Here on Regexr you can see what your regex is matching.
See Purple Coder's answer for a probably better solution.
/^((https?:\/\/+)?[\w-]+\.[\w-]+)/i
I'm using this :
// Validate that the string contains at least a dot .
var filterWebsite = /^([a-zA-Z0-9:_\.\-/])+\.([a-zA-Z0-9_\.\-/])+$/;

REGEX: Match at the beginning of a string OPTIONALLY

I'm building a regex to match the word combo W7, but not W73, NW7 or 2W7.
So far I have
^w7{1}\b
which works perfectly. However, I have a problem.
I also need //W7 (with two forward slashes) to match. So if either W7 or //W7 is entered, it should match.
Any ideas?
Thanks!
Just add an optional // at the start.
^(//)?w7\b
You may need to escape them.
^(\/\/)?w7\b
You could just add an optional group to your regex
^(?://)?W7\b
Remember to use a non-/ delimiter (it's tidier than escaping those slashes).
If you want the subject string to only ever contain //W7 or W7 then an alternative (full pattern) would be:
~^(?://)?W7$~D
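A quick check of that full pattern against the inputs from the question, as a sketch. It is case-sensitive as written; adding the i modifier would also accept w7, which I assume is what the original lowercase pattern relied on.

```php
<?php
// Optional non-capturing "//" prefix, then exactly W7, then end of string.
$pattern = '~^(?://)?W7$~D';

$inputs = ['W7', '//W7', 'W73', 'NW7', '2W7'];

$results = [];
foreach ($inputs as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    $results[$s] = preg_match($pattern, $s) === 1;
}

var_dump($results); // W7 and //W7 are true, the other three are false
```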
What about ^(//)?W7 ? The question mark indicates zero or one occurrences.

Need php regex between 2 sets of chars

I need a regular expression for php that outputs everything between <!--:en--> and <!--:-->.
So for <!--:en-->STRING<!--:--> it would output just STRING.
EDIT: Oh, and the closing <!--:--> needs to be the first one after <!--:en-->, because there are more in the text.
The one you want is actually not too complicated:
/<!--:en-->(.*?)<!--:-->/is
Your matches will be in capture group 1.
Explanation:
The .*? is a lazy quantifier. Basically, it means "match as little as possible while still letting the rest of the pattern match." This is what causes the match to stop at the first <!--:--> rather than swallowing everything up to the last <!--:--> in the document. The s modifier lets . match newlines too, in case the text spans several lines.
Usage is something like preg_match_all('/<!--:en-->(.*?)<!--:-->/is', $input, $matches); PHP has no g modifier, so preg_match_all is what returns every match.
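A small sketch of the lazy match in PHP. The $input string is made up for illustration; note that PHP patterns take no g modifier, so preg_match_all is used to collect every occurrence.

```php
<?php
// Hypothetical input with several language blocks.
$input = '<!--:en-->Hello<!--:--><!--:de-->Hallo<!--:--><!--:en-->World<!--:-->';

// (.*?) is lazy, so each match stops at the first closing <!--:-->.
// i = case-insensitive, s = let "." match newlines as well.
preg_match_all('/<!--:en-->(.*?)<!--:-->/is', $input, $matches);

// $matches[1] holds the contents of capture group 1 for every match.
$english = $matches[1];
print_r($english); // Hello, World
```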
If you have just that input
$input = '<!--:en-->STRING<!--:-->';
You can try with
$output = strip_tags($input);
Try:
^<!--:en-->(.*)<!--:-->$
I don't think any of the other characters need to be escaped.
<!--:en-->(.*?)<!--:-->
This will match the things between your tags. This will break if you nest your tags, but you didn't say you were doing that :)

How do I modify this regex to validate URIs with empty parameter value in querystring?

I'm using this code to validate URIs in php:
preg_match('|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $uri)
However, this won't pass for URIs that end with an equals sign.
e.g. http://example.com?query=fish&offset=10 returns true, http://example.com?query=fish&offset= doesn't.
I can't see why this should be the case from the regex as it allows all characters following the ? sign.
Any tips?
Thanks,
Chris
Why don't you use filter_var? ;)
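A sketch of the filter_var route. isValidUri is a hypothetical helper name, and the extra http/https scheme check is my assumption, since FILTER_VALIDATE_URL on its own also accepts other schemes.

```php
<?php
// Hypothetical helper: true for syntactically valid http(s) URIs.
function isValidUri(string $uri): bool
{
    // filter_var returns the URI on success and false on failure.
    if (filter_var($uri, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Assumption: only http and https should pass, as in the original regex.
    $scheme = strtolower((string) parse_url($uri, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

var_dump(isValidUri('http://example.com?query=fish&offset=')); // bool(true)
var_dump(isValidUri('not a uri'));                             // bool(false)
var_dump(isValidUri('ftp://example.com'));                     // bool(false)
```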
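Your RegEx isn't working as you anticipate.
Because the dot is unescaped, your second group (.[a-z0-9-]+)* matches any character followed by at least one of [a-z0-9-], so it ends up consuming everything after the first host label, query string included. Each repetition needs at least 2 characters, so a trailing = with nothing after it cannot be consumed, and since there is no / for the last group to start from, the match fails.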
Try this instead:
^http(s)?://[a-z0-9-]+\.[a-z0-9-]+(\.[a-z0-9-]+)?([/?][-a-z0-9=?&/]*)?$
If need be, change the last capturing group to include any characters you might need to include in your query string or URI.
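To see the difference, here is a sketch that runs the pattern from the question next to a variant with the dot escaped. The second pattern is my own assumption of a reasonable fix (it lets the path, query or fragment start right after the host), not a full URI validator.

```php
<?php
$uri = 'http://example.com?query=fish&offset=';

// Pattern from the question: the unescaped "." lets the second group eat the query string,
// but each repetition needs a character after it, so the trailing "=" is left over.
$original = '|^http(s)?://[a-z0-9-]+(.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i';

// Assumed fix: escape the dot and let "/", "?" or "#" start the remainder of the URI.
$fixed = '~^https?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?([/?#].*)?$~i';

$originalResult = preg_match($original, $uri); // 0
$fixedResult    = preg_match($fixed, $uri);    // 1

var_dump($originalResult, $fixedResult);
```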
