regex to match everything until it hits uppercase

I found the following pattern in the question "regex to match everything until it finds 2 upper case characters?":
^.*(?=\b(?:[^\sA-Z]*[A-Z]){2})
However, my question is slightly different from the OP's.
I want to match everything up to the uppercase characters in the following string.
The rule should match everything until the lookahead finds a word with 2 uppercase characters, and then also match everything in between, from the 1st uppercase character up to the start of the 2nd one.
So, continuing from the OP's example, for
Http is an HttpHeader
I want to get Http is an Http
instead of Http is an , which is what the OP gets in the linked thread.

Seems overly complicated to me:
preg_match( '/[^A-Z]+/', $str, $res );

preg_match('/[^A-Z]*([A-Z]{1}[^A-Z]*[A-Z]{1}[^A-Z]*)/', $str, $res);

Use this pattern: ^.*?(?=\b(?:[^\sA-Z]*[A-Z]){2}).+?(?=[A-Z]) (Demo)

([A-Z].*?\w+(?=[A-Z]))
You can use the regex above; it is simple and fast. See the matched groups in the live demo.
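To check the answers against the question's example, here is a minimal PHP sketch that runs the patterns above on "Http is an HttpHeader" and prints what each returns. The variable and array-key names are illustrative only. It also runs the first answer's /[^A-Z]+/, which returns only the first run of non-uppercase characters rather than "Http is an Http".

```php
<?php
// Sample input taken from the question.
$str = 'Http is an HttpHeader';

// First answer: returns only the first run of non-uppercase characters.
preg_match('/[^A-Z]+/', $str, $first);
echo "[^A-Z]+ : '{$first[0]}'\n";

// Patterns from the other answers; the lookahead one has no capture group.
$patterns = [
    'capture'   => '/[^A-Z]*([A-Z]{1}[^A-Z]*[A-Z]{1}[^A-Z]*)/',
    'lookahead' => '/^.*?(?=\b(?:[^\sA-Z]*[A-Z]){2}).+?(?=[A-Z])/',
    'lazy'      => '/([A-Z].*?\w+(?=[A-Z]))/',
];

$results = [];
foreach ($patterns as $name => $re) {
    if (preg_match($re, $str, $m) === 1) {
        // Use capture group 1 when the pattern has one, else the whole match.
        $results[$name] = $m[1] ?? $m[0];
        echo "$name : '{$results[$name]}'\n";
    }
}
```

The patterns differ when the text does not start with a capital: on "the Http is an HttpHeader" the lazy pattern's match begins at the first uppercase letter, while the ^-anchored lookahead pattern also includes the leading "the ".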

Related

Regex - Match characters but don't include within results

I have the following regex, which ALMOST works:
(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)
I need the domain to be the only result, or at least in the same position in the array.
So, for example, http://m.facebook.com/ matches perfectly; there is only 1 group.
However, if I change it to http://facebook.com/, then I get com/ in place of where facebook should be. So I really need (?:www|[a-z]+) to be an optional check.
Edit:
What I expect is to match just facebook if the string is any of the following:
http://www.facebook.com
http://facebook.com
http://m.facebook.com
And obviously the https counterparts.
This is my regex now:
(?:^https?:\/\/)(?:www)?\.?([^.]+)
This is close; however, it matches the m when I try http://m.facebook.com
https://regex101.com/r/GDapY5/1

So I need to have (?:www|[a-z]+) as an optional check really.
A ? at the end of a pattern is generally used for "optional" bits -- it means "match zero or one" of that thing, so your subpattern would be something like this:
(?:www|[a-z]+)?
If you're simply trying to get the second-level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split on dots and take the penultimate value:
$domain = array_reverse(explode('.', parse_url($str)['host']))[1];
Or:
$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
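A minimal sketch of the split-on-dots approach wrapped in a function; the name secondLevelDomain is hypothetical. It uses the PHP_URL_HOST form of parse_url and returns null when no host can be parsed, instead of indexing into a missing value.

```php
<?php
// Hypothetical helper: returns the label just left of the last dot in the host.
function secondLevelDomain(string $url): ?string
{
    // PHP_URL_HOST returns only the host part, or null/false if there is none.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    // "m.facebook.com" -> ["com", "facebook", "m"]; index 1 is the label we want.
    $labels = array_reverse(explode('.', $host));

    return $labels[1] ?? null;
}

$urls = [
    'http://www.facebook.com',
    'http://facebook.com',
    'http://m.facebook.com',
    'https://m.facebook.com/',
];

foreach ($urls as $url) {
    echo $url, ' => ', secondLevelDomain($url), "\n";
}
```

This is one of the special cases the answer warns about: a host with a multi-part suffix such as www.bbc.co.uk yields "co" with this approach.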

Perhaps you could make the first m. part optional with (?:\w+\.)?.
Instead of a capturing group you could use \K to reset the starting point of the reported match.
Then match one or more word characters \w+ and use a positive lookahead to assert that what follows is a dot (?=\.)
For example:
^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)
Edit: Or you could match for m. or www. using an alternation:
^https?://(?:m\.|www\.)?\K\w+(?=\.)
Demo Php
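A short PHP sketch running both patterns from this answer with preg_match. The ~ delimiter is a choice made here so the slashes in https?:// need no escaping. Because \K resets the reported start of the match, $m[0] already holds just the domain label and no capture group is needed.

```php
<?php
// Both patterns from the answer, wrapped in ~ delimiters.
$patterns = [
    'generic prefix' => '~^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)~',
    'm. or www.'     => '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~',
];

$urls = [
    'http://www.facebook.com',
    'http://facebook.com',
    'http://m.facebook.com',
    'https://www.facebook.com',
];

$found = [];
foreach ($patterns as $name => $re) {
    foreach ($urls as $url) {
        // \K drops everything matched before it from $m[0].
        if (preg_match($re, $url, $m) === 1) {
            $found[$name][$url] = $m[0];
        }
    }
}
print_r($found);
```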

Non greedy match does not work

I want to implement a non-greedy match using the .*? pattern. However, I came across a sample string which shows that the non-greedy match does not work. This is the code and the sample string:
preg_match_all('/\<w:t.*?\>\<w:p\>/', '<w:t xml:space="preserve"></w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t xml:space="preserve">Text 1 </w:t></w:r><w:r><w:rPr><w:b/><w:u w:val="single"/><w:color w:val="ff0000"/></w:rPr><w:t xml:space="preserve"></w:t></w:r><w:r><w:rPr><w:b/><w:u w:val="single"/><w:color w:val="ff0000"/><w:i/></w:rPr><w:t xml:space="preserve">Text 2</w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r><w:r><w:t xml:space="preserve"></w:t></w:r></w:p></w:t></w:r></w:p><w:p w:rsidRDefault="004D3323" w:rsidP="003F03B1"><w:r><w:t><w:p>', $match);
But if I print_r the $match variable, I see that this pattern matches the whole string. However, what I want is to match only such strings as:
"<w:t><w:p>" and "<w:t any text may go here><w:p>"
So what did I do wrong, and how can I fix it? Thanks!

Use this regex instead:
<w:t[^>]*><w:p>
[^>]* matches any characters except >, so the match cannot extend past the end of the <w:t ...> tag.
see https://regex101.com/r/nuMzTk/1
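Why the lazy version still matched the whole string: laziness only controls where a match ends, not where it starts. The engine reports the leftmost match, so .*? begins at the first <w:t and keeps expanding, across the > of other tags, until it reaches a ><w:p>. The sketch below uses a shortened, made-up sample string to show the difference from [^>]*.

```php
<?php
// Made-up sample: a <w:t ...> NOT followed by <w:p>, then a <w:t> that is.
$xml = '<w:t a="1"></w:t><w:p><w:t><w:p>';

// Lazy: still starts at the first "<w:t" and only stops once it reaches
// a "><w:p>", crossing the ">" of other tags on the way.
preg_match_all('/<w:t.*?><w:p>/', $xml, $lazy);

// Negated class: [^>]* cannot run past the end of the <w:t ...> tag.
preg_match_all('/<w:t[^>]*><w:p>/', $xml, $strict);

print_r($lazy[0]);
print_r($strict[0]);
```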

Getting all URLs on multiple lines

I'm trying to get all these URLs from a website, but I only seem to be able to get the first URL. How can I match all the URLs?
So far I've tried
auto">(.*?)<\/pre>
and:
auto">(.*?)\s<\/pre>
I've tried adding several modifiers such as m and i, but it didn't seem to help.
This is what I'm searching:
auto">http://url-one.com
http://url-two.com
http://url-three.com
http://url-four.com
http://url-five.com</pre>
Can someone help me understand what I am missing?

Quick Answer
As Jonny5 hinted in his comment, . does not match newline characters by default: so (.*?) will not match beyond the first line without the s regex modifier, and his suggestion is then the quick answer:
/auto">(.*?)<\/pre>/s
You can check out his Regex101 demo or related PHP code...
$re = "/auto\">(.*?)<\\/pre>/s";
$str = "auto\">http://url-one.com\nhttp://url-two.com\nhttp://url-three.com\nhttp://url-four.com\nhttp://url-five.com</pre>";
preg_match($re, $str, $matches);
...for reference.
Digging Deeper
However, there is a little more going on here.
i and m Modifiers
First, regardless of whether you use the i or m modifier(s), no line of the sample text would match with auto"> at the beginning and <\/pre> at the end of the pattern. You would have to group and follow each with a quantifier to make it optional (e.g. (?:auto">)? and (?:<\/pre>)?) to match each line of the sample text.
m Requires Matching Globally
Second, the m modifier would necessitate matching globally – and further tweaks to the pattern to avoid the last URL match ending with </pre>:
/(?:auto">)?(.+)(?=(?:\n|<\/pre>))/m
You can also check out a second Regex101 demo of this twist or try it out in PHP:
$re = "/(?:auto\">)?(.+)(?=(?:\\n|<\\/pre>))/m";
$str = "auto\">http://url-one.com\nhttp://url-two.com\nhttp://url-three.com\nhttp://url-four.com\nhttp://url-five.com</pre>";
preg_match_all($re, $str, $matches); // NOTE: preg_match_all to match globally
Which Approach to Choose
The choice between simply adding the s modifier or tweaking the pattern, adding the m modifier, and matching globally mostly comes down to whether you want a single match with all the URLs (separated by newlines) or many matches, each with one of the URLs.
The latter yields the matches below...
MATCH 1
1. [6-24] `http://url-one.com`
MATCH 2
1. [25-43] `http://url-two.com`
MATCH 3
1. [44-64] `http://url-three.com`
MATCH 4
1. [65-84] `http://url-four.com`
MATCH 5
1. [85-104] `http://url-five.com`
...versus the single match that the original pattern and the s modifier yield:
MATCH 1
1. [6-104] `http://url-one.com
http://url-two.com
http://url-three.com
http://url-four.com
http://url-five.com`
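A minimal PHP sketch comparing the two approaches on the sample text: the s-modifier pattern yields one capture, which is split on newlines here, while preg_match_all with the per-line pattern yields one capture per URL. The variable names are illustrative.

```php
<?php
// Sample text from the question.
$str = "auto\">http://url-one.com\nhttp://url-two.com\nhttp://url-three.com\nhttp://url-four.com\nhttp://url-five.com</pre>";

// Approach 1: a single match; /s lets "." cross the newlines.
preg_match('/auto">(.*?)<\/pre>/s', $str, $single);
$fromSingle = explode("\n", $single[1]); // split the one capture into URLs

// Approach 2: one match per URL, collected with preg_match_all.
preg_match_all('/(?:auto">)?(.+)(?=(?:\n|<\/pre>))/m', $str, $many);
$fromMany = $many[1]; // capture group 1 of every match

print_r($fromSingle);
print_r($fromMany);
```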

Problems with preg_match function

I've just made a few edits to a file, and when testing, it seemed not to work. After some debugging I found that preg_match was returning 0. I've looked into it and cannot see what the problem is; also, since I haven't touched this part of the file, I'm confused as to what might have happened...
<?php
echo preg_match('/[A-Z]+[a-z]+[0-9]+/', 'testeR123');
?>
This is a snippet I'm using for debugging. I'm guessing my pattern is wrong, but I may be wrong about that.
Thanks,
P110

According to your comment:
I'm just looking for it to check if there is an uppercase, lowercase and a number, but from the replies, my pattern checks for it in an order
Try:
preg_match('/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]+$/', $input_string);
where
(?=.*[A-Z]) checks that there is at least one uppercase letter
(?=.*[a-z]) checks that there is at least one lowercase letter
(?=.*[0-9]) checks that there is at least one digit
[A-Za-z0-9]+ checks that there are only these characters.
(?=...) is called a lookahead.
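To see the lookahead pattern in action, here is a small PHP sketch; hasUpperLowerDigit is a hypothetical wrapper name, and the extra test strings are made up for illustration.

```php
<?php
// Hypothetical wrapper around the answer's pattern.
// Each (?=.*...) looks ahead from the start without consuming input;
// [A-Za-z0-9]+$ then checks that the whole string is alphanumeric.
function hasUpperLowerDigit(string $s): bool
{
    return preg_match('/^(?=.*[A-Z])(?=.*[a-z])(?=.*[0-9])[A-Za-z0-9]+$/', $s) === 1;
}

foreach (['testeR123', 'Rtest123', 'teste123', 'TESTE123', 'testeR', 'teste R123'] as $s) {
    printf("%-12s %s\n", $s, hasUpperLowerDigit($s) ? 'ok' : 'rejected');
}
```

Because the lookaheads do not care about position, 'testeR123' and 'Rtest123' both pass, which is what the original order-dependent pattern could not do.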

The problem is the order of the letters:
Try this:
echo preg_match('/[a-z]+[A-Z]+[0-9]+/', 'testeR123');
Or:
echo preg_match('/[A-Z]+[a-z]+[0-9]+/', 'Rtest123');
Or, simpler:
echo preg_match('/[A-Z]+[0-9]+/i', 'testeR123');

Your regex first tests for uppercase letters from A to Z, then for lowercase letters from a to z, and then for digits, in that order. Since the only uppercase letter in your string, R, is followed directly by digits rather than lowercase letters, it will not match.
I think you want this:
[A-Za-z0-9]+
Or, if you need your string to be lowercase letters, followed by uppercase letters and then digits, change the regex to:
[a-z]+[A-Z]+[0-9]+
That way your current string would match the regex as well.
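A quick PHP check of the explanation above on 'testeR123': the original pattern needs an uppercase run followed by lowercase letters, which never occurs in this string, while both suggested patterns match. Note that [A-Za-z0-9]+ accepts any run of letters and digits, so on its own it does not check for one of each.

```php
<?php
$subject = 'testeR123';

$patterns = [
    '/[A-Z]+[a-z]+[0-9]+/', // original: uppercase, then lowercase, then digits
    '/[a-z]+[A-Z]+[0-9]+/', // lowercase, then uppercase, then digits
    '/[A-Za-z0-9]+/',       // any run of letters and digits
];

$results = [];
foreach ($patterns as $re) {
    // preg_match returns 1 on a match, 0 on no match (false on error).
    $results[$re] = preg_match($re, $subject);
    echo $re, ' => ', $results[$re], "\n";
}
```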

<?php
preg_match('/([A-Za-z0-9]+)/', 'testeR123', $match);
echo $match[1];
?>

REGEX at least one uppercase and one number

I searched everywhere but couldn't find the right regex for my verification.
I have a $string and I want to make sure it contains at least one uppercase letter and one number. No other characters are allowed, just numbers and letters. It is for a password requirement.
John8 = good
joHn8 = good
jo8hN = good
I will use the preg_match function.
The uppercase letter and the number can be anywhere in the word, not only at the beginning or end.

This should work, but is a bit of a mess. Consider using multiple checks for readability and maintainability...
preg_match('/^[A-Za-z0-9]*([A-Z][A-Za-z0-9]*\d|\d[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*$/', $password);

Use lookahead:
preg_match('/^(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$/', $string);
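To compare the two approaches suggested above, here is a PHP sketch that runs the lookahead pattern and the longer alternation pattern over the question's three examples plus a few made-up failing strings; both should give the same verdicts.

```php
<?php
$lookahead   = '/^(?=.*[A-Z])(?=.*[0-9])[a-zA-Z0-9]+$/';
$alternation = '/^[A-Za-z0-9]*([A-Z][A-Za-z0-9]*\d|\d[A-Za-z0-9]*[A-Z])[A-Za-z0-9]*$/';

// Expected verdicts: the first three come from the question, the rest are made up.
$cases = [
    'John8'  => true,
    'joHn8'  => true,
    'jo8hN'  => true,
    'john8'  => false, // no uppercase letter
    'JohnX'  => false, // no digit
    'Jo hn8' => false, // space is not allowed
];

$byLookahead = [];
$byAlternation = [];
foreach (array_keys($cases) as $s) {
    $byLookahead[$s]   = preg_match($lookahead, $s) === 1;
    $byAlternation[$s] = preg_match($alternation, $s) === 1;
    printf("%-7s lookahead=%d alternation=%d\n", $s, $byLookahead[$s], $byAlternation[$s]);
}
```

The lookahead form is easier to extend: another requirement is one more (?=...) group, whereas the alternation has to spell out every ordering of the required characters.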

Use this regex pattern:
^([A-Z]+([a-z0-9]+))$
With preg_match:
preg_match('~^([A-Z]+([a-z0-9]+))$~',$str);
Demo

Your requirement needs a precise syntax description and many more examples to confirm that description. Only 3 or 4 examples are not enough; it is very open.
For the latest confirmed update:
preg_match('/^([a-z]*\d+[a-z]*[A-Z][a-z]*|[a-z]*[A-Z][a-z]*\d+[a-z]*)$/',$str)
History
First solution: preg_match('/^[A-Z][a-z]+\d+$/',$str)
After your edit 1: preg_match('/^[a-z]*[A-Z][a-z]*\d+$/',$str)
After your comment about UTF-8: hmm... please state in your question which characters are valid. For example, is "José11" a valid string?
After your edit 2 ("jo8hN" is valid): can the number repeat? I suppose not. Is "8N" valid? I suppose yes. preg_match('/^([a-z]*\d+[a-z]*[A-Z][a-z]*|[a-z]*[A-Z][a-z]*\d+[a-z]*)$/',$str) You can add more possibilities with "|" in this regex.
