the right regex for a subdomain - php

I have a web app where people sign up and get a subdomain under the app's domain (xx.app.com). Each subdomain has a database attached to it programmatically, with the same name as the subdomain.
What I need is a regex that works for both the subdomain and, of course, the database name (MySQL, if it matters). The name must be lowercase and between 6 and 20 characters long, the only special character allowed is the hyphen ("-"), and digits are not allowed.
I tried many times but it always goes wrong. For example: /([a-z-]){6,20}/
Thanks in advance :)

There might be a right regex for this, but regex isn't right for this.
Try parse_url
Edit:
I am not sure how you are using it. If you are only processing the subdomain part, the following should work and not match numbers:
^[a-z-]{6,20}$
This ensures that the subdomain contains only the letters a to z and -, and that it is between 6 and 20 characters long. The ^ matches the beginning of the string and $ matches the end.
The earlier regex accepted digits and other characters because it only had to match some part of the string. With ^ and $ you ensure that the pattern matches the entire string.
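To make the difference concrete, here is a minimal sketch that runs both patterns over a few inputs. The sample strings and the $results array are made up for illustration and are not from the question.

```php
<?php
// Hypothetical sample inputs, chosen only to show the effect of anchoring.
$samples = ['my-subdomain', 'abc123defghi', 'short'];

$results = [];
foreach ($samples as $s) {
    $results[$s] = [
        // Unanchored: succeeds if ANY run of 6 to 20 allowed characters exists in the string.
        'unanchored' => preg_match('/([a-z-]){6,20}/', $s),
        // Anchored: the whole string must consist of 6 to 20 allowed characters.
        'anchored'   => preg_match('/^[a-z-]{6,20}$/', $s),
    ];
    echo "$s: unanchored={$results[$s]['unanchored']} anchored={$results[$s]['anchored']}\n";
}
```

The second sample contains digits, yet the unanchored pattern still finds the run "defghi" inside it, which is why the original attempt appeared to accept numbers.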

This would be a safer regex, since a subdomain cannot start with a hyphen:
^[a-z][a-z-]{5,19}$
As for the database name, I believe it cannot contain a hyphen since that is the subtraction operator, so your best choice might be to either disallow hyphens or replace them with underscores:
$database = str_replace('-', '_', $subdomain);
EDIT: Apparently @nikic is right: you can use hyphens as long as you quote the database name with backticks.
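Putting the two pieces together, a minimal sketch might look like the following. The function names are hypothetical; the pattern and the str_replace() call are the ones from this answer.

```php
<?php
// Hypothetical helper: validates a subdomain label with the pattern from the answer.
function is_valid_subdomain(string $subdomain): bool
{
    return preg_match('/^[a-z][a-z-]{5,19}$/', $subdomain) === 1;
}

// Hypothetical helper: derives a database name by swapping hyphens for underscores.
function database_name_for(string $subdomain): string
{
    return str_replace('-', '_', $subdomain);
}

// Hypothetical helper: the alternative from the edit, keeping the hyphen and
// quoting the identifier with backticks. Only meant for already validated input.
function quoted_database_name(string $subdomain): string
{
    return '`' . $subdomain . '`';
}

$subdomain = 'my-sub-domain';
if (is_valid_subdomain($subdomain)) {
    echo database_name_for($subdomain), "\n";
    echo quoted_database_name($subdomain), "\n";
}
```

Validating first matters here: because the pattern only admits lowercase letters and hyphens, nothing that reaches the database name can contain a backtick or other SQL syntax.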

Have you tried escaping the hyphen?
/([a-z\-]){6,20}/

You will need a regex with a positive lookahead for this. Try the following code:
<?php
$a = array("xx-yyy.domain.cam", "xx4yyy.domain.cam", "abcde.domain.com", "my-sub-domain.domain.org");
foreach ($a as $v) {
    echo "For domain $v: ";
    preg_match('/^(?:[-a-z]{6,20})(?=\.)/', $v, $m);
    if (count($m) > 0) {
        echo "subdomain: " . $m[0] . "\n";
    } else {
        echo "subdomain not matched\n";
    }
}
?>
Basically, this matches a combination of lowercase letters and hyphens, 6 to 20 characters long, that appears before the first dot (.).
The hyphen (-) need not be escaped if it is placed at the start of the square brackets.
OUTPUT
For domain xx-yyy.domain.cam: subdomain: xx-yyy
For domain xx4yyy.domain.cam: subdomain not matched
For domain abcde.domain.com: subdomain not matched
For domain my-sub-domain.domain.org: subdomain: my-sub-domain

Related

Regex - Match characters but don't include within results

I have got the following Regex, which ALMOST works...
(?:^https?:\/\/)(?:www|[a-z]+)\.([^.]+)
I need the result to be the only result, or within the same position in the Array.
So, for example, http://m.facebook.com/ matches perfectly; there is only 1 group.
However, if I change it to http://facebook.com/ then I get com/ in place of where facebook should be. So I need to have (?:www|[a-z]+) as an optional check really.
Edit:
What I expect is just to match facebook, if ANY of the strings are as follows:
http://www.facebook.com
http://facebook.com
http://m.facebook.com
And obviously the https counterparts.
This is my Regex now
(?:^https?:\/\/)(?:www)?\.?([^.]+)
This is close; however, it matches the m when I try http://m.facebook.com
https://regex101.com/r/GDapY5/1
So I need to have (?:www|[a-z]+) as an optional check really.
A ? at the end of a pattern is generally used for "optional" bits -- it means "match zero or one" of that thing, so your subpattern would be something like this:
(?:www|[a-z]+)?
If you're simply trying to get the second level domain, I wouldn't bother with regex, because you'll be constantly adjusting it to handle special cases you come across. Just split on dots and take the penultimate value:
$domain = array_reverse(explode('.', parse_url($str)['host']))[1];
Or:
$domain = array_reverse(explode('.', parse_url($str, PHP_URL_HOST)))[1];
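A slightly more defensive version of the same idea can be sketched as below. The function name is hypothetical, and it assumes the label you want is always the one just before the last dot, so a host such as www.bbc.co.uk yields "co" rather than "bbc".

```php
<?php
// Hypothetical helper: returns the penultimate dot-separated label of the host,
// or null when the URL has no usable host.
function second_level_label(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host component
    }
    $labels = explode('.', $host);
    if (count($labels) < 2) {
        return null; // e.g. "localhost" has no second-level label
    }
    return $labels[count($labels) - 2];
}

foreach (['http://www.facebook.com', 'http://facebook.com', 'https://m.facebook.com'] as $url) {
    echo $url, ' => ', second_level_label($url), "\n";
}
```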
Perhaps you could make the first m. part optional with (?:\w+\.)?.
Instead of a capturing group you could use \K to reset the starting point of the reported match.
Then match one or more word characters \w+ and use a positive lookahead to assert that what follows is a dot (?=\.)
For example:
^https?://(?:www)?(?:\w+\.)?\K\w+(?=\.)
Edit: Or you could match for m. or www. using an alternation:
^https?://(?:m\.|www\.)?\K\w+(?=\.)
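As a rough sketch of how the alternation pattern could be used from PHP: the ~ delimiters and the $found array are additions for this example, while the pattern body is the one shown above.

```php
<?php
// The pattern from the answer, wrapped in ~ delimiters so / needs no escaping.
$pattern = '~^https?://(?:m\.|www\.)?\K\w+(?=\.)~';

$urls = [
    'http://www.facebook.com',
    'http://facebook.com',
    'https://m.facebook.com',
];

$found = [];
foreach ($urls as $url) {
    // Because of \K, $m[0] holds only the part matched after the reset point.
    $found[$url] = preg_match($pattern, $url, $m) ? $m[0] : null;
    echo $url, ' => ', $found[$url], "\n";
}
```

Since \K discards everything matched before it, the result is in $m[0] and no capturing group is needed.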

preg_match in loop returning impossible results

I'm sure I'm missing something. I know just enough to be dangerous.
In my php code I use file_get_contents() to put a file into a variable.
I then loop through an array and use preg_match to search the same variable many times. The file is a tab-delimited txt file. It does fine 800 times but one time randomly in the middle it does something very odd.
$current = file_get_contents($file);
foreach($blahs as $blah){
$image = 'somefile.jpg';
$pattern = '/https:\/\/www\.example\.com\/media(.*)\/' . preg_quote($image) . '/';
preg_match($pattern, $current, $matches);
echo $matches[0];
}
For some reason, that one time it returns two URLs with a tab between them. When I look at the txt file, the image I'm looking for is listed first, followed by the second image, but echo $matches[0] returns them in reverse order. The file does not contain the text the way echo $matches[0] returns it. It would be as if you searched the string 'one two' and $matches returned 'two one'.
The (.*) in your pattern is greedy, so the regex engine is trying to do you a favor and capture the longest possible match. The \t tab between the two URLs is being matched by the . (dot / any character).
Demonstration:
$blah='test case: https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg some text';
$image = 'fish.jpg';
$your_pattern = '/https:\/\/www\.example\.com\/media(.*)\/'.preg_quote($image).'/';
echo preg_match($your_pattern,$blah,$matches)?$matches[0]:'fail';
echo "\n----\n";
$my_pattern='~https://www\.example\.com/media(?:[^/\s]*/)+'.preg_quote($image).'~';
echo preg_match($my_pattern,$blah,$out)?$out[0]:'fail';
Output:
https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg
----
https://www.example.com/media/cat/fish.jpg
To crystallize...
test case: https://www.example.com/media/foo/bar.jpg https://www.example.com/media/cat/fish.jpg some text
// your (.*) is matching ---------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
My suggested pattern (I may be able to refine the pattern if you provide some sample strings) uses (?:[^/\s]*/)+ instead of the (.*).
My non-capturing group breaks down like this:
(?: #start non-capturing group
[^/\s]* #greedily match zero or more non-slash, non-whitespace characters
/ #match a slash
) #end non-capturing group
+ #allow the group to repeat one or more times
*note1: You can use \t where I use \s if you want to be more literal, I am using \s because a valid url shouldn't contain a space anyhow. You may make this adjustment in your project without any loss of accuracy.
*note2: Notice that I changed the pattern delimiters to ~ so that / doesn't need to be escaped inside the pattern.

echo all urls in a string by domain name? php

Trying to extract all URLs by domain name:
any URL beginning with
http://reports.example.com/report?
https://reports.example.com/report?
the string contains
$string = "http://reports.example.com/report?id=randomtext afdf sadfsdf https://reports.example.com/report?id=randomtext sdfsd sdf afa geadg";
i assume preg_match_all would work?
$urls = preg_match_all(~http://reports.example.com/reportid=~|https://reports.example.com/report?id=);
I tried this, but it is not working; I just get the http id variable (the URLs end in a space to separate them):
preg_match_all("/reports.example.com/main(.*?) \"/is", $contents,
$matches);
foreach ($matches[1] as $url)
{
echo $url. "<br />\n";
}
In PHP, you only put a delimiter at the start and end of the regex.
You can match both the secure and the insecure protocol by making the s optional with a ?.
A . is a special character and should be escaped when meant literally (although it's probably pretty rare that you'd run into a URL that is off by exactly that 1 character).
A ? is also a special character, with a similar caveat, although in this case you wouldn't get a match at all: an unescaped ? only makes the preceding character/group optional (it doesn't match itself the way the . would).
Try:
https?://reports\.example\.com/report\?id=[a-z0-9A-Z]+
Demo: https://regex101.com/r/Eq6Lea/1/
This also assumes that the id parameter will only have alphanumerical characters, if others are allowed add them to that character class. This also assumes the URLs only have an id parameter, and it is always present.
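For illustration, the pattern could be plugged into preg_match_all roughly like this. The ~ delimiters are a choice made for this sketch; the input string is the one from the question.

```php
<?php
$string = "http://reports.example.com/report?id=randomtext afdf sadfsdf https://reports.example.com/report?id=randomtext sdfsd sdf afa geadg";

// One delimiter at each end; ~ is used so the slashes need no escaping.
$pattern = '~https?://reports\.example\.com/report\?id=[a-z0-9A-Z]+~';

// $matches[0] receives every full match found in the string.
preg_match_all($pattern, $string, $matches);

foreach ($matches[0] as $url) {
    echo $url . "<br />\n";
}
```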

Regular expression to match single dot but not two dots?

Trying to create a regex pattern for an email address check that will allow a dot (.) but not two or more dots next to each other.
Should match:
test.test@test.com
Should not match:
test..test@test.com
Now I know there are thousands of examples on internet for e-mail matching, so please don't post me links with complete solutions, I'm trying to learn here.
Actually the part that interests me the most is just the local part:
test.test that should match and test..test that should not match.
Thanks for helping out.
You may allow any number of [^\.] (any character except a dot) and ([^\.])\.[^\.] (a dot enclosed by two non-dots) by using a disjunction (the pipe symbol |) between them and putting the whole thing with * (any number of those) between ^ and $ so that the entire string consists of those. Here's the code:
$s1 = "test.test@test.com";
$s2 = "test..test@test.com";
$pattern = '/^([^\.]|([^\.])\.[^\.])*$/';
echo "$s1: ", preg_match($pattern, $s1),"<p>","$s2: ", preg_match($pattern, $s2);
Yields:
test.test@test.com: 1
test..test@test.com: 0
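One caveat with the alternation approach: each dot consumes the non-dot on both sides of it, so an input with a single character between two dots, such as a.b.c, is rejected. As a different technique (a negative lookahead, not the pattern above), a minimal sketch with a hypothetical function name could be:

```php
<?php
// Hypothetical helper: true when the string is non-empty and never contains
// two dots next to each other. (?!.*\.\.) fails the match if ".." occurs anywhere.
function has_no_consecutive_dots(string $s): bool
{
    return preg_match('/^(?!.*\.\.).+$/', $s) === 1;
}

foreach (['test.test', 'test..test', 'a.b.c'] as $local) {
    echo $local, ': ', has_no_consecutive_dots($local) ? 'ok' : 'rejected', "\n";
}
```

This only checks the "no two adjacent dots" rule; it says nothing about which other characters are allowed or whether a leading or trailing dot is acceptable.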
This seems more logical to me:
/[^.]([\.])[^.]/
And it's simple. The look-ahead & look-behinds are indeed useful because they don't capture values. But in this case the capture group is only around the middle dot.
strpos($input,'..') === false
The strpos function is simpler: if $input does not contain '..', the check passes.
To answer the question in the title, I'd update the RegExp by Junuxx and allow dots in the beginning and end of the string:
'/^\.?([^\.]|([^\.]\.))*$/'
which is optional . in the beginning followed by any number of non-. or [non-. followed by .].
^([^.]+\.?)+@$
That should do for what comes before the @; I'll leave the rest for you.
Note that you should optimise it more to avoid other strange character setups, but this seems sufficient in answering what interests you
Don't forget the ^ and $ like I first did :(
Also forgot to slash the . - silly me

How can I make a group optional in preg_replace?

I'm trying to replace:
*facebook.com/
with http://graph.facebook.com/
I need to be able to group anything before the facebook.com part into an optional group.
I can't just replace facebook.com with graph.facebook.com because the incoming URL may contain https.
Here's what I have but misses anything that doesn't have http[s]://.
<?php
$fb_url = preg_replace('/http[s]*:\/\/[www.]*facebook.com\//', 'http://graph.facebook.com/', 'facebook.com/some/segments');
echo $fb_url;
?>
Addressing your question specifically:
You can make any single character (or a group of characters) optional by adding a ? after it in your regex.
A couple of tips from looking at your code:
If you are matching strings containing / characters, simplify your life by using a different delimiter (for example #). You aren't required to use a forward slash.
You should escape the . dot metacharacter because it matches ANY single character, so your expression www. could conceivably match www9 or anything else along those lines.
Also, the brackets [...] define a character class, which matches a single character out of the set listed inside. If you want to match specifically the text www. you should use a non-capturing group like (?:www\.) and make it optional by adding the ? after it, like (?:www\.)?
So, those tips in mind, try ...
<?php
$p = '#(?:https?://(?:www\.)?)?facebook\.com/#';
$r = 'http://graph.facebook.com/';
$subject = 'facebook.com/some/segments';
$fb_url = preg_replace($p, $r, $subject);
echo $fb_url; // outputs: http://graph.facebook.com/some/segments
?>
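To see the optional group at work, here is a small sketch that runs the same pattern over several input shapes. The $subjects list and $results array are made up for this example.

```php
<?php
$p = '#(?:https?://(?:www\.)?)?facebook\.com/#';
$r = 'http://graph.facebook.com/';

// Hypothetical inputs: with and without scheme, with and without www.
$subjects = [
    'facebook.com/some/segments',
    'http://facebook.com/some/segments',
    'https://www.facebook.com/some/segments',
];

$results = [];
foreach ($subjects as $subject) {
    $results[$subject] = preg_replace($p, $r, $subject);
    echo $subject, ' => ', $results[$subject], "\n";
}
```

Note that the pattern is not anchored, so an input like www.facebook.com/... without a scheme keeps its leading "www." in front of the replacement; adding ^ and handling that case is left as a possible refinement.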
Use something like the following:
(optional-regex-here)?
