Regex: retrieve from URL everything between www. and .com

I am trying to use PHP's preg_match() to retrieve everything between the www. and .com of a URL.
e.g.:
www.example.com will return example
www.example-website.com will return example-website
I'm lucky in that the URLs I'm working with always start with www. and always end with .com, so the pattern doesn't need to be particularly complex or account for many use cases.
However, my Regex knowledge is minimal to none.
My try:
preg_match("/.([^.]*)./", $string, $matches);
According to RegExr, the second element ($matches[1]?) should contain what I need, but it doesn't seem to be working.
Thanks.

(?<=www\.)(.+?)(?=\.com)
Try this. Grab the capture. See the demo:
http://regex101.com/r/iZ9sO5/10

You need to escape the dots in the regex.
preg_match("/www\.([^.]*)\.com/", $string, $matches);
. in a regex can match (nearly) any character, whereas \. matches only the literal . within the URL.
Using www and com as delimiters in the pattern gives extra safety.
Example : http://regex101.com/r/aA5eC5/2
The first capture group (\1, i.e. $matches[1] in PHP) will contain
example
example-website
EDIT
If the regex needs to match strings that contain other dots, such as www.example.somesite.com, it can be modified to:
preg_match("/www\.(.+)\.com/", $string, $matches);
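To see the escaped-dot pattern in action, here is a minimal sketch. The helper name extractSiteName and the sample inputs are assumptions made up for illustration; the pattern is the (.+) variant from the edit above.

```php
<?php
// extractSiteName() is a hypothetical helper, not part of the original answer.
function extractSiteName(string $url): ?string
{
    // \. matches a literal dot; (.+) also allows dots inside the captured name.
    if (preg_match('/www\.(.+)\.com/', $url, $matches) === 1) {
        return $matches[1];
    }
    return null; // the URL did not contain www. ... .com
}

// Hypothetical sample inputs.
$names = array_map('extractSiteName', [
    'www.example.com',
    'www.example-website.com',
    'www.example.somesite.com',
]);
print_r($names);
```

If the name can never contain a dot, the stricter ([^.]*) version is probably the safer choice, since it fails instead of capturing across dots.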

Related

How can I add "/" to preg_match in PHP?

I have this code:
preg_match("/[^-+*%0-9]+/", $your_string, $matches)
It works great but I would like to be able to add the "/" character and I don't know how.

You can use a different pattern delimiter, such as a hash, instead of a forward slash, and then just match the forward slash like any other character:
preg_match('#^/#', $subject);

When / is the pattern delimiter, a literal / has to be escaped, so you can use it like this:
preg_match("/[^-+\/*%0-9]+/", $your_string, $matches)
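Both suggestions can be compared side by side. This is only a sketch: the sample input is an assumption, and the character class is the one from the question with / added to it.

```php
<?php
$input = 'path/to/file+123'; // hypothetical sample input

// Same character class written two ways:
// "/" escaped inside "/" delimiters, and unescaped inside "#" delimiters.
preg_match('/[^-+\/*%0-9]+/', $input, $escaped);
preg_match('#[^-+/*%0-9]+#', $input, $hashed);

// Both calls capture the leading run of characters that are not in the class.
var_dump($escaped[0], $hashed[0]);
```

Either form should behave the same; switching the delimiter mostly pays off when the pattern contains many slashes.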

How to exclude a word or string from an URL - Regex

I'm using the following Regex to match all types of URL in PHP (It works very well):
$reg_exUrl = "%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
But now I want to exclude YouTube, youtu.be and Vimeo URLs.
After some research I tried something like this, but it is not working:
$reg_exUrl = "%\b(([\w-]+://?|www[.])(?!youtube|youtu|vimeo)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I want to do this because I have another regex that matches YouTube URLs and returns an iframe, and the two regexes are interfering with each other.
Any help would be gratefully appreciated, thanks.

socodLib, to exclude something from a string, place yourself at the beginning of the string by anchoring with a ^ (or use another anchor) and use a negative lookahead to assert that the string doesn't contain a word, like so:
^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))
Before we make the regex look too complex by concatenating it with yours, let's see what we would do if you wanted to match some word characters (\w+) but not youtube or google. You would write:
^(?!.*?(?:youtube|google))\w+
As you can see, after the assertion (where we say what we don't want), we say what we do want by using \w+.
In your case, let's add a negative lookahead to your initial regex (which I have not tuned):
$reg_exUrl = "%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";
I took the liberty of making the regex case insensitive with (?i). You could also have added i to your s modifier at the end. The youtu\.?be expression allows for an optional dot.
I am certain you can apply this recipe to your expression and other regexes in the future.
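As a rough illustration of the recipe, the sketch below filters a list of URLs. The sample URLs are made up, and \S+ is a deliberately simplified stand-in for the full URL regex above, so treat it as a demonstration of the lookahead only.

```php
<?php
// Hypothetical sample URLs.
$urls = [
    'http://www.example.com/page',
    'https://www.youtube.com/watch?v=abc',
    'http://youtu.be/abc',
    'https://vimeo.com/12345',
    'www.example.org',
];

// The negative lookahead rejects any string containing a blocked name;
// \S+ (a simplified stand-in for the real URL pattern) then accepts the rest.
$pattern = '%^(?!.*?(?:youtu\.?be|vimeo))\S+$%i';

// preg_grep keeps the original keys, so reindex the result.
$allowed = array_values(preg_grep($pattern, $urls));
print_r($allowed);
```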

PHP Regex, get multiple value with preg_match

I have this text string:
$text="::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::";
Now I want to get the values 7, 8 and 9.
How can I do that with preg_match_all?
I've tried this:
$pattern="/::tower_unit::(.*)::\/tower_unit::/i";
preg_match($pattern,$text,$matches);
print_r($matches);
but the result is still wrong...

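preg_match stops after the first match, so use preg_match_all to collect every value. A slash inside the pattern also has to be escaped while / is the delimiter; since your pattern includes slashes, it's easier to use a different regex delimiter, as suggested in the comments: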
$pattern="#::tower_unit::(\d+)::/tower_unit::#";
preg_match_all($pattern,$text,$matches);
I also converted (.*) to (\d+), which is better if the token you're looking for is always a number. You might also want to drop the i modifier if the text is always lowercase.

Your regex is "greedy": (.*) runs on to the last closing tag instead of stopping at the first one.
Use the following one:
$pattern="#::tower_unit::(.*?)::/tower_unit::#i";
or
$pattern="#::tower_unit::(.*)::/tower_unit::#iU";
If you wish, you can use \d+ instead of .*? or .*
Also, the function should be preg_match_all.
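The difference between the greedy and the lazy quantifier is easy to check with the string from the question. A minimal sketch (the variable names $greedy and $lazy are just for illustration):

```php
<?php
$text = '::tower_unit::7::/tower_unit::<br/>::tower_unit::8::/tower_unit::<br/>::tower_unit::9::/tower_unit::';

// Greedy (.*) runs to the last closing tag, so only one long match is found.
preg_match_all('#::tower_unit::(.*)::/tower_unit::#', $text, $greedy);

// Lazy (.*?) stops at the first closing tag, giving one match per unit.
preg_match_all('#::tower_unit::(.*?)::/tower_unit::#', $text, $lazy);

echo count($greedy[1]), "\n"; // number of greedy matches
print_r($lazy[1]);            // captured values of the lazy matches
```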

Replacing multiple slashes with exception in regex

There are quite a few questions on removing multiple slashes using regex in PHP. However, I have a special case I would like to exclude.
I have a full URL as my input: http://localhost/path/to/whatever
I have written a regex to convert backslashes to forward slashes, and then remove multiple consecutive slashes:
$cleaned = preg_replace('/(\\\+)|(\/+)/', "/", trim($input));
This works fine for the most part. However, I need to be able to exclude the :// case, otherwise the expression produces the following, which is not the intended result:
http:/localhost/path/to/whatever
I have tried using /(\\\+)|^[:](\/+)/, but this doesn't seem to work.
How can I exclude the :// case in my expression?

$cleaned = preg_replace('~(?<!https:|http:)[/\\\\]+~', "/", trim($input));
The subexpression inside the lookbehind must have a fixed length, so it can't use variable-length quantifiers, and the obvious approach, (?<!https?:), won't work. But it can be made up of two or more fixed-length alternatives with different lengths. For example:
(?<!https:|http:) # OK
Be aware that the alternation has to be at the top level of the lookbehind, so this won't work:
(?<!(https:|http:)) # error
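Here is a small sketch of the lookbehind approach applied to a couple of inputs. The helper name cleanSlashes and the sample strings are assumptions for illustration only.

```php
<?php
// cleanSlashes() is a hypothetical helper name used only for this sketch.
function cleanSlashes(string $input): string
{
    // Collapse runs of "/" or "\" into a single "/". A match may not start
    // directly after "http:" or "https:", which keeps the "://" intact.
    return preg_replace('~(?<!https:|http:)[/\\\\]+~', '/', trim($input));
}

echo cleanSlashes('http://localhost\\path//to///whatever'), "\n";
echo cleanSlashes('https://localhost//path'), "\n";
```

Note that only the http and https schemes are protected here; any other scheme would need its own alternative in the lookbehind.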

There is something called a "negative lookbehind" (positive and lookahead variants are also available):
http://www.phpro.org/tutorials/Introduction-to-PHP-Regex.html
With this you could add an exception with something like
(?<!http:)
Then your expression will only match in places NOT preceded by "http:"

Simply a negative look-behind for a colon, preceding two or more forward or backward slashes:
$cleaned = preg_replace('/(?<!:)(?:\\/|\\\\){2,}/', "/", trim($input));

Correction required for regular expression to get site name

Problem: Extract anything between http://www. and .com, or between http:// and .com.
Solution:
<?php
$url1='http://www.examplehotel.com';
//$url2='http://test-hotel-1.com';
$pattern='#^http://([^/]+).com#i';
preg_match($pattern, $url1, $matches);
print_r($matches);
?>
When $url1 is matched, it should return the string 'examplehotel'.
When $url2 is matched, it should return the string 'test-hotel-1'.
It works correctly for $url2, but the result is empty for $url1.
In my pattern I want to allow either [http://] or [http://www.]. I added (http://)+(www.)+, but the matches are not what I expected :(.
May I know where I am going wrong?

Try this one:
$pattern='#^http://(?:www\.)?([^\.]+)\.com#i';
Or, in your own pattern, you just need to make www. optional (it may or may not appear) and escape the dot before com:
$pattern='#^http://(?:www\.)?([^/]+)\.com#i';

The problem is, that you are matching everything from the two slashes to the .com. If there is a www. you are matching this too, within your capturing group.
The solution is to match www. optionally before your capturing group, like this
^http://(?:www\.)?([^/]+)\.com
        ^^^^^^^^^^       ^^
(?:www\.)? This is a non capturing group, i.e. the content is not stored in the result. The ? at the end makes it optional.
\. will match a literal ".". . is a special character in regex and means "Any character".
See it online on Regexr: when you hover your mouse over the strings, you will see the content of the capturing group.
Regarding your tries with [http://] and so on: when you use square brackets, you create a character class, which matches one of the characters inside the brackets. When you want to group characters, use a capturing () or a non-capturing (?:) group.
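To check the optional non-capturing group against both URLs from the question, something like the following should do. It is a sketch; the $names array is only there to collect the results.

```php
<?php
$pattern = '#^http://(?:www\.)?([^/]+)\.com#i';

$names = [];
foreach (['http://www.examplehotel.com', 'http://test-hotel-1.com'] as $url) {
    // (?:www\.)? consumes an optional "www." without storing it,
    // so $matches[1] holds only the site name.
    if (preg_match($pattern, $url, $matches)) {
        $names[] = $matches[1];
    }
}
print_r($names);
```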

preg_match_all('%http(?:s)?://(?:www\.)?(.*?)\.com%i', $url, $result, PREG_PATTERN_ORDER);
print_r($result[1]);
