URL-matching regex for PHP

I'm looking for a pattern that can match URLs.
All of them will contain ".no", as the input contains only Norwegian domains.
I think what's needed is this:
search for a space or line break before and after '.no', and the match will be a link.
Some examples of what it should match (all with text around it):
test.no
test.no/blablabla/
test.no/blablabla/test.html
test.no/blablabla/test.php
test.no/blablabla/test.htm
These should then be replaced with
MATCH
Can anyone figure this out?

This should do it:
$html = preg_replace("#\w+\.no[\w/.-]*#", '$0', $html);
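As printed, the replacement '$0' writes each match back unchanged, so the link markup the answer meant to insert seems to have been lost. Below is a minimal PHP sketch of the same pattern that wraps each match in an anchor; the helper name `linkifyNo` and the `http://` prefix added to the href are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: wrap every Norwegian (.no) address in an HTML link.
// Assumption: bare addresses like "test.no/path" should get an "http://" prefix.
function linkifyNo(string $html): string
{
    // \w+\.no     : a name followed by ".no"
    // [\w\/.-]*   : optional path characters (letters, digits, _, /, ., -)
    return preg_replace('#\w+\.no[\w/.-]*#', '<a href="http://$0">$0</a>', $html);
}

echo linkifyNo('Visit test.no/blablabla/test.html today.'), "\n";
```

Note that `[\w/.-]*` will also swallow a sentence-ending period directly after an address, so real input may need extra trimming.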

Check John Gruber's regular expression for URLs and build on it from there.

I hope this is strict enough:
^[\w.\\-]+\.no.*$

Related

Find all hashtags in string using preg_match_all

I'm having problems figuring out the right regex pattern for the search preg_match_all("THIS PART", $my_string). I need to find all hashtags in my string with the word after the hashtag included as well.
So, these strings should be found by the mentioned function:
Input
#hi im like typing text right here hihih #asdasdasdasd #
Result
#hi
#asdasdasdasd
Input
#asd#asd xd so fun lol #lol
Result
#asd and #asd would be two separate matches, and #lol would be matched as well.
I hope the question makes sense, and thanks in advance!
This should work:
/#(?<hash>[^\s#]+)/
It searches for # and then creates a named group called hash; matching stops at the next # or at any whitespace character (\s). (PHP has no g modifier; preg_match_all already returns every match.)
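A small PHP sketch of the named-group pattern with preg_match_all, run on the two inputs from the question; the function name `findHashtags` is hypothetical.

```php
<?php
// Collect hashtags with preg_match_all using the named-group pattern.
function findHashtags(string $text): array
{
    preg_match_all('/#(?<hash>[^\s#]+)/', $text, $matches);
    // $matches[0] holds the full matches ("#hi"); $matches['hash'] holds just the words ("hi").
    return $matches[0];
}

print_r(findHashtags('#hi im like typing text right here hihih #asdasdasdasd #'));
print_r(findHashtags('#asd#asd xd so fun lol #lol'));
```

A lone trailing `#` is not returned, because `[^\s#]+` needs at least one character after it.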
You can use preg_match_all:
preg_match_all('/(?<!\w)#\w+/', $your_string, $allMatches);
print_r($allMatches);
It will return every word that starts with #. Hope it helps.
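Running this pattern on the question's second input shows one caveat: the lookbehind `(?<!\w)` rejects a `#` that directly follows a word character, so in `#asd#asd` only the first tag is found.

```php
<?php
// Check the lookbehind pattern against the question's second input.
preg_match_all('/(?<!\w)#\w+/', '#asd#asd xd so fun lol #lol', $allMatches);
// The second "#asd" follows the letter "d", so (?<!\w) fails there.
print_r($allMatches[0]); // contains only "#asd" and "#lol"
```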

preg_replace regex tags not being replaced

Hoping you can help. I'm pretty new to regex, and although I have written this regex, it doesn't seem to match. I don't receive an error message, so I'm assuming the syntax is correct but it's just not being applied?
I want the regex to replace content like
{foo}bar{/foo} with
bar
Here is my code:
$regex = "#([{].*?[}])(.*?)([{]/.*?[}])#e";
$return = preg_replace($regex,"('$2')",$return);
Hope someone can help. Not sure why it doesn't seem to work.
Thanks for reading.
Your regex does work; however, it isn't smart enough to know that the end tag has to be the same as the start tag. I would use this instead (I've also simplified it a little):
$regex = '#{([^}]*)}(.*?)\{/\\1}#';
echo preg_replace($regex, '$2', '{foo}bar{/foo}'); // outputs "bar"
Referring to my comment above:
#(?:[{](.*?)[}])(.*?)(?:[{]/\1[}])#
uses a backreference to keep the tags equal. Also, I used non-capture parentheses to keep the useless groups out: $1 will be the tag name, and $2 will be the tag content.
Note that you will have to apply the replacement several times if your tags can nest.
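A sketch that combines the backreference pattern with the repeated passes mentioned above, looping until preg_replace reports no more replacements; the function name `stripBraceTags` and the `s` modifier (letting tag content span lines) are my additions.

```php
<?php
// Hypothetical helper: remove {tag}...{/tag} wrappers but keep their content.
// \1 forces the closing tag name to match the opening one; the "s" modifier
// (an assumption) lets the content span several lines.
function stripBraceTags(string $text): string
{
    $regex = '#\{([^}]*)\}(.*?)\{/\1\}#s';
    // A single pass unwraps only the outer tag of a nested pair,
    // so repeat until a pass makes no replacements.
    do {
        $text = preg_replace($regex, '$2', $text, -1, $count);
    } while ($count > 0);
    return $text;
}

echo stripBraceTags('{foo}bar{/foo}'), "\n";    // bar
echo stripBraceTags('{a}x{b}y{/b}z{/a}'), "\n"; // xyz
```

Mismatched pairs such as `{foo}bar{/baz}` are left untouched, since the backreference never finds a matching close.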

Regex match if not after word

I have a regex that matches URLs and converts them into HTML links.
If the URL is already part of a link, I don't want it to match; for example:
http://stackoverflow.com/questions/ask
Should match, but:
Stackoverflow
Shouldn't match
How can I create a regex to do this?
If your URL-matching regular expression is $URL, then you can use the following pattern:
(?<!href=[\"'])$URL
In PHP you'd write:
preg_match("/(?<!href=[\"'])$URL/", $text, $matches);
You can use a negative lookbehind to assert that the URL is not preceded by href="
(?<!href=")
(Your URL-matching pattern should go immediately after that.)
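A minimal sketch of the lookbehind approach; the URL sub-pattern `https?://[^\s"<>]+` and the helper name `linkifyBareUrls` are simplified assumptions, not a complete URL grammar.

```php
<?php
// Hypothetical helper: turn bare URLs into links, but skip any URL that is
// directly preceded by href=" (i.e. one that is already inside a link).
function linkifyBareUrls(string $html): string
{
    // Simplified URL sub-pattern (an assumption, not a full URL grammar).
    $url = 'https?://[^\s"<>]+';
    return preg_replace('#(?<!href=")' . $url . '#', '<a href="$0">$0</a>', $html);
}

echo linkifyBareUrls('See http://stackoverflow.com/questions/ask or <a href="http://stackoverflow.com">Stackoverflow</a>'), "\n";
```

The lookbehind only checks the characters immediately before the URL, so a link written with single quotes or extra spaces around `=` would still be matched.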
The linked page has more information; the accepted solution there is like so:
<a\s
(?:(?!href=|target=|>).)*
href="http://
(?:(?!target=|>).)*
By removing the references to "target" this should work for you.
Try this
/(?:(([^">']+|^)https?\:\/\/[^\s]+))/m

How to write regex to find one directory in a URL?

Here is the subject:
http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov
Using a regex, I need only the part before the last / (including that last /).
The 937IPiztQG string may change; it will contain a-z A-Z 0-9 - _
Here's what I tried:
$code = strstr($url, '/http:\/\/www\.mysite\.com\/files\/get\/([A-Za-z0-9]+)./');
EDIT: I need to use a regex because I don't actually know the URL. I have a string like this...
a song
more text
oh and here goes some more blah blah
I need it to read that string and cut off filename part of the URLs.
You really don't need a regexp here. Here is a simple solution:
echo basename(dirname('http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov'));
// echoes "937IPiztQG"
Also, I'd like to quote Jamie Zawinski:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
This seems far too simple a job for a regex. Use strrpos to find the last occurrence of the '/' character, and then use substr to trim the string.
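The strrpos/substr approach might look like this; `stripFilename` is a hypothetical name, and input without any `/` is returned unchanged.

```php
<?php
// Keep everything up to and including the last "/" without a regex.
function stripFilename(string $url): string
{
    $pos = strrpos($url, '/');
    // strrpos returns false when there is no "/" at all; leave such input as-is.
    return $pos === false ? $url : substr($url, 0, $pos + 1);
}

echo stripFilename('http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov'), "\n";
// http://www.mysite.com/files/get/937IPiztQG/
```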
/http:\/\/www\.mysite\.com\/files\/get\/([^\/]+)\//
How about something like this? It should capture anything that's not a /, one or more times, before a /.
The greediness of regexes ensures this works: ^.*/
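In PHP, the greedy pattern could be applied like this (using `#` as the delimiter so the `/` needs no escaping):

```php
<?php
// A greedy .* runs to the end of the string, then backtracks to the LAST "/".
$url = 'http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov';
if (preg_match('#^.*/#', $url, $m)) {
    echo $m[0], "\n"; // http://www.mysite.com/files/get/937IPiztQG/
}
```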
The strstr() function does not use a regular expression for any of its arguments; it's the wrong function for regex replacement.
Are you thinking of preg_replace()?
But a function like basename() would be more appropriate.
Try this
$ok=preg_match('#mysite\.com/files/get/([^/]*)#i',$url,$m);
if($ok) $code=$m[1];
Then read these pages carefully:
http://www.php.net/preg_match
preg_replace
Note
the use of "#" as a delimiter to avoid getting trapped into escaping too many "/"
the "i" flag, which makes the match case-insensitive
(allowing more liberal spellings of the MySite.com domain name)
the $m array of captured results
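For the EDIT, where the URLs sit inside a larger block of text, the same idea can be used with preg_replace to drop each filename. A sketch, assuming the URLs look like the example and that a filename ends at whitespace, a quote, or an angle bracket; `stripFilenamesInText` is a hypothetical name.

```php
<?php
// Hypothetical helper: drop the filename from every mysite.com download URL in a text.
function stripFilenamesInText(string $text): string
{
    // $1 keeps "mysite.com/files/get/<code>/"; the filename after it, up to
    // whitespace, a quote or an angle bracket, is removed. "i" = case-insensitive.
    return preg_replace('#(mysite\.com/files/get/[\w-]+/)[^\s"\'<>]*#i', '$1', $text);
}

echo stripFilenamesInText('<a href="http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov">a song</a> more text'), "\n";
```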

regular expression to strip attributes and values from html tags

Hi guys, I'm very new to regex; can you help me with this?
I have a string like "<input attribute='value' >", where attribute='value' could be anything, and I want to do a preg_replace to get just <input />.
How do I specify a wildcard to replace any number of any characters in a string?
Like this? preg_replace("/<input.*>/", $replacement, $string);
Many thanks
What you have:
.*
will match "any character, as many as possible".
What you mean is
[^>]+
which translates to "any character that's not a >, and there must be at least one",
or alternatively,
.*?
which means
"any character, but only as few as are needed to make this rule work".
BUT DON'T.
Parsing HTML with regexps is bad.
Use any of the existing HTML parsers, DOM libraries, anything, just NOT NAÏVE REGEX.
For example:
<foo attr=">">
will be grabbed wrongly by a regex as
'<foo attr=">' followed by the text '">'
Which will lead you to this regex:
<[a-zA-Z]+( [a-zA-Z]+=['"][^"']*['"])*> etc. etc.
at which point you'll discover this lovely gem:
<foo attr="'>\'\"">
and your head will explode.
(The syntax highlighter proves my point: it incorrectly thinks I've ended the tag.)
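Following the advice to use a parser, here is a rough sketch with PHP's bundled DOM extension that strips every attribute from `<input>` elements; the function name and the wrapper `<div>` trick are my own choices. The `>` inside the quoted attribute value does not confuse the parser.

```php
<?php
// Sketch: remove all attributes from <input> elements with PHP's DOM extension.
function stripInputAttributes(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // Wrap the fragment in one <div> root; the flags stop libxml from adding
    // <html>/<body> wrappers and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('input') as $input) {
        // Remove attributes one by one until the element has none left.
        while ($input->attributes->length > 0) {
            $input->removeAttribute($input->attributes->item(0)->nodeName);
        }
    }

    // Serialize the wrapper's children back to an HTML string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripInputAttributes("<p>Name: <input type='text' value='a>b'></p>"), "\n";
```

The HTML serializer writes void elements as `<input>` rather than `<input />`.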
Some people were close... but not 100%.
This:
preg_replace("/<input[^>]*>/", $replacement, $string);
should be this:
preg_replace("/<input[^>]*?>/", $replacement, $string);
You don't want that to be a greedy match.
preg_replace("/<input[^>]*>/", $replacement, $string);
// [^>] means "any character except the greater-than symbol / right tag bracket"
This is really basic stuff; you should catch up with some reading. :-)
If I understand the question correctly, you have the code:
preg_replace("/<input.*>/",$replacement,$string);
and you want us to tell you what you should use for $replacement to delete what was matched by .*
You have to go about this the other way around. Use capturing groups to capture what you want to keep, and reinsert that into the replacement. E.g.:
preg_replace("/(<input).*(>)/","$1$2",$string);
Of course, you don't really need capturing groups here, as you're only reinserting literal text. But the above shows the technique, in case you want to do this in a situation where the tag can vary. This is a better solution:
preg_replace("/<input [^>]*>/","<input />",$string);
The negated character class is more specific than the dot. This regex still works when the string contains two HTML tags; your original regex doesn't.
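A quick check of both regexes on a string containing two tags (the sample string is made up for illustration):

```php
<?php
$string = "<input type='text' name='a'> and <input value='b' >";

// Negated class: each [^>]* stops at the first ">", so both tags are replaced separately.
echo preg_replace("/<input [^>]*>/", "<input />", $string), "\n"; // <input /> and <input />

// Original greedy .*: runs from the first "<input" to the LAST ">", swallowing the text between.
echo preg_replace("/<input.*>/", "<input />", $string), "\n"; // <input />
```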
