How to write a PHP regex to match a string

I only know basic regex, so I am looking for help here.
I need to match URLs with this pattern:
/kb/This-is-possible-title-12345.html
The URL will always end with -nnnnn.html. Currently I have this regex pattern:
'kb/[a-zA-Z_-]*(\d+)\.html'
However, this does not work if the title portion contains numbers, such as:
/kb/This-is-12345-possible-title-12345.html
This needs to be done with PHP's preg_match function.

The following pattern works for me: /kb/[\w_-]*-(\d+)\.html$

At a quick glance, what you have looks right, but if you use / as the pattern delimiter you need to escape the forward slashes: change '/' to '\/'.

'#/kb/[^/]*-\d{5}\.html#'
This matches "/kb/", then any characters except "/", then a hyphen, five digits and ".html". Using # as the delimiter means the slashes inside the pattern need no escaping.
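A minimal sketch of the extraction in PHP, combining the [^/]* idea with the (\d+) capture and the $ anchor from the first answer. It uses \d+ rather than \d{5}, so IDs of any length are accepted; the function name kbArticleId() is hypothetical.

```php
<?php
// Sketch: extract the trailing numeric ID from /kb/...-nnnnn.html paths.
// '#' is the delimiter, so the '/' characters need no escaping.
// [^/]* lets the title contain digits; -(\d+)\.html$ pins the ID to the end.
function kbArticleId(string $path): ?string
{
    if (preg_match('#/kb/[^/]*-(\d+)\.html$#', $path, $m)) {
        return $m[1]; // group 1 holds the digits just before ".html"
    }
    return null; // not a /kb/ article URL
}
```

Because [^/]* is greedy and then backtracks, the digits captured are always the last -nnnnn block before ".html", even when the title itself contains numbers.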

Related

Regex in PHP: Compulsory second occurrence of word

I need to match a few URLs for an application I'm working on, and I have this reference string:
content/course/32/lesson/61/content/348
and I need a pattern that matches either
content
OR
content/course/[number]/lesson/[number]/content/[number]
What I've done so far is come up with this pattern:
$my_regex = "/content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/";
However, it has a problem: this string matches when it shouldn't:
content/course/32/lesson/61/content
I think it has something to do with the word "content" appearing twice, but I'm not entirely sure.
Any help is much appreciated.
The match comes from the alternation: the first branch, combined with the final $, is effectively
content\/?$
and without an anchor at the start it matches the trailing "content" in
content/course/32/lesson/61/content
To fix this, add a ^ (start of string) to the beginning of your regex so that the entire string must match, not just its ending:
/^content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/
This also works:
/(^content\/?|content\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4})$/
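To check the anchored pattern from the first answer against the three cases in the question, it can be wrapped in a small helper. This is only a sketch, and isContentRoute() is a hypothetical name.

```php
<?php
// Sketch: the anchored pattern from the answer above, wrapped for testing.
// ^ forces the match to start at the beginning of the string, so the bare
// "content" branch can no longer match the tail of a longer path.
function isContentRoute(string $path): bool
{
    $pattern = '/^content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/';
    return preg_match($pattern, $path) === 1;
}
```

Without the leading ^, the same pattern returns 1 for content/course/32/lesson/61/content, which is exactly the false positive described in the question.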

PHP Regex: match text URLs until space or end of string

This is the text sample:
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";
This is my regex pattern:
preg_match_all( "#http:\/\/(.*?)[\s|\n]#is", $text, $m );
That matches the first two URLs, but how do I match the last one? I tried adding [\s|\n|$], but that also only matches the first two URLs.
Don't try to match \n (there's no line break after the last URL!); use $ instead, which matches at the end of the string.
Edit:
I'd love to hear why my initial idea doesn't work, so if you know, let me know. My guess is that [] matches exactly one character, while the end of the line isn't a character? :)
This one will work:
preg_match_all('#http://(\S+)#is', $text, $m);
Note that you don't have to escape the / because it isn't the delimiter. With double quotes you may also need to escape the \, since PHP parses escape sequences in double-quoted strings; I used single quotes to avoid that.
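A quick check of the \S+ approach against the sample text from the question. The is modifiers are dropped here because neither is needed for \S+; treat this as a sketch rather than a full URL matcher.

```php
<?php
// Sketch: collect every http:// URL in a block of text.
// \S+ consumes non-whitespace characters, so a URL that ends the string
// is matched just like one followed by a space or a newline.
$text = "asd dasjfd fdsfsd http://11111.com/asdasd/?s=423%423%2F gfsdf http://22222.com/asdasd/?s=423%423%2F
asdfggasd http://3333333.com/asdasd/?s=423%423%2F";

preg_match_all('#http://(\S+)#', $text, $m);

// $m[0] holds the full URLs, $m[1] the part after "http://".
$urls = $m[0];
```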
I'm not familiar with PHP, so I don't have the exact syntax, but maybe this will give you something to try. [] denotes a character class, so | and $ inside it are matched literally. I think what you need is a lookahead instead, something like this:
#http:\/\/(.*?)(?=\s|$)#
I apologize if this is way off, but maybe it will give you another angle to try.
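The difference between the character-class attempt and the lookahead can be seen side by side. The short sample string is made up for illustration; this is a sketch of the behaviour, not a general URL matcher.

```php
<?php
// Sketch: why [\s|\n|$] misses the last URL while a lookahead does not.
$text = "a http://1.com b http://2.com\nc http://3.com";

// Inside [...], "|" and "$" are literal characters, so this class needs one
// real character (whitespace, "|" or "$") after each URL; the final URL has none.
preg_match_all('#http://(.*?)[\s|\n|$]#', $text, $classMatches);

// (?=\s|$) only looks ahead: it succeeds before whitespace or at the end of the
// string without consuming anything, so the final URL is found too.
preg_match_all('#http://(.*?)(?=\s|$)#', $text, $lookaheadMatches);
```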
See What is the best regular expression to check if a string is a valid URL?
It has some very long regular expressions that aim to match any URL.

Regex match if not after word

I have a regex that matches URLs and converts them into HTML links.
If the URL is already part of a link, I don't want it to match. For example:
http://stackoverflow.com/questions/ask
should match, but:
<a href="http://stackoverflow.com/questions/ask">Stackoverflow</a>
shouldn't match.
How can I create a regex to do this?
If your URL-matching regular expression is stored in $URL, you can use the following pattern:
(?<!href=[\"'])$URL
In PHP you'd write:
preg_match("/(?<!href=[\"'])$URL/", $text, $matches);
You can use a negative lookbehind to assert that the URL is not preceded by href="
(?<!href=")
(Your url-matching pattern should go immediately after that.)
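A minimal sketch of the negative-lookbehind idea applied to link conversion. The URL sub-pattern https?://[^\s"'<>]+ is a simplified stand-in for whatever $URL you already have, and linkify() is a hypothetical name; href=["'] has a fixed length, which keeps it within PCRE's lookbehind restrictions.

```php
<?php
// Sketch: wrap bare URLs in <a> tags, but skip URLs preceded by href=" or href='.
function linkify(string $text): string
{
    // (?<!href=["']) : the URL must not start right after href=" or href='
    // https?://[^\s"'<>]+ : simplified URL, stops at whitespace, quotes and angle brackets
    $pattern = '~(?<!href=["\'])https?://[^\s"\'<>]+~i';

    return preg_replace_callback(
        $pattern,
        fn(array $m): string => '<a href="' . $m[0] . '">' . $m[0] . '</a>',
        $text
    );
}
```

Note that a URL used as the link text, as in <a href="...">http://...</a>, is not preceded by href=" and would still be matched; excluding that case needs more than this one lookbehind.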
A linked question covers this. Its accepted solution is:
<a\s
(?:(?!href=|target=|>).)*
href="http://
(?:(?!target=|>).)*
If you remove the references to "target", this should work for you.
Try this
/(?:(([^">']+|^)https?\:\/\/[^\s]+))/m

Convert PHP RegEx to JavaScript RegEx

I have a PHP regular expression I'm using to get the YouTube video code out of a URL.
I'd love to match this with a client-side regular expression in JavaScript. Can anyone tell me how to convert the following PHP regex to JavaScript?
preg_match("#(?<=v=)[a-zA-Z0-9-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=embed/)[^&\n]+|(?<=v=)[^&\n]+|(?<=youtu.be/)[^&\n]+#", $url, $matches);
Much appreciated, thanks!
I think the only problem is the lookbehind assertions (?<=...): older JavaScript engines do not support them (lookbehind was only added in ECMAScript 2018), so you need to get rid of them.
Their advantage is that they require a pattern to precede the match without including that pattern in the match.
So you need to remove them, i.e. change (?<=v=)[a-zA-Z0-9-]+(?=&) to v=[a-zA-Z0-9-]+(?=&), but now your match starts with "v=".
If you only need to validate and don't need the matched part, that's fine and you are done.
But if you need the part after v=, put the needed pattern into a capturing group instead and continue working with the captured values:
v=([a-zA-Z0-9-]+)(?=&)
You will then find the captured substring in $1 for the first group, $2 for the second, and so on (in JavaScript, at index 1, 2, ... of the array returned by RegExp.prototype.exec()).
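To see that the capturing-group rewrite yields the same ID as the lookbehind version, both can be run in PHP, whose PCRE engine supports both forms. Only the first alternative of the original pattern is shown, and the video URL and ID are made up for illustration.

```php
<?php
// Sketch: lookbehind vs. capturing group for the first alternative of the pattern.
$url = 'https://www.youtube.com/watch?v=abc-123XYZ&feature=share';

// Lookbehind: "v=" must precede the match but is not part of it.
preg_match('#(?<=v=)[a-zA-Z0-9-]+(?=&)#', $url, $withLookbehind);

// Capturing group: "v=" is consumed, and the ID lands in group 1.
preg_match('#v=([a-zA-Z0-9-]+)(?=&)#', $url, $withGroup);

$idFromLookbehind = $withLookbehind[0]; // the whole match is the ID
$idFromGroup      = $withGroup[1];      // group 1 is the ID; $withGroup[0] starts with "v="
```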
You can replace your lookbehind assertions using the approaches in this post:
Javascript: negative lookbehind equivalent?

How to write regex to find one directory in a URL?

Here is the subject:
http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov
Using a regex, I need only the part before the last / (including that last / too).
The 937IPiztQG string may change; it contains only the characters a-z, A-Z, 0-9, - and _.
Here's what I tried:
$code = strstr($url, '/http:\/\/www\.mysite\.com\/files\/get\/([A-Za-z0-9]+)./');
EDIT: I need to use a regex because I don't actually know the URL. I have a string like this...
a song
more text
oh and here goes some more blah blah
I need it to read that string and cut off the filename part of the URLs.
You really don't need a regexp here. Here is a simple solution:
echo basename(dirname('http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov'));
// echoes "937IPiztQG"
Also, I'd like to quote Jamie Zawinski:
"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."
This seems too simple a task for a regex. Use strrpos() to find the last occurrence of the '/' character, and then use substr() to trim the string.
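The strrpos()/substr() approach as a minimal sketch; stripFilename() is a hypothetical name. It keeps the trailing slash, as the question asks.

```php
<?php
// Sketch: keep everything up to and including the last "/", with no regex involved.
function stripFilename(string $url): string
{
    $pos = strrpos($url, '/');
    // strrpos() returns false when there is no "/" at all; return the input unchanged then.
    return $pos === false ? $url : substr($url, 0, $pos + 1);
}
```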
/http:\/\/www\.mysite\.com\/files\/get\/([^\/]+)\//
How about something like this? It captures one or more characters that are not a /, followed by a /.
The greediness of regexes ensures that ^.*/ works here: it matches everything up to and including the last /.
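A short sketch of the greedy ^.*/ answer, with a lazy ^.*?/ alongside for contrast; the lazy version stops at the first slash instead. # is used as the delimiter so the slash needs no escaping.

```php
<?php
$url = 'http://www.mysite.com/files/get/937IPiztQG/the-blah-blah-text-i-dont-need.mov';

// Greedy: .* runs to the end, then backtracks to the LAST "/".
$greedy = preg_match('#^.*/#', $url, $m) ? $m[0] : null;

// Lazy, for contrast: .*? stops at the FIRST "/", giving only "http:/".
$lazy = preg_match('#^.*?/#', $url, $m2) ? $m2[0] : null;
```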
The strstr() function does not use a regular expression for any of its arguments; it's the wrong function for regex replacement.
Are you thinking of preg_replace()?
But a function like basename() would be more appropriate.
Try this
$ok=preg_match('#mysite\.com/files/get/([^/]*)#i',$url,$m);
if($ok) $code=$m[1];
Then give these pages a careful read:
http://www.php.net/preg_match
preg_replace
Note
the use of "#" as a delimiter to avoid getting trapped into escaping too many "/"
the "i" flag making match insensitive
(allowing more liberal spellings of the MySite.com domain name)
the $m array of captured results
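Since the question's EDIT says the URLs are embedded in a longer string, the same pattern can be run with preg_match_all() to collect every code. This is a sketch: the sample text is invented, and ([^/\s]+)/ is a slight tightening of the answer's ([^/]*) so that a match cannot run across whitespace.

```php
<?php
// Sketch: the same idea applied to a text containing several URLs.
// Assumption: each code is followed by "/" and contains no whitespace.
$text = 'a song http://www.mysite.com/files/get/937IPiztQG/the-song.mov more text '
      . 'http://www.MySite.com/files/get/Ab_9-xY/other-file.mov';

// "i" makes the domain match case-insensitively, as in the answer.
preg_match_all('#mysite\.com/files/get/([^/\s]+)/#i', $text, $m);

$codes = $m[1]; // one captured code per URL, in order of appearance
```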
