Regex in PHP: Compulsory second occurrence of word

I need to match a few URLs for an application I'm working on.
So, I've got this reference string:
content/course/32/lesson/61/content/348
and I need a pattern that matches either
content
OR
content/course/[number]/lesson/[number]/content/[number]
What I've done so far is come up with this pattern:
$my_regex = "/content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/";
However, it has the following problem: this string matches when it should not:
content/course/32/lesson/61/content
I'm thinking that it's got something to do with the word content repeating twice but I'm not entirely sure.
Any help is much appreciated.

The reason for the match is the alternation.
content\/?$
matches
content/course/32/lesson/61/content
To fix this, add a ^ (start of string) to the beginning of your regex so that the entire string has to match and not only its ending:
/^content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/
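A minimal sketch of the difference the ^ anchor makes, using the two patterns from this thread. The helper name matchesContentPath and the sample paths are only for illustration.

```php
<?php
// Hypothetical helper: true when $path is "content" (optionally with a
// trailing slash) or the full course/lesson/content path.
function matchesContentPath(string $path): bool
{
    // Anchored at both ends, so the whole string has to match.
    $anchored = '/^content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/';
    return preg_match($anchored, $path) === 1;
}

// The pattern from the question, anchored only at the end.
$unanchored = '/content(\/?|(\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4}))$/';

$partial = 'content/course/32/lesson/61/content';
var_dump(preg_match($unanchored, $partial) === 1); // bool(true): the trailing "content" is enough
var_dump(matchesContentPath($partial));            // bool(false)
var_dump(matchesContentPath('content/course/32/lesson/61/content/348')); // bool(true)
```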

This works:
/(^content\/?|content\/course\/\d{1,4}\/lesson\/\d{1,4}\/content\/\d{1,4})$/

Related

Find all hashtags in string using preg_match_all

I'm having problems figuring out the right regex pattern for the search preg_match_all("THIS PART", $my_string). I need to find all hashtags in my string with the word after the hashtag included as well.
So, these strings should be found by the mentioned function:
Input
#hi im like typing text right here hihih #asdasdasdasd #
Result
#hi
#asdasdasdasd
Input
#asd#asd xd so fun lol #lol
Result
#asd and #asd would be two separate matches, and #lol would be matched as well.
I hope the question made sense and thanks beforehand!
This should work:
/#(?<hash>[^\s#]+)/
It searches for # and then creates a named group called hash; the group stops matching when it reaches another # or any whitespace character (\s). PHP has no g flag, and preg_match_all already returns every match, so the flag is left out.
You can use preg_match_all:
preg_match_all('/(?<!\w)#\w+/', $your_string, $allMatches);
It collects every word that starts with a #. Hope it helps.
print_r($allMatches)
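A small sketch of the first approach wrapped in a function, run against the two inputs from the question. The name extractHashtags is made up for this example.

```php
<?php
// Hypothetical helper: returns every hashtag in $text, "#" included.
function extractHashtags(string $text): array
{
    // "#" followed by one or more characters that are neither whitespace nor "#".
    preg_match_all('/#[^\s#]+/', $text, $matches);
    return $matches[0];
}

print_r(extractHashtags('#hi im like typing text right here hihih #asdasdasdasd #'));
print_r(extractHashtags('#asd#asd xd so fun lol #lol'));
```

The first call should return #hi and #asdasdasdasd (the lone trailing # is skipped), and the second #asd, #asd and #lol. The lookbehind variant /(?<!\w)#\w+/ behaves differently on the second input: the second #asd directly follows a word character, so it would not be reported as a separate match.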

How do I extract one group from a URL using regex for use in a redirect?

I've read the Best RegEx Trick Ever and tried to wrap my head around the other answers here on Stack Exchange and just can't seem to get it right. Take these three strings:
http://www.test.com/newyork/class-schedule
http://www.test.com/location/newyork/class-schedule
http://www.test.com/location/newyork/training
I need a regex that will extract the newyork from the first string and save it for a replace later, but will NOT match any part of the other strings. Also, for obscure reasons, I can not include http://www.test.com as a condition for matching (so I can't use anything before the slash that precedes newyork). Note that in this scenario, newyork could easily be chicago, atlanta, or any other city name with no spaces or punctuation.
The only thing I've been able to figure out that isolates only newyork in the first string is the following:
/.*\.com\/(.[^\/]*)\/class-schedule/g
However, this relies on using the URL first which I can't use.
Any ideas on how to achieve this WITHOUT using the URL?
[EDIT]
To clarify what I'm looking for, I'm trying to take the results from the first string and add "location" to it, still using regex. So:
http://www.test.com/newyork/class-schedule
would become
http://www.test.com/location/newyork/class-schedule
using something like
http://www.test.com/location/$1/class-schedule
Try this: ~/(\w+)/[-a-z]+?/?(?:\?.*?)*(:?\s|$)~gm
See it working here: https://regex101.com/r/4VMazZ/3.
It keys on the end of the URL instead of the beginning and matches only the word between the second and third slash from the end. A trailing query string is allowed and the pattern will still work.
[EDIT 1]
I swapped two characters at the end by mistake, so the pattern captured one extra group. The corrected pattern is /(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$), here: https://regex101.com/r/4VMazZ/4
If you use preg_match($pattern, $string, $matches); the result you want (newyork) will be in $matches[1], while $matches[0] contains the whole match.
You can see the captures in 'MATCH INFORMATION' panel on regex101 in my example!
[EDIT 2] after your comment.
If you want to replace the whole url you have to match the whole URL, something like this: .*?/(\w+)/[-a-z]+?/?(?:\?.*?)*(?:\s|$) will do in this example. See it working here: https://regex101.com/r/4VMazZ/5
[EDIT 3] Add capturing of last part for replacement.
Since you want to reuse the last part, you need to add capturing parentheses around it: .*?/(\w+)/([-a-z]+?)/?(?:\?.*?)*(?:\s|$).
See it working here: https://regex101.com/r/4VMazZ/6
Could this work?
(?<=location\/|\.\w{3}\/|\.\w{2}\/)(?!location).*?(?=\/|$)
It matches everything following .xxx/ or .xx/ or location/. I don't know whether one-letter top-level domains exist; if they do, you can add |\.\w\/ to the lookbehind at the start of the regex.
(?<=location\/|\.\w{3}\/|\.\w{2}\/) is a lookbehind, so the pattern that follows it matches only if preceded by location/, .xxx/ or .xx/
(?!location) rejects a match that would start with location
.*? matches every character (lazy)
(?=\/|$) ends the match if the next character is / or at the end of the line
Note: If location is counted as part of the url, I don't think what you are asking is possible in regex, as the city name could be anywhere in string. If so, then you could have a list of cities and check what part of the url matches one of them.
EDIT: You need the multiline m flag so $ also matches end of line
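The patterns above key on the end of the URL, so they would appear to match the /location/ URLs as well. One alternative, which is not taken from either answer, is a negative lookbehind that refuses a city segment directly preceded by /location. The sketch below assumes the URL always ends in /class-schedule with no query string; the function name addLocationSegment is hypothetical.

```php
<?php
// Hypothetical helper: inserts "/location" before "/<city>/class-schedule"
// unless the URL already contains it. The host is never part of the pattern.
function addLocationSegment(string $url): string
{
    // (?<!/location) rejects a city segment that directly follows "/location".
    // ([^/]+) captures the city; "$" ties the match to the end of the URL.
    $pattern = '~(?<!/location)/([^/]+)/class-schedule$~';

    // preg_replace returns null only on a regex error; fall back to the input then.
    return preg_replace($pattern, '/location/$1/class-schedule', $url) ?? $url;
}

echo addLocationSegment('http://www.test.com/newyork/class-schedule'), "\n";
echo addLocationSegment('http://www.test.com/location/newyork/class-schedule'), "\n";
echo addLocationSegment('http://www.test.com/location/newyork/training'), "\n";
```

Only the first URL should change; the other two should come back untouched.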

match regex php between two string with string in middle

I would like to extract a string made of one word that has a delimiter word before and after it.
I tried the following, but it doesn't work:
$stringData2 = file_get_contents('testtext3.txt');
$regular2=('/(?<=first del)*MAIN WORD(?=last del)*\s');
preg_match_all($regular2,
$stringData2,
$out, PREG_PATTERN_ORDER);
Thank you very much for any help.
No quantifier is needed on the lookarounds; add the closing delimiter at the end of the pattern and put \s inside the lookahead.
'/(?<=first del)MAIN WORD(?=last del\s)/'
This regex
(?<=xx)[^\s]*(?=yy)
matches hello in:
xxhelloyy
but fails to match in:
xxhello worldyy
This is probably what you're looking for.
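A short sketch of the lookbehind/lookahead idea with the xx and yy delimiters from the example above. The helper wordBetween is hypothetical, and preg_quote is used so that the delimiters are treated as literal text.

```php
<?php
// Hypothetical helper: returns the run of non-whitespace characters that sits
// between $before and $after, or null when there is no such run.
function wordBetween(string $text, string $before, string $after): ?string
{
    // preg_quote escapes regex metacharacters in the delimiters;
    // "/" is passed because it is the pattern delimiter.
    $pattern = '/(?<=' . preg_quote($before, '/') . ')\S*(?=' . preg_quote($after, '/') . ')/';

    // The lookarounds are zero-width, so $m[0] holds only the word itself.
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

var_dump(wordBetween('xxhelloyy', 'xx', 'yy'));       // string(5) "hello"
var_dump(wordBetween('xxhello worldyy', 'xx', 'yy')); // NULL
```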
If you want the delimiter strings included in the match, then you should not be using lookahead or lookbehind. It should be something rather basic, like this.
/\s?first del MAIN WORD last del\s?/
If you do want to return just the MAIN WORD part of the match, then this will work. The optional \s? is left out of the lookarounds: it does not change what they accept, and PCRE versions that require a fixed-length lookbehind reject it there.
/(?<=first del)MAIN WORD(?=last del)/
Put an i at the very end of that to make it case-insensitive, if you want. I only mention this because the example you gave above uses different case in the example text and the desired response.

Parse block with php regex

I'm trying to write a (I think) pretty simple RegEx with PHP but it's not working.
Basically I have a block defined like this:
%%%%blockname%%%%
stuff goes here
%%%%/blockname%%%%
I'm not any good at RegEx, but this is what I tried:
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/i',$input,$matches);
It returns an array with 4 empty entries.
I guess it also, apart from actually working, needs some sort of pointer for the third match because it should be equal to the first one?
Please enlighten me :)
You need to allow the dot to match newlines, and to allow ^ and $ to match at the start and end of lines (not just the entire string):
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/(.*?)%%%%$/sm',$input,$matches);
The s (single-line) option makes the dot match any character including newlines.
The m (multi-line) option allows ^ and $ to match at the start and end of lines.
The i option is unnecessary in your regex since there are no case-sensitive characters in it.
Then, to answer the second part of your question: If blockname is the same in both cases, then you can make that explicit by using a backreference to the first capturing group:
preg_match_all('/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm',$input,$matches);
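As a rough illustration, the backreference pattern can be wrapped in a function that maps each block name to its body. The name parseBlocks and the choice to trim the body are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: maps each block name to its trimmed body.
function parseBlocks(string $input): array
{
    // \1 must repeat the name captured by group 1; "s" lets "." cross
    // newlines, "m" lets ^ and $ match at line boundaries.
    $pattern = '/^%%%%(.*?)%%%%(.*?)%%%%\/\1%%%%$/sm';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $blocks = [];
    foreach ($matches as $match) {
        // $match[1] is the block name, $match[2] the text between the markers.
        $blocks[$match[1]] = trim($match[2]);
    }
    return $blocks;
}

$input = "%%%%blockname%%%%\nstuff goes here\n%%%%/blockname%%%%";
print_r(parseBlocks($input));
```

A block whose closing name differs from its opening name is skipped, because \1 cannot match.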
I'm pretty sure you can't since these operations would need to save a variable and you can't in regex. You should try to do this using PHP's built-in token parser. http://php.net/manual/en/function.token-get-all.php

Rapidshare URL not matching correctly

I'm trying to make sure that a Rapidshare URL is valid when a user submits it through my form.
This is the regex that I've come up with so far:
http://rapidshare.com/files/[0-9]+/[a-zA-Z0-9\._-]+
A rapidshare link looks like this:
http://rapidshare.com/files/168501977/some_random-file.zip
My pattern matches, but not entirely correctly. For example, if we use this input:
http://rapidshare.com/files/168501977/some_random-file.zip£%^$
It will still match using the PHP function preg_match(), and let it go through, even though there are illegal symbols on the end of the URL. I want the pattern to match the entire input, and not just a random length that matches.
Any help would be appreciated, cheers!
You need to anchor the regex pattern. Use ^ to anchor the beginning and $ to anchor the end. So the pattern becomes:
^http://rapidshare.com/files/[0-9]+/[a-zA-Z0-9\._-]+$
This prevents a partial match of the string like the example is generating.
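A minimal sketch of the anchored check as a PHP function. It departs from the pattern above in two small ways, both assumptions of this example: the dots in the host are escaped, and \z replaces $ because $ also matches just before a trailing newline. The function name isRapidshareUrl is hypothetical.

```php
<?php
// Hypothetical helper: true only when the whole input is a Rapidshare file URL.
function isRapidshareUrl(string $url): bool
{
    // "~" is the delimiter, so the slashes need no escaping.
    // ^ and \z force the pattern to cover the entire string.
    $pattern = '~^http://rapidshare\.com/files/[0-9]+/[a-zA-Z0-9._-]+\z~';
    return preg_match($pattern, $url) === 1;
}

var_dump(isRapidshareUrl('http://rapidshare.com/files/168501977/some_random-file.zip'));     // bool(true)
var_dump(isRapidshareUrl('http://rapidshare.com/files/168501977/some_random-file.zip£%^$')); // bool(false)
```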
Validate the start and the end of your string using ^ and $. Example:
^ht{2}p:\/{2}rapidshare\.com\/files\/\d+\/[\.a-zA-Z0-9_-]+$
