regex in url and xpath

I used XPath to collect every href value from the a elements inside a ul li.
foreach ($domExemple as $exemple) {
    $result[] = $exemple->nodeValue;
}
Here $exemple->nodeValue is a string like /produit/3017620424403/nutella
I want to retrieve the number between the two slashes.
The numbers vary in length.
I tried this regex: /\/([0-9]{0,})/i
But it does not return what I expect.
Can anyone explain why and help me fix it?

In your pattern you have to add the forward slash at the end as well:
/\/([0-9]{0,})\//i
^^
You don't have to escape the forward slash if you switch to another delimiter, for example ~. {0,} can be written as *, but that also matches an empty string, so use + instead to require one or more digits.
$pattern = "~/([0-9]+)/~i";
The value is in the first capturing group. Note that the pattern has no start anchor, so if the string contains several /digits/ parts, each of them can match.
Another option is to match from the start of the string up to and including the second forward slash, then use \K to forget what was matched so far. Then match one or more digits and assert that a / follows:
^/[^/]+/\K\d+(?=/)
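A minimal sketch of both patterns in use. The sample hrefs are made up for illustration and stand in for the $exemple->nodeValue strings from the question.

```php
<?php
// Hypothetical sample values standing in for $exemple->nodeValue.
$hrefs = [
    '/produit/3017620424403/nutella',
    '/produit/42/autre-produit',
    '/produit/sans-numero',
];

$ids = [];
foreach ($hrefs as $href) {
    // Capture group 1 holds the digits found between two slashes.
    if (preg_match('~/([0-9]+)/~', $href, $m) === 1) {
        $ids[] = $m[1];
    }
}
print_r($ids); // contains '3017620424403' and '42'; the last href has no digits

// Variant with \K: the whole match ($k[0]) is already just the digits.
preg_match('~^/[^/]+/\K\d+(?=/)~', $hrefs[0], $k);
echo $k[0], PHP_EOL;
```

Checking the return value of preg_match before reading the match array avoids notices for hrefs that contain no number.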

Assuming the URL is in the format provided above, try this:
#(/[0-9]*/)#
# is used as the delimiter so the slashes do not have to be escaped, which keeps the pattern readable.
If you want just the number, use this:
#/([0-9]*)/#
The parentheses capture the part you are looking for.

Related

Sanitize phone number: regular expression to match all except the first occurrence when it is in the first position

Regarding this post, "https://stackoverflow.com/questions/35413960/regular-expression-match-all-except-first-occurence", I'm wondering how to match all but the first occurrence in a string only if the string starts with a specific character, in PHP.
I would like to sanitize phone numbers. Example of a bad phone number:
+49+12423#23492#aosd#+dasd
Regex to remove all "+" except the first occurrence:
\G(?:\A[^\+]*\+)?+[^\+]*\K\+
Problem: it should remove the extra "+" characters only if the string starts with "+", not when the first "+" appears at a later position.
The regex to remove everything except digits is easy:
[^0-9]*
But I don't know how to combine those two in one regex. Otherwise I would just call preg_replace() twice.
Of course I could use a workaround like if ($str[0] === '+') {...}, but I would prefer to learn something new (regex :)
Thanks for helping.

You can use
(?:\G(?!\A)|^\+)[^+]*\K\+
Details:
(?:\G(?!\A)|^\+) - either the end of the preceding successful match, or a + at the start of the string
[^+]* - zero or more characters other than +
\K - match reset operator, which discards the text matched so far from the reported match
\+ - a + character.
In PHP:
$re = '/(?:\G(?!\A)|^\+)[^+]*\K\+/m';
$str = '+49+12423#23492#aosd#+dasd';
echo preg_replace($re, '', $str);
// => +4912423#23492#aosd#dasd

You seem to want to combine the two operations:
A regex to remove everything except digits
A regex to remove every "+" except a leading one
Here is my two cents:
(?:^\+|\d)(*SKIP)(*F)|.
Replace each match with an empty string.
(?:^\+|\d) - A non-capturing group that matches a literal plus at the start of the string, or any digit from 0 to 9.
(*SKIP)(*F) - Forces the match to fail at this point and tells the engine not to retry inside the characters just consumed, so they are skipped and therefore kept.
| - Or:
. - Any single character other than a newline.
I'd like to think that this is a slight adaptation of what some consider "The best regex trick ever" where one would first try to match what you don't want, then use an alternation to match what you do want. With the use of the backtracking control verbs (*SKIP)(*F) we reverse the logic. We first match what we do want, exclude it from the results and then match what we don't want.
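A short sketch of this pattern used with preg_replace. The function name sanitizePhone and the second sample input are assumptions made for illustration, not part of the original answer.

```php
<?php
// Keep a leading "+" and all digits; delete every other character.
// sanitizePhone is a hypothetical helper name.
function sanitizePhone(string $raw): string
{
    // preg_replace returns null on error, so fall back to an empty string.
    return preg_replace('/(?:^\+|\d)(*SKIP)(*F)|./', '', $raw) ?? '';
}

$withPlus    = sanitizePhone('+49+12423#23492#aosd#+dasd');
$withoutPlus = sanitizePhone('49+12423');

echo $withPlus, PHP_EOL;    // +491242323492
echo $withoutPlus, PHP_EOL; // 4912423
```

Because . does not match a newline by default, newlines in the input would survive; adding the s modifier would remove them as well.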

PHP regex to get WordPress category slug from $_SERVER['REQUEST_URI']

I am trying to get the category slug from $_SERVER['REQUEST_URI'] using a preg_match pattern, but it's not working.
For example, $_SERVER['REQUEST_URI'] returns /category/current-affairs/ and I want to store current-affairs in a variable for later use.
So far I have come up with this, but it's not working:
^\/category\/(?:\/(\w+))*$/g
Any help with this will be very much appreciated.

You don't need a regex. WordPress has functions that do this for you:
if (is_category()) {
    // ID of the category currently being queried
    $cat = get_query_var('cat');
    $current_cat = get_category($cat);
    echo 'The slug is ' . $current_cat->slug;
}

Your regex ^\/category\/(?:\/(\w+))*$/g matches:
From the beginning of the string ^
Match a forward slash \/
Match category
Match a forward slash \/
A non-capturing group (?: repeated zero or more times *
Inside this non-capturing group, match a forward slash followed by a capturing group of one or more word characters: \/(\w+)
The end of the string $
With the part \/(\w+) you are trying to match current-affairs
But this part matches
A forward slash \/
Then, captured in a group, [A-Za-z0-9_] one or more times
Your text has a hyphen - in it, which \w does not match.
So the full pattern expects /category/ followed by zero or more repetitions of a forward slash plus [A-Za-z0-9_]+, which puts two forward slashes // before the slug.
It would match:
/category//currentaffairs
But not
/category//currentaffairs/
/category/currentaffairs/
/category//current-affairs/
/category//current-affairs
I think you can get your match like this (note that PHP's preg functions have no g modifier):
^\/category\/([\w-]+)\/$
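A minimal sketch of that pattern with preg_match. The hard-coded $requestUri stands in for $_SERVER['REQUEST_URI'], and stripping the query string with parse_url is an added assumption, since a real request URI may carry one.

```php
<?php
// Hypothetical value standing in for $_SERVER['REQUEST_URI'].
$requestUri = '/category/current-affairs/?page=2';

// Assumption: drop any query string before matching the path.
$path = parse_url($requestUri, PHP_URL_PATH);

$slug = null;
// ~ as delimiter, so the slashes need no escaping.
if (is_string($path) && preg_match('~^/category/([\w-]+)/$~', $path, $m) === 1) {
    $slug = $m[1];
}

echo $slug ?? 'no category slug found', PHP_EOL;
```

Leaving $slug as null when nothing matches lets the calling code tell "no category" apart from an empty slug.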

A very easy way is to explode the uri to an array and read the 3rd value:
$request_uri = explode('/', $_SERVER['REQUEST_URI']);
$category = $request_uri[2];

Your pattern isn't working for a few reasons:
You're trying to match // (in this section \/(?:\/)
\w doesn't match -. It matches a-zA-Z0-9_
Every / inside the group must be followed by \w+, so the trailing / of /category/current-affairs/ can never match: (?:\/(\w+))*$
Code
Note: The regex below uses a different delimiter (# instead of /). This allows us to use / inside the pattern without having to escape it.
/category/\K[^/]*
/category/ Match this literally.
\K Resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match.
[^/]* Match any character except / any number of times.
Usage
$re = '#/category/\K[^/]*#';
preg_match_all($re, $_SERVER['REQUEST_URI'], $matches, PREG_SET_ORDER, 0);
var_dump($matches);

This should work:
\/category(.*)

RegEx expression to hit only words with a-z and no umlauts

Can you help me out with this one? I have a list of words like this:
sachbearbeiter/-in
referent/-in
anlagenführer/-in
it-projektleiter/-in
I want to select only:
sachbearbeiter/-in
referent/-in
This is my current regex: ([a-z]+)/-(in)
The problem is that it matches all of them, even the ones with - and with ü.
Thank you in advance.

You can use anchors so the pattern has to match the whole word:
^([a-z]+)/-(in)$
^---- Here ----^
Update: regarding your comment, if you want to accept umlauts you can use the Unicode flag (u) with \w like this:
^(\w+)/-(in)$
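A small sketch of both variants using preg_grep, which filters an array by pattern. The word list is taken from the question; the ü is written as an escape sequence only so that the example does not depend on the file's encoding.

```php
<?php
$words = [
    'sachbearbeiter/-in',
    'referent/-in',
    "anlagenf\u{00FC}hrer/-in", // "anlagenführer", written with a Unicode escape
    'it-projektleiter/-in',
];

// ASCII letters only: the anchors force the whole string to fit the pattern.
$asciiOnly = array_values(preg_grep('~^([a-z]+)/-(in)$~', $words));

// With the u modifier, \w also matches letters such as ü.
$withUmlauts = array_values(preg_grep('~^(\w+)/-(in)$~u', $words));

print_r($asciiOnly);   // sachbearbeiter/-in, referent/-in
print_r($withUmlauts); // the two above plus the word containing ü
```

Neither variant accepts it-projektleiter/-in, because the hyphen before the slash is matched by neither [a-z] nor \w.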

You need to anchor the pattern to the beginning and end of the string so that it only matches when the whole string fits.
Change your regex to
^([a-z]+)/-(in)$
^ stands for the beginning of the string
$ stands for the end of the string

Your current regex, ([a-z]+)/-(in), does not escape the / character, and it is not anchored, so it also matches substrings of each word; that is why every line shows a match.
The regex should be ^([a-z]+)\/-(in), i.e. it must start with lowercase letters only, followed by an escaped /. (Escaping / is only required when / is also the pattern delimiter.)

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.

A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should see if you can discover what characters are valid in a Reddit name or in a Reddit username and modify the [a-zA-Z0-9_-] charset accordingly.

You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, which match those exactly. The two can be combined into /(?:u|r)/, which matches either one. Next you want to match everything from that point up to a space. Use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure the match is preceded by a space so you're not matching part of a longer path, which results in (?<= )/(?:u|r)/[^ ]*. Wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the matched text so it can be used in the replacement. You can express the replacement using the \1 reference to the first captured group: [\1](http://reddit.com\1).
In PHP you pass the pattern, the replacement, and the subject string to the preg_replace function.
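Put together, the steps above might look like the sketch below. It uses # as the delimiter (an added choice, so the slashes need no escaping) and the example sentence from the question.

```php
<?php
$text = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';

// (?<= ) requires a space before the match; [^ ]* runs up to the next space.
$pattern = '#(?<= )(/(?:u|r)/[^ ]*)#';

// $1 refers to the first capturing group, like \1.
$linked = preg_replace($pattern, '[$1](http://reddit.com$1)', $text);

echo $linked, PHP_EOL;
```

Two limitations follow from the pattern as described: a path at the very start of the string is not linked because no space precedes it, and [^ ]* also swallows punctuation that directly follows a name, such as a trailing comma.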

In my opinion a regex is overkill for such a simple operation. If you just want to replace one known instance such as "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you can use str_replace, although preg_replace needs less code when the names vary.
str_replace("/r/x", "[/r/x](http://reddit.com/r/x)", "whatever_string");
Use a regex for more intricate search and replace. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

PHP RegEx get first letter after set of characters

I have some text consisting of a leading string of letters followed by digits.
I need to get the first single digit after the leading letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the RegEx works, but it also returns the first group in the results. I need only this digit, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$), the regex doesn't work.
Any ideas?
Thanks in advance.

(?=...) is a lookahead: it means "followed by" and checks the string to the right of the current position.
(?<=...) is a lookbehind: it means "preceded by" and checks the string to the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the overall match result. The only problem is that a lookbehind can't match variable-length content.
A way around the problem is to use \K, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
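A brief sketch of the \K pattern applied to some of the sample values from the question. The array layout is just one way to collect the results.

```php
<?php
$codes = ['ABC105001', 'ABCD205001', 'ABCD305001'];

$firstDigits = [];
foreach ($codes as $code) {
    // \K drops the letters from the reported match, so $m[0] is the single digit.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $code, $m) === 1) {
        $firstDigits[$code] = $m[0];
    }
}

print_r($firstDigits); // ABC105001 => 1, ABCD205001 => 2, ABCD305001 => 3
```

Since nothing is captured in a group, the digit is read from index 0 of the match array rather than index 1.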

You're trying to use a positive lookahead when really you want a non-capturing group.
The one match you want will work with this regex:
^(?:\D*)(\d)\d*$
(?: starts a non-capturing group. Its content does not come back as a separate entry in the matches.
So, if you used preg_match(';^(?:\D*)(\d)\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] is always the full match from preg_match.)

try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a no capture group
