in need of a php preg_replace regex - php

I am writing a script that needs to download images for an array of product IDs from an external website.
Here are the possible product ID combinations.
ABC1234AB
ABC1234AB-CD
ABC1234AB-CDE
ABC1234ABC
I need to be able to convert them to their URL equivalents on the manufacturer's website, which are (in the same order):
abc1234_ab
abc1234_ab_cd
abc1234_ab_cde
abc1234_abc
I am looking for a Regex to use with preg_replace that would do the trick.
Thanks in advance!

$output = strtolower(preg_replace('~\d\K(?=[A-Z])|-~', '_', $input));
\K removes everything matched to its left from the match result, so the digit before the letter is not part of the match and will not be replaced.
(?=...) is a lookahead assertion that checks whether a letter follows. It isn't part of the match result either, so the letter will not be replaced.
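To see the one-liner above against all four IDs from the question, here is a small sketch. The helper name toUrlSlug() is made up for this example and is not part of the answer.

```php
<?php
// Product IDs from the question, mapped to the URL form the asker expects.
$expected = [
    'ABC1234AB'     => 'abc1234_ab',
    'ABC1234AB-CD'  => 'abc1234_ab_cd',
    'ABC1234AB-CDE' => 'abc1234_ab_cde',
    'ABC1234ABC'    => 'abc1234_abc',
];

// toUrlSlug() is a hypothetical helper name used only for this sketch.
function toUrlSlug(string $id): string
{
    // Match either the empty position between a digit and an uppercase
    // letter, or a literal hyphen, and replace it with an underscore.
    return strtolower(preg_replace('~\d\K(?=[A-Z])|-~', '_', $id));
}

foreach ($expected as $id => $slug) {
    echo $id, ' => ', toUrlSlug($id), PHP_EOL;
}
```

The first alternative matches an empty string (the digit is dropped by \K and the letter is only looked at), so the underscore is inserted rather than overwriting anything.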

I'm a noob at regular expressions, but I'll give it a shot.
Pattern: /([A-Z]+\d+)([A-Z]+)(?:-([A-Z]+))?/
[A-Z] matches uppercase letters
\d matches digits
+ repeats the preceding token one or more times
(?:...)? makes the hyphenated suffix optional
In the replacement callback, call strtolower on the captured groups and join them however you want :P
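A rough sketch of the callback idea, assuming the pattern captures the digits together with the leading letters and treats the hyphenated suffix as optional. The function name idToSlug() is hypothetical.

```php
<?php
// idToSlug() is a hypothetical helper name used only for this sketch.
function idToSlug(string $id): string
{
    // Group 1: leading letters plus digits, group 2: letters after the
    // digits, group 3 (optional): letters after a hyphen.
    $pattern = '/^([A-Z]+\d+)([A-Z]+)(?:-([A-Z]+))?$/';

    return preg_replace_callback($pattern, function (array $m): string {
        // Drop the full match ($m[0]) and any empty group, then join the
        // remaining pieces with underscores.
        $parts = array_filter(array_slice($m, 1), function ($p) {
            return $p !== '';
        });
        return strtolower(implode('_', $parts));
    }, $id);
}

echo idToSlug('ABC1234AB-CDE'), PHP_EOL; // abc1234_ab_cde
echo idToSlug('ABC1234ABC'), PHP_EOL;    // abc1234_abc
```

This is more code than the preg_replace one-liner, but it makes each piece of the ID available by name if you need to treat them differently later.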

Related

RegEx expression to hit only words with a-z and no umlauts

Can you help me out with this one? I have a list of words like this:
sachbearbeiter/-in
referent/-in
anlagenführer/-in
it-projektleiter/-in
I want to select only:
sachbearbeiter/-in
referent/-in
This is my current regex: ([a-z]+)/-(in)
The problem is that it matches all of them, even the ones with - and with ü.
Thank you in advance.
You can use anchors to match the word you want:
^([a-z]+)/-(in)$
^---- Here ----^
Update: regarding your comment, if you want to accept umlauts, you can use the u (Unicode) modifier together with \w, like this:
^(\w+)/-(in)$
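Both anchored patterns can be tried in PHP with preg_grep on the word list from the question. This is only a sketch: the ~ delimiter is my choice (it avoids having to escape the /), and the script is assumed to be saved as UTF-8.

```php
<?php
$words = [
    'sachbearbeiter/-in',
    'referent/-in',
    'anlagenführer/-in',
    'it-projektleiter/-in',
];

// ASCII only: the anchors force the whole string to be a-z plus "/-in".
$asciiOnly = array_values(preg_grep('~^([a-z]+)/-(in)$~', $words));

// With the u modifier, \w also matches letters such as "ü" (but not "-").
$withUmlauts = array_values(preg_grep('~^(\w+)/-(in)$~u', $words));

print_r($asciiOnly);
print_r($withUmlauts);
```

The first list contains only sachbearbeiter/-in and referent/-in. The second additionally contains anlagenführer/-in, while it-projektleiter/-in is rejected by both because of the hyphen.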
You need to specify the beginning and end of the string so that the pattern matches only the exact characters.
Change your regex to
^([a-z]+)/-(in)$
^ -> stands for the beginning of the string
$ -> stands for the end of the string
Your current regex, ([a-z]+)/-(in), does not escape the / character, and it also matches substrings that fit the pattern, so it reports every word.
The regex should be: ^([a-z]+)\/-(in) i.e. the word must start with lowercase letters only, followed by an escaped /

(PHP) How to find words beginning with a pattern and replace all of them?

I have a string. An example might be "Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2"
I want to replace any instance of "/r/x" and "/u/x" with "[/r/x](http://reddit.com/r/x)" and "[/u/x](http://reddit.com/u/x)" basically.
So I'm not sure how to 1) find "/r/" and then expand that to the rest of the word (until there's a space), then 2) take that full "/r/x" and replace with my pattern, and most importantly 3) do this for all "/r/" and "/u/" matches in a single go...
The only way I know to do this would be to write a function to walk the string, character by character, until I found "/", then look for "r" and "/" to follow; then keep going until I found a space. That would give me the beginning and ending characters, so I could do a string replacement; then calculate the new end point, and continue walking the string.
This feels... dumb. I have a feeling there's a relatively simple way to do this, and I just don't know how to google to get all the relevant parts.
A simple preg_replace will do what you want.
Try:
$string = preg_replace('#(/(?:u|r)/[a-zA-Z0-9_-]+)#', '[\1](http://reddit.com\1)', $string);
Here is an example: http://ideone.com/dvz2zB
You should find out which characters are valid in a subreddit name and in a Reddit username, and modify the [a-zA-Z0-9_-] character class accordingly.
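For reference, this is roughly what that call does with the example string from the question. The wrapper function linkifyReddit() is a made-up name, and the character class is still the guess from the answer.

```php
<?php
// linkifyReddit() is a hypothetical helper name used only for this sketch.
function linkifyReddit(string $text): string
{
    // Capture "/u/name" or "/r/name" and reuse the capture (\1) as both
    // the link text and the URL path.
    return preg_replace(
        '#(/(?:u|r)/[a-zA-Z0-9_-]+)#',
        '[\1](http://reddit.com\1)',
        $text
    );
}

$string = 'Contact /u/someone on reddit, or visit /r/subreddit or /r/subreddit2';
echo linkifyReddit($string), PHP_EOL;
```

All three mentions are replaced in a single call, because preg_replace replaces every match by default.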
You are looking for a regular expression.
A basic pattern starts out as a fixed string: /u/ or /r/, which would match those exactly. This can be simplified to match one or the other with /(?:u|r)/, which matches the same text as those two patterns. Next you want to match everything from that point up to a space. You would use a negated character class, [^ ], which matches any character that is not a space, and apply a quantifier, *, to match as many such characters as possible: /(?:u|r)/[^ ]*
You can take that pattern further and add a lookbehind, (?<= ), to ensure your match is preceded by a space so you're not matching part of a longer word, which results in (?<= )/(?:u|r)/[^ ]*. Note that this also skips a mention at the very start of the string, since nothing precedes it. You wrap all of that in parentheses to make a capturing group: ((?<= )/(?:u|r)/[^ ]*). This captures the contents within the parentheses for use in a replacement pattern. You can express your chosen replacement using the \1 reference to the first captured group as [\1](http://reddit.com\1).
In PHP you would pass the matching pattern, replacement pattern, and subject string to the preg_replace function.
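A minimal sketch of the pattern built up above, passed to preg_replace. The # delimiter and the three test strings are my own choices, picked to show what the lookbehind accepts and rejects.

```php
<?php
$pattern     = '#((?<= )/(?:u|r)/[^ ]*)#';
$replacement = '[\1](http://reddit.com\1)';

// Preceded by a space: both mentions are linked.
$a = preg_replace($pattern, $replacement, 'visit /r/subreddit or /u/someone');

// Preceded by a letter: the lookbehind fails, nothing changes.
$b = preg_replace($pattern, $replacement, 'see example.com/r/foo');

// At the very start of the string there is no space before it either.
$c = preg_replace($pattern, $replacement, '/r/php first');

echo $a, PHP_EOL, $b, PHP_EOL, $c, PHP_EOL;
```

If mentions at the start of the string should also be linked, the lookbehind would need to be loosened; and since [^ ]* runs up to the next space, trailing punctuation such as a comma ends up inside the link.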
In my opinion, regex would be overkill for such a simple operation. If you just want to replace instances of "/r/x" with "[/r/x](http://reddit.com/r/x)" and "/u/x" with "[/u/x](http://reddit.com/u/x)", you should use str_replace, although preg_replace will shorten the code.
str_replace("/r/x","[/r/x](http://reddit.com/r/x)","whatever_string");
Use regex for intricate search-and-replace. You can also use the regular expression generator at http://www.jslab.dk/tools.regex.php if you have something complex to capture in the string.

PHP RegEx get first letter after set of characters

I have some text that starts with a set of letters followed by digits.
I need to get the first digit after the leading letters.
Example text:
ABC105001
ABC205001
ABC305001
ABCD105001
ABCD205001
ABCD305001
My RegEx:
^(\D*)(\d{1})(?=\d*$)
Link: http://www.regexr.com/390gv
As you can see, the RegEx works OK, but it also captures the first group in the results. I need to get only this digit, and when I try to put ?= in the first group like this: ^(?=\D*)(\d{1})(?=\d*$) , the RegEx doesn't work.
Any ideas?
Thanks in advance.
(?=...) is a lookahead that means "followed by" and checks the string to the right of the current position.
(?<=...) is a lookbehind that means "preceded by" and checks the string to the left of the current position.
What is interesting about these two features is that the content matched inside them is not part of the whole match result. The only problem is that a lookbehind can't match variable-length content.
A way to avoid the problem is to use the \K feature, which removes everything to its left from the match result:
^[A-Z]+\K\d(?=\d*$)
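In PHP this could be used with preg_match roughly as follows. The helper name firstDigit() is hypothetical, and returning null for non-matching input is my own choice.

```php
<?php
// firstDigit() is a hypothetical helper name used only for this sketch.
function firstDigit(string $code): ?string
{
    // \K discards the leading letters from the reported match, so $m[0]
    // holds only the single digit; the lookahead checks the rest is digits.
    if (preg_match('/^[A-Z]+\K\d(?=\d*$)/', $code, $m)) {
        return $m[0];
    }
    return null; // input did not look like LETTERS + DIGITS
}

foreach (['ABC105001', 'ABC205001', 'ABCD305001'] as $code) {
    echo $code, ' => ', firstDigit($code), PHP_EOL;
}
```

Because nothing is captured in a group, the digit is the whole match and there is no extra group to ignore.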
You're trying to use a positive lookahead when really you want to use non-capturing groups.
The one match you want will work with this regex:
^(?:\D*)(\d)\d*$
The (?: sequence starts a non-capturing group, which will not appear in the matches.
So, if you used preg_match(';^(?:\D*)(\d)\d*$;', $string, $matches) to find your match, $matches[1] would be the digit you're looking for. (This is because $matches[0] will always be the full match from preg_match.)
Try:
^(?:\D*)(\d{1})(?=\d*$) // (?: is the beginning of a non-capturing group
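To illustrate how a non-capturing group changes what ends up in $matches, here is a short sketch using the last pattern and one of the sample strings from the question.

```php
<?php
$string = 'ABCD205001';

// (?:\D*) consumes the letters without creating a group, so the digit
// becomes capture group 1.
preg_match('/^(?:\D*)(\d{1})(?=\d*$)/', $string, $matches);

echo $matches[0], PHP_EOL; // full match: ABCD2
echo $matches[1], PHP_EOL; // first digit: 2
```

The letters are still part of the full match in $matches[0]; only the numbered groups change, so the digit is read from $matches[1].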

How to match 2nd instance in regex

get_by_my_column
If I only want to match the get_by portion of the above string, how can I do this? I keep reading on this regex cheatsheet that I should use \n but I can't figure out how to implement it properly...
I've tried variations of the following...
/((_){2})/
/(_+){2}/
/(\w+?_\w+?)_\w+/ (use non-greedy quantifiers; your substring will be in capture group 1)
or just /\w+?_\w+?/ <---(edit: this won't work; you do need that second underscore in the pattern to force the non-greedy \w+? to extend up to it :])
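A quick way to check both variants in PHP, using the string from the question:

```php
<?php
$name = 'get_by_my_column';

// With the trailing "_\w+", the lazy \w+? is forced to grow up to the
// second underscore, so group 1 holds the first two segments.
preg_match('/(\w+?_\w+?)_\w+/', $name, $withAnchor);

// Without it, the second lazy \w+? stops after a single character.
preg_match('/\w+?_\w+?/', $name, $withoutAnchor);

echo $withAnchor[1], PHP_EOL;    // get_by
echo $withoutAnchor[0], PHP_EOL; // get_b
```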
Do you need to use a regex for this? You could use explode() and just grab the first two elements of the resulting array.
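The explode() route might look like this; rejoining the two pieces with implode() is my addition, in case you want the prefix back as one string.

```php
<?php
$name = 'get_by_my_column';

// Split on underscores, keep the first two segments, and glue them back.
$parts  = explode('_', $name);
$prefix = implode('_', array_slice($parts, 0, 2));

echo $prefix, PHP_EOL; // get_by
```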
Try
preg_match('/(^[a-z]+[_][a-z]+)/', $string, $results);
This matches a string that starts with a group of letters followed by an underscore followed by another set of letters.
Edit: (lowercase letters)
Try /^get_by/. The ^ adds the condition that g must be the first character of the string.

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to select the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say "start of string, followed by maybe a -, followed by either at least one digit, or zero or more digits, a dot, and at least one digit, followed by the end of string".
So your regex could match, for example, 4.5, -.1, etc. This is exactly what you tell it to do.
Your test input string does not match, since there are other characters present after the number 1.1, and even if it somehow magically matched, your "double" matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
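As a sketch only: if "[2-3 length of alphabet]" is read as two or three ASCII letters, it could be written as [A-Za-z]{2,3}. That reading is my assumption, not something stated in the question, and the trailing \b (instead of $) is there because a title follows the abbreviation.

```php
<?php
// Assumption: "2-3 length of any alphabet char" means [A-Za-z]{2,3}.
$pattern = '/^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[A-Za-z]{2,3}\b/';

$ok      = preg_match($pattern, '1.1 AA My test title', $m);
$tooLong = preg_match($pattern, '1.1 ABCD My test title');

echo $m[0], PHP_EOL;   // 1.1 AA
var_dump($ok === 1);       // bool(true)
var_dump($tooLong === 0);  // bool(true): four letters are rejected
```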
Feel free to ask if you are stuck! :)
I see multiple issues with your regex:
You try to match the whole string (as a number) with the anchors: ^ at the beginning and $ at the end. If you don't want that, remove those.
The number group is non-capturing. It will be checked for matches, but those won't be added to $matches as a separate group. That's because of the ?: you put in (?:...). Remove ?: to make that group capturing.
You place the shorter digit pattern before the longer one. If you swap the order, the regex engine will try the longer pattern first and, on success, prefer it over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
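The effect of the alternation order is easy to see on the example title once the anchors are gone. This sketch uses single-quoted patterns, which behave the same as the double-quoted one above for these escapes.

```php
<?php
$source = '1.1 AA My test title';

// Shorter alternative first: \d+ succeeds on "1" and the engine stops.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $shortFirst);

// Longer alternative first: the decimal form is tried before plain digits.
preg_match('/-?(\d*\.\d+|\d+)/', $source, $longFirst);

echo $shortFirst[1], PHP_EOL; // 1
echo $longFirst[1], PHP_EOL;  // 1.1
```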
