Regular expression help needed - php

Although I can find a lot of tutorials on regular expressions, they remain beyond my grasp. The regular expression that I want to create is simple (judging by what I see in some of the examples), but I simply cannot figure it out.
I want to do a simple replacement as follows:
I have image metadata saved in a MySQL table, with fields: id, name, title and alt.
In my content, I want to write [[IMAGE:1:right]]content here[[image:2:left]].
I want to get the matches of the ID (the digit) and the float (left or right) and replace the entire string with the image floated left or right, retrieved by the ID from the database table.
Here is my attempt:
preg_match("/^\[\[image:(\d+):(left|right)\]\]+/i", "[[IMAGE:1:right]]content here[[image:2:left]]", $matches);
This gives me the return of:
Array ( [0] => [[IMAGE:1:right]] [1] => 1 [2] => right )
So, it finds one, but I want it to find ALL of them, as I may have more than one image in a post. As far as I can tell, the + there should match all entries, and the i should make the match case-insensitive. The case-insensitive part appears to work, but I get only one result.
Could someone please let me know what I am doing wrong?

That's not quite how it works. That + only applies to the token immediately before it - the ]. You want to make the match global in Perl vernacular, which for PHP (which I think you're using?) means calling the function preg_match_all(). You'll also have to remove the ^, as only one of the images occurs at the beginning of the string.
Also, [ and ] are special characters in regex, so keep escaping them as \[\[ and \]\] wherever you want a literal bracket, as your pattern already does.
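A minimal sketch of both fixes together, assuming the image rows have already been fetched into an array keyed by ID (the $images array and its values are hypothetical stand-ins for the MySQL table). preg_match_all() collects every tag, and preg_replace_callback() is one possible way to do the replacement itself:

```php
<?php
$content = '[[IMAGE:1:right]]content here[[image:2:left]]';

// No ^ anchor and no trailing +, so the pattern can match anywhere in the text.
$pattern = '/\[\[image:(\d+):(left|right)\]\]/i';

// PREG_SET_ORDER groups the results per match: [full tag, id, float].
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Hypothetical rows standing in for the database table (id => name, alt).
$images = [
    1 => ['name' => 'sunset.jpg', 'alt' => 'Sunset'],
    2 => ['name' => 'tree.jpg', 'alt' => 'Tree'],
];

$html = preg_replace_callback($pattern, function (array $m) use ($images) {
    $id = (int) $m[1];
    if (!isset($images[$id])) {
        return $m[0]; // unknown ID: leave the tag untouched
    }
    return sprintf(
        '<img src="%s" alt="%s" style="float:%s">',
        htmlspecialchars($images[$id]['name']),
        htmlspecialchars($images[$id]['alt']),
        strtolower($m[2])
    );
}, $content);

echo $html, "\n";
```

Each entry of $matches holds the full tag, the ID and the float, so the IDs could also be collected first and fetched in a single query before replacing.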

Related

How would I replace a word in a string when I know its start and end, but not the entire word? E.g. converting an ID# to a name

I am creating a web interface for a Discord bot I have created. I currently store all user accounts, messages, etc in a SQL database so that the web interface can have extensive logs for the mods to use. I am currently trying to come up with a solution for when viewing messages to convert "Discord Mentions" to readable names.
For example, when someone tags/mentions another user in a message, instead of storing '#name' the SQL table stores '<#!12345678>'. Because that text starts with <#!, I know it refers to a user, so I can look up their plain-text name in the SQL table containing all the users, but I'm not sure how to:
A) Specifically grab any words that both start with <#! and end with >, so I can use the ID in a query, and
B) Replace the above <#!12345etc>, which is easy enough to do once I know how to do A.
Just for clarification, I'm not looking for help with the SQL query, just with getting the entire word that starts with <#! and ends with > out of a string/paragraph.
I'm terrible with regex so hopefully there is a solution that can work without needing it haha. Any tips you could provide would be greatly appreciated.
TLDR:
Sample string:
"Hey <#!123456789> thanks for that, I'll get back to you sooon."
How do I grab the entire word that starts with <#! and ends with >, so I can run a SQL query with it and then do a replace() later?
I thought about exploding the string with a space and then going through each word one at a time checking each word with startswith and endswith but if the message author didn't leave a space between mentions and the rest of the text that wouldn't work.

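If I'm understanding this correctly, you want all the values between "<#!" and ">". That being said, I believe all you need is /<#!(.+?)>/g. The lazy .+? stops at the first >, so two mentions in one message are matched separately. PHP has no g flag; use preg_match_all() there for the same effect.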

You can do it this way:
<?php
$str = "Hey <#!123456789> thanks for that, I'll get back to you sooon.";
$re = '/(?<=<#!).+?(?=>)/m';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// flatten the array result (otherwise it's an array of arrays)
$matches = array_merge(...$matches);
// Print the entire match result
print_r($matches); //Array ( [0] => 123456789 )
Demo https://3v4l.org/D64Ma
Regex explanation:
(?<=<#!) looks behind for <#!. The match starts right after it.
.+? matches one or more characters, as few as possible, until the lookahead that follows succeeds.
(?=>) ends the match when > is found (the > itself is not included in the match).
The difference between using lookarounds and a regular /<#!(.+?)>/ is the matches array that each produces.
Text matched by a lookaround is not part of the match, so the result is an array of arrays containing only the ID ("123456789").
Not wrapping the start <#! and the end > in lookarounds results in an array of arrays containing both the full pattern match ("<#!123456789>") and the capturing group ("123456789"), so you would have to extract the capturing groups from the resulting arrays.
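To make the difference concrete, here is a small sketch that runs both styles on a string with two mentions and then swaps each mention for a name. It assumes the IDs are digits only (hence \d+), and the $names map is a hypothetical stand-in for the users table:

```php
<?php
$str = "Hey <#!123456789> thanks, and <#!42> too.";

// Lookarounds: the whole match is already just the ID.
preg_match_all('/(?<=<#!)\d+(?=>)/', $str, $look);

// Capturing group: index 0 holds the full mentions, index 1 the IDs.
preg_match_all('/<#!(\d+)>/', $str, $cap);

// Hypothetical ID => name map standing in for the users table.
$names = ['123456789' => 'Alice', '42' => 'Bob'];

$readable = preg_replace_callback('/<#!(\d+)>/', function (array $m) use ($names) {
    // Fall back to the raw mention when the ID is unknown.
    return isset($names[$m[1]]) ? '@' . $names[$m[1]] : $m[0];
}, $str);

print_r($look[0]);
print_r($cap);
echo $readable, "\n";
```

For the replacement step the capturing-group form is probably the more convenient one, because the callback receives both the full mention and the ID.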

Regular Expressions (without "?") to match line that doesn't contain specific string (error #1139)

I need: a regexp for MySQL (php, PDO) usage. Regexp should find all numbers between brackets [ ], except number 150.
So I would like to get:
[3]
[25464]
[510]
But I would like to exclude:
[150]
What I got:
\[{1}((?!150)[0-9])+\]{1}
and it works fine for newest version of MySQL, but I need something that would work also on an older version (probably 5.1).
Problem: Currently I get an error:
1139 - Got error 'repetition-operator operand invalid' from regexp
I know I can't use ?. How can I replace it?
Additional info (edit):
I'm redesigning the database and that's why I need to write this
Second edit - why I need this:
I need to retrieve all rows whose "content" column contains only the one specified tag, [150]. A "content" value can contain no [nr] at all, one specific [nr], or many different [nrs].
WHERE content REGEXP '\[{1}((?!150)[0-9])+\]{1}' = 0 AND content LIKE '%[150]%'

Try this one, Hope this works.
$string='[3]
[25464]
[510]
[150]
[100]';
preg_match_all('/\[(?!150\])\d+\]/', $string, $matches);
print_r($matches);
This will match every bracketed number except [150]. Including \] in the lookahead keeps numbers such as [1500] from being excluded as well.
$matches[0] will contain the desired result.
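A lookahead written as (?!150) only inspects the next three characters, so it also rejects longer numbers that merely start with 150. A small sketch, using made-up sample data, that compares it with a lookahead that includes the closing bracket:

```php
<?php
$string = "[3] [25464] [510] [150] [1500]";

// Rejects anything whose digits start with 150, including [1500].
preg_match_all('/\[(?!150)\d+\]/', $string, $loose);

// Rejects only the exact tag [150].
preg_match_all('/\[(?!150\])\d+\]/', $string, $exact);

print_r($loose[0]);
print_r($exact[0]);
```

Which behaviour is wanted depends on the data; if only the exact number 150 should be skipped, the second form seems the safer choice.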

WHERE x REGEXP '\\[[[:digit:]]+\\]'
AND NOT x REGEXP '\\[150\\]'
(The backslashes are doubled because MySQL's string parser consumes one of them before the regex engine sees the pattern.)
However, that will reject aaa[123]bbb[150]ccc. Should it be rejected? (Please give some sample data that should be matched / rejected. Your goal is not crystal clear.)
Since LIKE is faster than REGEXP, this will be a little faster:
WHERE x REGEXP '\\[[[:digit:]]+\\]'
AND x NOT LIKE '%[150]%'
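If the engine cannot use lookaheads at all, the "any number except 150" test can also be spelled out as plain alternation. The sketch below is an illustration under assumptions rather than a tested MySQL query: it checks the pattern with PHP's preg_match(), the helper name hasOnly150 is made up, and the same expression would still need its backslashes doubled inside a MySQL string literal.

```php
<?php
// A bracketed number that is NOT 150, written without lookarounds:
//   1-2 digits | 4+ digits | 3 digits that differ from 150 in some position
$not150 = '\[([0-9]{1,2}|[0-9]{4,}|[02-9][0-9]{2}|1[0-46-9][0-9]|15[1-9])\]';

// Hypothetical helper: true when $content has [150] and no other [number] tag.
function hasOnly150(string $content, string $not150): bool
{
    return strpos($content, '[150]') !== false
        && preg_match('/' . $not150 . '/', $content) === 0;
}

var_dump(hasOnly150('aaa[150]bbb', $not150));       // bool(true)
var_dump(hasOnly150('aaa[123]bbb[150]', $not150));  // bool(false)
var_dump(hasOnly150('aaa[1500]bbb[150]', $not150)); // bool(false)
var_dump(hasOnly150('no tags here', $not150));      // bool(false)
```

This mirrors the asker's own query shape (a "no other number" regex test combined with LIKE '%[150]%'), only with the lookahead replaced by explicit alternatives.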

I have list of webpage URLs, I just need to strip everything except specific value and ID from it using regex

Suppose I have a list of URLs that follow the structure below. I need to strip each one down so all that's left is the abcustomerid=12345. How can I do this using regex with Notepad++?
Here's an example of the different variety in each line. I just need to remove everything from each line, but leave the abcustomerid=12345 or whatever value that follows abcustomerid.
/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525
Each line could have anything different around the abcustomerid, but I just need to remove everything and keep the abcustomerid and the value.

This regex should do it.
(?:&|\?)abcustomerid=(\d+)
Usage:
<?php
$string= '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi
/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54
/blah/blah/tendid.php?abcustomerid=12525';
preg_match_all('~(?:&|\?)abcustomerid=(\d+)~', $string, $output);
print_r($output[1]);
The ?: tells the regex not to capture that group; we don't want that data because it is irrelevant. The () capture the data we are interested in. The \d+ is one or more digits (the + is the "one or more" part). If the value can be anything, change that to .+?, which will match anything, but then you will need an anchor for where it should stop. I'd use (?:&|$), which tells it to match up to the next & or the end of the string. If the input is multiline, you'll also need the m modifier. http://php.net/manual/en/reference.pcre.pattern.modifiers.php
Output:
Array
(
[0] => 53122
[1] => 241
[2] => 12525
)
Demo:
http://sandbox.onlinephpfunctions.com/code/37a4ddea8c50f98a41ac7d45fec98f5f1f58761f
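
For values that are not purely numeric, the .+? variant mentioned in this answer could be sketched as follows. The sample URLs (including the non-numeric ID ab-12) are made up for illustration; the pattern uses .+? with the (?:&|$) stop and the m modifier so that $ also matches at the end of each line.

```php
<?php
$string = implode("\n", [
    '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi',
    '/some/other/struct/pagehere.php?today=Thursday&abcustomerid=ab-12&count=54',
    '/blah/blah/tendid.php?abcustomerid=12525',
]);

// .+? is lazy, so it stops at the first & or at the end of the line.
preg_match_all('~(?:&|\?)abcustomerid=(.+?)(?:&|$)~m', $string, $output);

print_r($output[1]);
```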

Here is the RegEx which takes the abcustomerid with its value.
[?&](abcustomerid=\d+)
However, how are you going to 'remove everything' using Notepad++?
You can use this service to do it (there is a demo at the end of the answer).
Copy your regex and all your data into the Test string form. After it successfully matches everything, look at the Match information window at the middle right of the page. Click the Export matches... button and choose plain text.
You will get something like this:
abcustomerid=53122
abcustomerid=241
abcustomerid=12525
Here is the working Demo.
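
If the goal is simply to end up with those lines, a regex replace can also do the stripping directly. A minimal PHP sketch of that idea (not the Notepad++ workflow itself), assuming one URL per line and numeric IDs:

```php
<?php
$string = implode("\n", [
    '/the/stucture/blah.php?timeout=300&abcustomerid=53122&customer=zxyi',
    '/some/other/struct/pagehere.php?today=Thursday&abcustomerid=241&count=54',
    '/blah/blah/tendid.php?abcustomerid=12525',
]);

// Replace each whole line with just the captured "abcustomerid=value" part.
// The m modifier makes ^ and $ work per line; lines without a match stay as-is.
$stripped = preg_replace('~^.*[?&](abcustomerid=\d+).*$~m', '$1', $string);

echo $stripped, "\n";
```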

regex : match two different parts of same string

I've got the following string:
{!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]
I would like to match/extract track_created_f and NOW/DAY-3MONTHS/DAY TO NOW/DAY. The {!ex=track_created_f} might or might not be present at all times, so the regex should not rely on this part.
However, it is the second track_created_f (and not the track_created_f which is a part of !ex=track_created_f) which I need to match.
What I've got so far is the following (see this link for live preview):
[^.*(\w+)\:\[(.*)?\]$]
However, this just gives me :
Array
(
[0] => {!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]
[1] => f
[2] => NOW/DAY-3MONTHS/DAY TO NOW/DAY
)
What I can't get a real grip on is how to use regex to match only the part(s) of the string I'm interested in, and return only those parts. As it is now, I get back (0) the entire string, (1) the not-so-good match of track_created_f, and (2) the match of NOW/DAY-3MONTHS/DAY TO NOW/DAY.
I've been trying to figure this one out by reading the docs, but I'm uncertain as to whether I'm getting things right - particularly the optional '?' clauses I've put in. Is that the right way to match subsets of strings at all?

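[^.*(\w+)\:\[(.*)?\]$] does not do what you want. PHP accepts [ and ] as pattern delimiters, so the expression that actually runs is ^.*(\w+)\:\[(.*)?\]$, and the greedy .* at the start swallows everything except the last word character, which is why group 1 is only f. (In a flavor without bracket delimiters, the outer [ ] would instead turn the whole pattern into a character class.)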
The following regex is enough
/(\w+):\[([^\]]+)/
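
A quick sketch of how that pattern behaves with and without the {!ex=...} prefix. The helper name extractFieldAndRange is made up for illustration:

```php
<?php
// Hypothetical helper wrapping the pattern from the answer.
function extractFieldAndRange(string $input): ?array
{
    // (\w+) must sit directly before ":[", so the name inside {!ex=...} is skipped.
    if (preg_match('/(\w+):\[([^\]]+)/', $input, $m) !== 1) {
        return null;
    }
    return ['field' => $m[1], 'range' => $m[2]];
}

print_r(extractFieldAndRange('{!ex=track_created_f}track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]'));
print_r(extractFieldAndRange('track_created_f:[NOW/DAY-3MONTHS/DAY TO NOW/DAY]'));
```

Index 0 of the match array always holds the full match; the parts of interest are read from the capturing groups at indexes 1 and 2.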

^(?:{\!ex=\w+}|)(.*):\[(.*)?\]$
That will make the {!ex=track_created_f} part optional.
See: http://www.phpliveregex.com/p/1gc

RegEx with character set inside positive lookbehind, Is it possible?

I need to match "name" only after "listing", but of course those words could be any url directory or page.
mydomain.com/listing/name
so the only thing I can "REGuest" (request) is to be some parent directory there.
In other words, I want to match the "position" i.e. whatever comes 2nd after the domain.
I'm trying something like
(?<=mydomain\.com/[^/\?&]+/)[^/\?&]+(?:/)?
But the character set won't work inside the positive lookbehind, at least it's setup to match only ONE character. As soon as I try to match other than one (e.g. modify it with +, ? or *) it just stops working.
I'm obviously missing the positive lookbehind syntax and it seems not intended for what I'm trying.
How can I match that 2nd level filename?
Thanks.

Regular-expressions.info states that
The bad news is that most regex flavors do not allow you to use just
any regex inside a lookbehind, because they cannot apply a regular
expression backwards. Therefore, the regular expression engine needs
to be able to figure out how many steps to step back before checking
the lookbehind...
(Read further, they even mention Perl, Python and Java.)
I think the quantifier might be the problem. I found this on Stack Overflow and only skimmed it.
Wouldn't it be possible to just match the whole path, and use a group for the second level filename:
mydomain\.com\/[^\/\?&]+\/([^\/\?&]+)(?:\/)?
(note: I had to escape the / for my tests...)
The result of this would be something like:
Array
(
[0] => mydomain.com/listing/name
[1] => name
)
Now, because I don't know the context of your problem, I just assumed you would be able to postprocess the results and get the group 1 (index 1) from the result. If not, I unfortunately don't know...
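
Both routes can be sketched in PHP, assuming PHP's PCRE functions are what is being used. The first uses the capturing group from above; the second uses \K, which PCRE supports as a way to drop everything matched so far from the reported match, so it behaves much like a variable-length lookbehind. The URL is the example from the question.

```php
<?php
$url = 'mydomain.com/listing/name';

// Capturing group: the second path segment ends up in $m[1].
preg_match('~mydomain\.com/[^/?&]+/([^/?&]+)~', $url, $m);

// \K resets the start of the reported match, so $k[0] is the segment itself.
preg_match('~mydomain\.com/[^/?&]+/\K[^/?&]+~', $url, $k);

echo $m[1], "\n"; // name
echo $k[0], "\n"; // name
```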
