How to use preg_match to extract data? - php

I am pretty new to preg_match. I searched a lot for an answer before posting this question and found many posts about extracting data such as a YouTube ID, but nothing that fits my needs. If this is a silly question, please forgive me.
I need to get the ID from a string with preg_match. The string is in the format
[#1234] Subject
How can I extract only "1234" from the string?

One solution is:
\[#(\d+)\]
This matches the left square bracket and pound sign [#, then captures one or more digits, then the closing right square bracket ].
You would use it like:
preg_match( '/\[#(\d+)\]/', '[#1234] Subject', $matches);
echo $matches[1]; // 1234
You can see it working in this demo.
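If the subject might not contain an ID at all, it helps to check the return value of preg_match before reading $matches. A minimal sketch, assuming a hypothetical helper name extract_ticket_id (not part of the original answer):

```php
<?php
// Hypothetical helper: returns the ID as a string, or null if there is no [#123] tag.
function extract_ticket_id(string $subject): ?string
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error.
    if (preg_match('/\[#(\d+)\]/', $subject, $matches) === 1) {
        return $matches[1]; // contents of the first capturing group
    }
    return null;
}

var_dump(extract_ticket_id('[#1234] Subject')); // string(4) "1234"
var_dump(extract_ticket_id('No ID here'));      // NULL
```

Returning null keeps the "no ID" case distinct from an ID that happens to be "0".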

You can try this:
preg_match('~(?<=\[#)\d+(?=])~', $txt, $match);
(?<=..) is a lookbehind (only a check)
(?=..) is a lookahead
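Because the lookarounds only check the surrounding text, the digits end up in the whole match rather than in a group. A short sketch of how that might be used, with $txt set to the sample string from the question:

```php
<?php
$txt = '[#1234] Subject';

// "[#" and "]" are asserted but not consumed,
// so the whole match ($match[0]) is just the digits.
if (preg_match('~(?<=\[#)\d+(?=])~', $txt, $match) === 1) {
    echo $match[0]; // 1234
}
```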

Your regular expression:
preg_match('/^\[\#([0-9]+)\].+/i', $string, $array);

That's a way you could do it:
<?php
$subject = "[#1234] Subject";
$pattern = '/^\[\#([0-9]+)/';
preg_match($pattern, $subject, $matches);
echo $matches[1]; // 1234
?>

To get only the integer you can use subpatterns http://php.net/manual/en/regexp.reference.subpatterns.php
$string="[#1234] Subject";
$pattern="/\[#(?P<my_id>\d+)](.*?)/s";
preg_match($pattern,$string,$match);
echo $match['my_id'];

Related

PHP exploding url from text, possible?

I need to extract the YouTube URL from this line:
[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]
Is it possible? I need to remove [embed] and [/embed].
preg_match is what you need.
<?php
$str = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
preg_match("/\[embed\](.*)\[\/embed\]/", $str, $matches);
echo $matches[1]; //https://www.youtube.com/watch?v=L3HQMbQAWRc
$string = '[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]';
$string = str_replace(['[embed]', '[/embed]'], '', $string);
See str_replace
why not use str_replace? :) Quick & Easy
http://php.net/manual/de/function.str-replace.php
Just for good measure, you can also use positive lookbehinds and lookaheads in your regular expressions:
(?<=\[embed\])(.*)(?=\[\/embed\])
You'd use it like this:
$string = "[embed]https://www.youtube.com/watch?v=L3HQMbQAWRc[/embed]";
$pattern = '/(?<=\[embed\])(.*)(?=\[\/embed\])/';
preg_match($pattern, $string, $matches);
echo $matches[1];
Here is an explanation of the regex:
(?<=\[embed\]) is a Positive Lookbehind - it asserts that [embed] appears immediately before the current position, without making it part of the match.
(.*) is a Capturing Group - . matches any character (except a newline) with the Quantifier: * which matches between zero and unlimited times, as many times as possible. This is what is matched between the two lookarounds. These are the droids you're looking for.
(?=\[\/embed\]) is a Positive Lookahead - it asserts that [/embed] appears immediately after the current position, without making it part of the match.
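One thing to watch with the greedy (.*): if a line holds more than one [embed] pair, it runs to the last [/embed]. A small sketch of the difference, using a made-up input string and a lazy (.*?) with preg_match_all as one possible alternative:

```php
<?php
// Made-up input with two embed blocks on one line.
$string = '[embed]first[/embed] and [embed]second[/embed]';

// Greedy: .* stretches as far as it can, up to the LAST [/embed].
preg_match('/(?<=\[embed\])(.*)(?=\[\/embed\])/', $string, $greedy);
echo $greedy[0], "\n"; // first[/embed] and [embed]second

// Lazy: .*? stops at the nearest [/embed]; preg_match_all collects every block.
preg_match_all('/(?<=\[embed\])(.*?)(?=\[\/embed\])/', $string, $lazy);
print_r($lazy[0]); // "first" and "second"
```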

Regex take everything after word and before character in PHP

I'm trying to get a regex to take everything after "test" and before "#" in an email, so "test-12345#example.com" would become 12345.
I've got this far to get it to return everything before the "#" symbol. (Working in PHP)
!(\d+)#!
Either you can use capturing groups and use the regex
test-(\d+)#
and use $1 or use lookaheads and behinds like (?<=test-)\d+(?=#) which will just match 12345
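Both variants side by side, as a quick sketch in PHP (the variable names $m and $n are only for illustration):

```php
<?php
$input = 'test-12345#example.com';

// Capturing group: the digits land in $m[1]; $m[0] is the whole match "test-12345#".
preg_match('/test-(\d+)#/', $input, $m);
echo $m[1], "\n"; // 12345

// Lookarounds: the context is only asserted, so $n[0] is the digits alone.
preg_match('/(?<=test-)\d+(?=#)/', $input, $n);
echo $n[0], "\n"; // 12345
```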
(?<=test-)[^#]+
You can try this. No need to use groups. See demo.
https://regex101.com/r/eZ0yP4/28
You want everything between test and # so don't use \d.
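$myRegexPattern = '~test([^#]*)#~i'; // ~ as delimiter because # is literal here; the * belongs inside the group so the whole run is captured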
preg_match ($myRegexPattern, $input, $matches);
$whatYouNeed = $matches[1];
Try this
$input = 'test-12345#example.com';
$regexPattern = '/^test(.*?)\#/';
preg_match ($regexPattern, $input, $matches);
$whatYouNeed = $matches[1];
var_dump($whatYouNeed);

regex function[filename] pattern and function[string_with_escaped_characters] pattern

I'm trying to script and parse a file,
Please help with regex in php to find and replace the following patterns:
From: "This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]"
To: "This is a bar_txt_content within a bar2_txt_content"
Something along those lines:
$subject = "This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]";
$pattern = '/regex-needed/';
preg_match($pattern, $subject, $matches);
foreach($matches as $match) {
$subject = str_replace('foo['.$match[0].']', file_get_contents($match[0]), $subject);
}
And my second request is to have:
From: 'This is a foo2[bar bar \] bar bar].'
To: "this is a returned"
Something along those lines:
$subject = 'This is a foo2[bar bar \] bar bar].';
$pattern = '/regex-needed/';
preg_match($pattern, $subject, $matches);
foreach($matches as $match) {
$subject = str_replace('foo2['.$match[0].']', my_function($match[0]), $subject);
}
Please help in constructing these patterns...
If you always have a structure like foo[ ... ]
then it is very easy:
foo\[([^]]+)\]
That is .NET syntax, but I'm sure the expression is simple enough for you to convert.
Description of the regex:
Match the characters “foo” literally «foo»
Match the character “[” literally «[»
Match the regular expression below and capture its match into backreference number 1 «([^]]+)»
Match any character that is NOT a “]” «[^]]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “]” literally «]»
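In PHP the same expression can be combined with preg_replace_callback so that every foo[...] is replaced in one pass. A minimal sketch, assuming a hypothetical $contents lookup array in place of real file_get_contents calls so it runs without any files:

```php
<?php
$subject = 'This is a foo[/www/bar.txt] within a foo[/etc/bar.txt]';

// Hypothetical stand-in for file_get_contents(): path => replacement text.
$contents = [
    '/www/bar.txt' => 'bar_txt_content',
    '/etc/bar.txt' => 'bar2_txt_content',
];

$result = preg_replace_callback(
    '/foo\[([^]]+)\]/',
    function (array $m) use ($contents): string {
        // $m[1] is the path between the brackets; unknown paths are left untouched.
        return $contents[$m[1]] ?? $m[0];
    },
    $subject
);

echo $result; // This is a bar_txt_content within a bar2_txt_content
```

With real files, the callback body would call file_get_contents($m[1]) instead of the array lookup.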
Luc,
this should help you get started.
http://php.net/manual/en/function.preg-replace.php
You may have to setup a loop and increase the counter, using preg_replace with a limit of 1 to replace only the first instance.
In order to match foo[/www/bar.txt]:
the regex should be something like:
foo\[\/www\/([A-Za-z0-9]*)\.txt\]
The backslashes are there to cancel the special meaning of some characters in your regexp.
It will match foo[/www/.[some file name].txt, and ${1} will contain the filename without the .txt as brackets form groups which can be used in the replaced expression. ${1} will contain what was matched in the first round brackets, ${2} will contain what was matched in the second one, etc ...
Therefore your replaced expression should be something like "${1}_txt_content". Or in the second iteration "${1}2_txt_content".
[A-Za-z0-9]* means any alphanumeric character 0 or more times, you may want to replace the * with a + if you want at least 1 character.
So try:
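$pattern = '/foo\[\/www\/([A-Za-z0-9]*)\.txt\]/';
// Single quotes, so PHP does not try to interpolate ${1} as a variable.
$replace = '${1}_txt_content';
$total_count = 1;
do {
    // Replace one occurrence per pass so each one can get its own counter.
    $subject = preg_replace($pattern, $replace, $subject, 1, $count);
    $replace = '${1}' . ++$total_count . '_txt_content'; // "." concatenates in PHP, not "+"
} while ($count != 0);
echo $subject;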
(warning, this is my first ever PHP program, so it may have mistakes as I cannot test it ! but I hope you get the idea)
Hope that helps !
Tony
PS: I am not a PHP programmer but I know this works in C#, for example, and looking at the PHP documentation it seems that it should work.
PS2: I always keep this website bookmarked for reference when I need it: http://www.regular-expressions.info/
$pattern = '/\[([^\]]+)\]/';
preg_match_all($pattern, $subject, $matches);
print_r($matches['1']);
found the correct regex I needed for escaping:
'/foo\[[^\[]*[^\\\]\]/'
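For the second request (foo2[...] with an escaped \] inside), one possible approach is a pattern that treats a backslash plus the following character as a unit. This is a sketch with a different pattern from the one above, and the callback simply returns the fixed string 'returned' as a stand-in for the asker's my_function:

```php
<?php
$subject = 'This is a foo2[bar bar \] bar bar].';

// Inside the brackets, accept either an escaped character (backslash + any char)
// or any character that is neither "]" nor a backslash.
$pattern = '/foo2\[((?:\\\\.|[^\]\\\\])*)\]/';

$captured = null;
$result = preg_replace_callback(
    $pattern,
    function (array $m) use (&$captured): string {
        $captured = $m[1]; // text between the brackets, escapes still in place
        return 'returned'; // stand-in for my_function($m[1])
    },
    $subject
);

echo $result; // This is a returned.
```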

preg_match_all ignore words

I am trying to create a regex that captures emails ending in .com/.info and containing neither aaa nor bbb.
Is this the correct syntax?
Eg: // search email ending in .com/.info containing no aaa/bbb
preg_match_all('#((?=.*#.*(?:com|info))(!.*(?:aaa|bbb)).*)#ui', $html, $emails);
To get this:
caaac#ccc.com = no
ccc#ccbbb.com = no
cccc#cccc.com = good (address syntax correct + term absent before or after the #)
Thank you for your reply.
This syntax works fine SEE HERE (thank you to STEMA) except for a string that includes spaces.
e.g:
$string = "email1#address.com blah email2#aaaaess.com blah email3#address.info embbbil4#adress.com";
preg_match_all("#^(?!.*aaa)(?!.*bbb).*#.*\.(?:com|info)$#im", $string, $matches);
Cordially
Simply use a positive expression and check that it did not match anything.
if (preg_match(...) == 0)
Also, there is no need to use preg_match_all if you are just interested whether a pattern matched or not.
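A sketch of that idea applied to a string with several addresses: split on whitespace, then keep a token only if it looks like an address and the "unwanted" pattern does not match. The address check is a simplified assumption, and it uses # as the separator because that is how the addresses are written in this thread:

```php
<?php
$string = 'email1#address.com blah email2#aaaaess.com blah email3#address.info embbbil4#adress.com';

$good = [];
foreach (preg_split('/\s+/', $string, -1, PREG_SPLIT_NO_EMPTY) as $token) {
    // Simplified shape check: something#something ending in .com or .info.
    $looksLikeAddress = preg_match('~^[^#\s]+#[^#\s]+\.(?:com|info)$~i', $token) === 1;
    // Positive pattern for the unwanted terms; keep the token only if it does NOT match.
    $hasBannedTerm = preg_match('~aaa|bbb~i', $token) === 1;
    if ($looksLikeAddress && !$hasBannedTerm) {
        $good[] = $token;
    }
}

print_r($good); // email1#address.com and email3#address.info
```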
If I understand your requirements correctly, this is the regex you can use together with @Tomalak's answer.
preg_match('#.*#.*(?:aaa|bbb)|\.(?:com|info)$#ui', $html, $emails);
This pattern matches the stuff you don't want.
.*#.*(?:aaa|bbb) matches aaa or bbb after the #
the \.(?:com|info)$ is the other part, this matches if your email address ends with .com or .info
You can see it online here on Regexr
Update:
.*(?:aaa|bbb).*\.(?:com|info)$
This will match aaa or bbb and the string has to end with .com or .info
See it online here on Regexr
Here's the solution:
#(?<=^|\s)(?![\w#]*(?:aaa|bbb|(?:[0-9].*){3,}))[a-z0-9-_.]*#[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im
Function:
function get_emails($str){
preg_match_all('#(?<=^|\s)(?![\w#]*(?:aaa|bbb|(?:[0-9].*){3,}))[a-z0-9-_.]*#[a-z0-9-_.]*\.(?:com|net|org|info|biz)(?=\s|$)#im', $str, $output);
if(is_array($output[0]) && count($output[0])>0) {
return array_unique($output[0]);
}
}
Cordially

split email from string with PHP

I need to be able to split a string that contains email's From information. From the string I need to extract $NAME and $EMAIL or whatever is available.
The string can be in the following formats:
"Santa Clause" <santa#example.com>
Santa Clause <santa#example.com>
<santa#example.com>
preg_match('#(?:"(?<name>[^"]+)"|(?<name>.+))?<(?<email>.+)>#U', $string, $matches);
echo var_dump($matches);
preg_match('#(?:"(?<name>[^"]+)"|(?<name>.+))?<(?<email>[^>]+)>#U', $string, $matches);
echo var_dump($matches);
Try one of the above. The former will allow more valid emails, whereas the latter is faster.
$string_to_check = '"Santa Clause" <santa#npole.com>';
$matches = array();
preg_match('/"?([^<"]*)"?\s*<(\S*)>/', $string_to_check, $matches);
echo $matches[1]; // Santa Clause
echo $matches[2]; // santa#npole.com
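To cover all three formats from the question with one call, the same idea could be wrapped in a function. A minimal sketch, assuming a hypothetical helper name parse_from_header and a slightly stricter, anchored pattern than the one above:

```php
<?php
// Hypothetical helper: returns ['name' => ?string, 'email' => string],
// or null if the input has no <...> part.
function parse_from_header(string $from): ?array
{
    // Optional quoted or bare name, then the address in angle brackets.
    if (preg_match('/^\s*"?([^<"]*)"?\s*<([^<>\s]+)>\s*$/', $from, $m) !== 1) {
        return null;
    }
    $name = trim($m[1]); // a bare name keeps a trailing space, so trim it
    return ['name' => $name === '' ? null : $name, 'email' => $m[2]];
}

var_dump(parse_from_header('"Santa Clause" <santa#example.com>'));
var_dump(parse_from_header('Santa Clause <santa#example.com>'));
var_dump(parse_from_header('<santa#example.com>'));
```

The name comes back as null when only the address is present, so callers can tell "no name" apart from an empty string.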
If the separator is always the same character (e.g. the semicolon):
$items = explode($separator, $from);
Otherwise, browse around in the preg_XXX functions for regex-based string splitting.
For the mail address, have a look at http://php.net/manual/en/function.preg-match.php. This is a function that matches a string against a regular expression. Here's a short intro into how to use regular expressions with PHP.
If you want to match the name also, it will be some effort, so I suggest you first develop a regular expression that can extract an email address out of your string and then augment it to find the name also.
Found this and it works great!
$parts = preg_split('/[\'"<>]( *[\'"<>])*/', $text, -1, PREG_SPLIT_NO_EMPTY);
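A quick sketch of what that split returns for two of the formats from the question. Note that the number of parts varies, so the caller still has to decide which part is the name and which is the address:

```php
<?php
$pattern = '/[\'"<>]( *[\'"<>])*/';

// Quoted name plus address: two parts.
$both = preg_split($pattern, '"Santa Clause" <santa#example.com>', -1, PREG_SPLIT_NO_EMPTY);
print_r($both); // "Santa Clause" and "santa#example.com"

// Address only: a single part.
$only = preg_split($pattern, '<santa#example.com>', -1, PREG_SPLIT_NO_EMPTY);
print_r($only); // "santa#example.com"
```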
