Regular Expression (preg_match) - php

This is the code that isn't working:
<?php
$matchWith = " http://videosite.com/ID123 ";
preg_match_all('/\S\/videosite\.com\/(\w+)\S/i', $matchWith, $matches);
foreach($matches[1] as $value)
{
print 'Hyperlink';
}
?>
What I want is for it not to display the link if the URL has whitespace before or after it.
So with this input it should display nothing, but it still displays the link.

Your pattern can still match here, capturing ID12, because the 3 is not a space and the first / of http:// is not a space either. You can try:
preg_match_all('/^\S*\/videosite\.com\/(\w+)\S*$/i', $matchWith, $matches);
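A small check makes the difference visible. This is only a sketch built on the sample string from the question; the variable names are made up for illustration.

```php
<?php
$subject = ' http://videosite.com/ID123 ';

// Unanchored: \S only needs some non-space character on each side,
// and the engine finds both inside the URL itself ("/" and "3").
preg_match('/\S\/videosite\.com\/(\w+)\S/i', $subject, $loose);
echo $loose[1], "\n"; // ID12

// Anchored: the whole string must consist of non-space characters.
$anchored = '/^\S*\/videosite\.com\/(\w+)\S*$/i';
var_dump(preg_match($anchored, $subject)); // int(0)

// Once the surrounding spaces are gone, the full ID is captured.
preg_match($anchored, trim($subject), $strict);
echo $strict[1], "\n"; // ID123
```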

So you don't want it to display anything if there is whitespace. Something like this should work (untested):
preg_match_all('/^\S+?videosite\.com\/(\w+)\S*$/i', $matchWith, $matches);

You can try this. It works:
if (preg_match('%^\S*?/videosite\.com/(\w+)(?!\S+)$%i', $subject, $regs)) {
#$result = $regs[0];
}
But I am positive that after I post this, you will update your question :)
Explanation:
"
^ # Assert position at the beginning of the string
\S # Match a single character that is a “non-whitespace character”
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\/ # Match the character “/” literally
videosite # Match the characters “videosite” literally
\. # Match the character “.” literally
com # Match the characters “com” literally
\/ # Match the character “/” literally
( # Match the regular expression below and capture its match into backreference number 1
\w # Match a single character that is a “word character” (letters, digits, etc.)
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
\S # Match a single character that is a “non-whitespace character”
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\$ # Assert position at the end of the string (or before the line break at the end of the string, if any)
"

It would probably be simpler to use this regex:
'/^http:\/\/videosite\.com\/(\w+)$/i'
I believe you are referring to the white space before http, and the white space after the directory. So, you should use the ^ character to indicate that the string must start with http, and use the $ character at the end to indicate that the string must end with a word character.
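As a rough sketch, the anchored pattern can be wrapped in a small helper. The function name `extractVideoId` is hypothetical, and the `D` modifier is an addition that is not part of the answer above: without it, `$` also matches just before a trailing newline, which is itself whitespace. The nullable return type assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the video ID, or null when the string is
// not exactly a videosite.com URL (no surrounding whitespace allowed).
function extractVideoId(string $url): ?string
{
    // D (PCRE_DOLLAR_ENDONLY): $ matches only at the very end of the string.
    if (preg_match('/^http:\/\/videosite\.com\/(\w+)$/iD', $url, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(extractVideoId('http://videosite.com/ID123'));   // string(5) "ID123"
var_dump(extractVideoId(' http://videosite.com/ID123 ')); // NULL
var_dump(extractVideoId("http://videosite.com/ID123\n")); // NULL
```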

Related

Replace text in a url using PHP

So basically I got links like these
https://dog.example.com/randomgenerated45443444444444
https://turtle.example.com/randomgenerated45443
https://mice.example.com/randomgenerated452
https://monkey.example.com/randomgenerated43232323
https://leopard.example.com/randomgenerated22222222222222222
I was wondering whether it is possible to detect the word between https:// and .example.com/ (the random animal name) and replace it with "thumbnail". Both the animal names and the randomgenerated parts vary in length.
You can use a positive lookahead to get to the data you want:
$string = 'https://leopard.example.com/randomgenerated22222222222222222';
$pattern = '/(?=.*\/\/)(.*?)(?=\.)/';
$replacement = 'thumbnail';
$foo = preg_replace($pattern, $replacement, $string);
$protocol = 'https://';
echo $protocol . $foo;
returns
https://thumbnail.example.com/randomgenerated22222222222222222
Explanation of the regex:
Positive Lookahead (?=.*\/\/)
Assert that the Regex below matches
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\/ matches the character / literally (case sensitive)
\/ matches the character / literally (case sensitive)
1st Capturing Group (.*?)
.*? matches any character (except for line terminators)
*? Quantifier — Matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
Positive Lookahead (?=\.)
Assert that the Regex below matches
\. matches the character . literally (case sensitive)
Assuming that https:// and example.com never change, then this is the simplest regex you can use for the purpose:
https://(.+)\.example\.com
Anything in the (.+) will be the words you are attempting to extract.
Edit on 2016.10.27:
While the / character has no special meaning in Regular Expressions, it will likely need to be escaped (\/) if you are also using it as your expression delimiter. So the above will look like:
https:\/\/(.+)\.example\.com
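Choosing a delimiter other than / avoids the escaping altogether. A minimal sketch of the replacement with ~ as the delimiter, applied to the sample links from the question (the helper name `toThumbnailUrl` is made up for illustration):

```php
<?php
// Hypothetical helper: swaps the subdomain in front of .example.com
// for "thumbnail". With ~ as delimiter the slashes need no escaping.
function toThumbnailUrl(string $url): string
{
    return preg_replace(
        '~https://(.+)\.example\.com~',
        'https://thumbnail.example.com',
        $url
    );
}

$urls = [
    'https://dog.example.com/randomgenerated45443444444444',
    'https://leopard.example.com/randomgenerated22222222222222222',
];

foreach ($urls as $url) {
    echo toThumbnailUrl($url), "\n";
}
// https://thumbnail.example.com/randomgenerated45443444444444
// https://thumbnail.example.com/randomgenerated22222222222222222
```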

PHP regex: each word must end with dot

Can someone help me write a pattern for the preg_match function?
Every word in the string must end with a dot
The first character of the string must be [a-zA-Z]
After each dot there can be a space
There can't be two spaces next to each other
The last character must be a dot (logically, after a word)
Examples:
"Ing" -> false
"Ing." -> true
".Ing." -> false
"Xx Yy." -> false
"XX. YY." -> true
"XX.YY." -> true
Can you please help me test the string? My pattern is
/^(([a-zA-Z]+)(?! ) \.)+\.$/
I know it's wrong, but I can't figure it out. Thanks
Check how this fits your needs.
/^(?:[A-Z]+\. ?)+$/i
^ matches start
(?: opens a non-capture group for repetition
[A-Z]+ with i flag matches one or more alphas (lower & upper)
\. ? matches a literal dot followed by an optional space
)+ all this once or more until $ end
Here's a demo at regex101
If you want to disallow space at the end, add negative lookbehind: /^(?:[A-Z]+\. ?)+$(?<! )/i
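To check the pattern against the examples from the question, something along these lines could be used. It is only a sketch; the array and variable names are arbitrary.

```php
<?php
$pattern = '/^(?:[A-Z]+\. ?)+$/i';
$examples = ['Ing', 'Ing.', '.Ing.', 'Xx Yy.', 'XX. YY.', 'XX.YY.'];

// Test each example string on its own and remember the outcome.
$results = [];
foreach ($examples as $example) {
    $results[$example] = preg_match($pattern, $example) === 1;
    echo '"', $example, '" -> ', $results[$example] ? 'true' : 'false', "\n";
}

// The lookbehind variant additionally rejects a trailing space.
$strict = '/^(?:[A-Z]+\. ?)+$(?<! )/i';
var_dump(preg_match($pattern, 'Ing. ')); // int(1)
var_dump(preg_match($strict, 'Ing. '));  // int(0)
```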
Try this:
$string = "Ing
Ing.
.Ing.
Xx Yy.
XX. YY.
XX.YY.";
// With the m flag, ^ and $ anchor every line, so each example line is tested on its own
if (preg_match_all('/^([A-Za-z]{1,}\.[ ]{0,1})+$/m', $string, $matches)) {
// Successful match: $matches[0] lists the lines that passed
} else {
// Match attempt failed
}
Result:
With \n line endings, the lines Ing., XX. YY. and XX.YY. match; Ing, .Ing. and Xx Yy. do not.
The Regex in detail:
^ Assert position at the beginning of a line (at beginning of the string or after a line break character)
( Match the regular expression below and capture its match into backreference number 1
[A-Za-z] Match a single character present in the list below
A character in the range between “A” and “Z”
A character in the range between “a” and “z”
{1,} Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\. Match the character “.” literally
[ ] Match the character “ ”
{0,1} Between zero and one times, as many times as possible, giving back as needed (greedy)
)+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
$ Assert position at the end of a line (at the end of the string or before a line break character)

preg_match lookbehind after second slash

This is my string:
stringa/stringb/123456789,abc,cde
and after preg_match:
preg_match('/(?<=\/).*?(?=,)/',$array,$matches);
output is:
stringb/123456789
How can I change my preg_match to extract the string after second slash (or after last slash)?
Desired output:
123456789
You can match anything other than a / as
/(?<=\/)[^\/,]*(?=,)/
[^\/,]* Negated character class matches anything other than , or /
Regex Demo
Example
preg_match('/(?<=\/)[^\/,]*(?=,)/',$array,$matches);
// $matches[0]
// => 123456789
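The reason the original pattern returns too much is that a lazy .*? still starts at the leftmost position where the lookbehind succeeds, which is right after the first slash. A short side-by-side sketch on the string from the question ($input is just an illustrative variable name):

```php
<?php
$input = 'stringa/stringb/123456789,abc,cde';

// Lazy dot: starts after the FIRST slash and expands until a comma follows.
preg_match('/(?<=\/).*?(?=,)/', $input, $lazy);
echo $lazy[0], "\n";    // stringb/123456789

// Negated class: cannot cross a slash, so only the last segment
// can be followed by the comma.
preg_match('/(?<=\/)[^\/,]*(?=,)/', $input, $negated);
echo $negated[0], "\n"; // 123456789
```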
This should do it.
<?php
$array = 'stringa/stringb/123456789,abc,cde';
preg_match('~.*/(.*?),~',$array,$matches);
echo $matches[1];
?>
Disregard everything up to the last forward slash (.*/). Once the last forward slash is found, keep all the data until the first comma ((.*?),).
You don't need to use lookbehind, i.e.:
$string = "stringa/stringb/123456789,abc,cde";
$string = preg_replace('%.*/(.*?),.*%', '$1', $string );
echo $string;
//123456789
Demo:
http://ideone.com/IxdNbZ
Regex Explanation:
.*/(.*?),.*
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Match the character “/” literally «/»
Match the regex below and capture its match into backreference number 1 «(.*?)»
Match any single character that is NOT a line break character «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “,” literally «,»
Match any single character that is NOT a line break character «.*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
$1
Insert the text that was last matched by capturing group number 1 «$1»

regular expression for links in string to exclude dot only at end of line

I created a regular expression that reads a string and transforms the URLs it finds into HTML links. I wanted to exclude the dot at the end of a line (containing the text link), but it also excludes the dot inside the text link (like in http://www.website.com/page.html.). The end dot here should be excluded, but not the one in .html. This is my regex:
$text = preg_replace("#(^|[\n \"\'\(<;:,\*])((www|ftp)\.+[a-zA-Z0-9\-_]+\.[^ \"\'\t\n\r< \[\]\),>;:.\*]*)#", "\\1\\2", $text);
How would one do that?
Thanx! Tom
Change your RegEx to this
\b((?#protocol)https?|ftp)://((?#domain)[-A-Z0-9.]+)((?#file)/[-A-Z0-9+&@#/%=~_|!:,.;]*)?((?#parameters)\?[A-Z0-9+&@#/%=~_|!:,.;]*)?
or this
\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]*)\b
Explanation
"
\b # Assert position at a word boundary
( # Match the regular expression below and capture its match into backreference number 1
# Match either the regular expression below (attempting the next alternative only if this one fails)
http # Match the characters “http” literally
s # Match the character “s” literally
? # Between zero and one times, as many times as possible, giving back as needed (greedy)
| # Or match regular expression number 2 below (attempting the next alternative only if this one fails)
ftp # Match the characters “ftp” literally
| # Or match regular expression number 3 below (the entire group fails if this one fails to match)
file # Match the characters “file” literally
)
:// # Match the characters “://” literally
[-A-Z0-9+&@#/%?=~_|\$!:,.;] # Match a single character present in the list below
# The character “-”
# A character in the range between “A” and “Z”
# A character in the range between “0” and “9”
# One of the characters “+&@#/%?=~_|\$!:,.;”
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
[A-Z0-9+&@#/%=~_|\$] # Match a single character present in the list below
# A character in the range between “A” and “Z”
# A character in the range between “0” and “9”
# One of the characters “+&@#/%=~_|\$”
"
Hope this helps.
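A possible way to plug the second pattern into PHP is sketched below. Several things here are assumptions rather than part of the answer: the character classes are taken to contain @ and #, braces are used as delimiters because the pattern itself contains /, #, ~ and %, the i flag is added because the classes list only A-Z, and the `linkify` helper with its link markup is purely illustrative.

```php
<?php
// Brace delimiters keep /, #, ~ and % usable inside the pattern unescaped.
$pattern = '{\b((?:https?|ftp|file)://[-A-Z0-9+&@#/%?=~_|$!:,.;]*[A-Z0-9+&@#/%=~_|$]*)\b}i';

// Hypothetical helper: wraps every URL found in $text in an anchor tag.
function linkify(string $text, string $pattern): string
{
    $result = preg_replace_callback($pattern, function (array $m): string {
        $url = htmlspecialchars($m[1], ENT_QUOTES);
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $text);

    return $result ?? $text; // fall back to the input if PCRE reports an error
}

echo linkify('Read http://www.website.com/page.html. Thanks!', $pattern), "\n";
// Read <a href="http://www.website.com/page.html">http://www.website.com/page.html</a>. Thanks!
```

The trailing \b is what drops the final dot: a word boundary cannot sit between "." and a space, so the engine backtracks to the position just after "html".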

Regex equivalence yield error

Are these two regexes equivalent, except that the second doesn't capture the dpthd text?
'/<a name="dpthd_.*><\/a><a.*><\/a><a.*><\/a><h3.*>(.*)<\/h3>(.*)\s*(dpthd)/sU'
'/<a name="dpthd_.*><\/a><a.*><\/a><a.*><\/a><h3.*>(.*)<\/h3>(.*)\s*dpthd/sU'
I just removed the parentheses.
The problem is that the first doesn't work, and the second works fine.
EDIT>>>>
s (PCRE_DOTALL)
If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
EDIT>>>>
OK, I don't know what the problem was. I added the m modifier and use preg_match_all now:
preg_match_all('/<a name="dpthd_.*><\/a><a.*><\/a><a.*><\/a><h3.*>(.*)<\/h3>(.*)(?:<a name="dpthd_.*>|$)/sUm', $contents, $matches, PREG_OFFSET_CAPTURE)
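It seems to work, but I will post a test text later, because I want to know why the first version wasn't working.
Here you have a detailed view of the two regexps to compare :) Note that the breakdowns describe the patterns without their modifiers: with s, the dot also matches line breaks, and with U, every quantifier is lazy instead of greedy.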
First Regexp
r"""
<a\ name="dpthd_ # Match the characters “<a name="dpthd_” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><h3 # Match the characters “a><h3” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
> # Match the character “>” literally
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
< # Match the character “<” literally
\/ # Match the character “/” literally
h3> # Match the characters “h3>” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
( # Match the regular expression below and capture its match into backreference number 3
dpthd # Match the characters “dpthd” literally
)
"""
Second Regexp
r"""
<a\ name="dpthd_ # Match the characters “<a name="dpthd_” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><a # Match the characters “a><a” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
>< # Match the characters “><” literally
\/ # Match the character “/” literally
a><h3 # Match the characters “a><h3” literally
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
> # Match the character “>” literally
( # Match the regular expression below and capture its match into backreference number 1
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
< # Match the character “<” literally
\/ # Match the character “/” literally
h3> # Match the characters “h3>” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
)
\s # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
dpthd # Match the characters “dpthd” literally
"""
HTH!
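The effect of the s and U modifiers, and of wrapping a literal in parentheses, can be tried on a tiny input. The HTML fragment below is made up for the demonstration and is not the asker's data.

```php
<?php
$html = "<p>one</p>\n<p>two</p>";

// s only: the dot crosses the newline and .* is greedy,
// so the capture runs to the LAST </p>.
preg_match('/<p>(.*)<\/p>/s', $html, $greedy);

// s and U: U inverts greediness, so .* stops at the FIRST </p>.
preg_match('/<p>(.*)<\/p>/sU', $html, $lazy);

echo $lazy[1], "\n"; // one

// Wrapping a literal in parentheses does not change what is matched;
// it only adds one more entry to the matches array.
$withGroup    = preg_match('/<p>(.*)<\/p>\s*(<p>)/sU', $html, $a);
$withoutGroup = preg_match('/<p>(.*)<\/p>\s*<p>/sU', $html, $b);

var_dump($withGroup === $withoutGroup); // bool(true)
var_dump($a[0] === $b[0]);              // bool(true)
var_dump(count($a), count($b));         // int(3) int(2)
```

On this input the two variants behave identically apart from the extra group, which suggests that a difference seen in practice would come from the input or from how the results are read rather than from the parentheses themselves.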
