I want to check whether my link is present on a website, and also whether it is visible. I wrote this code:
$content = file_get_contents('tmp/test.html');
$pattern = '/<a\shref="http:\/\/mywebsite.com(.*)">(.*)<\/a>/siU';
$matches = [];
if(preg_match($pattern, $content, $matches)) {
    $link = $matches[0];
    $displayPattern = '/display(.?):(.?)none/si';
    if(preg_match($displayPattern, $link)) {
        echo 'not visible';
    } else {
        echo 'visible';
    }
} else {
    echo 'not found the link';
}
It works, but not perfectly. If the link looks like this:
<a class="sg" href="http://mywebsite.com">mywebsite.com</a>
the first pattern won't work, but if I change the \s to (.*) it returns the string starting from the first a tag in the document. The second problem is that I need two patterns. Is there any way to merge the first with the negation of the second? The merged pattern would have two results: visible, or not found/invisible.
I'll try to guess.
You will have a problem if the code you fetch with file_get_contents looks like this
<a class="sg" href="http://mywebsite.com">mywebsite.com</a>
.
.
.
mywebsite.com
Your regex will return everything from the first <a tag, because with the 's' flag the dot also matches newlines (I guess you need that turned on, but if you don't, remove the flag).
Therefore
.*
can run across the rest of the document, so you need to make it lazy
(non-greedy: a lazy quantifier stops as soon as the rest of the pattern can match), like this
.*?
Note that your pattern also carries the 'U' flag, which inverts greediness: under U, .* is already lazy and .*? becomes greedy, so drop U when you write .*? explicitly.
Your regex should then look like this
<a.*?href="http:\/\/mywebsite\.com(.*?)">(.*?)<\/a>
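A lazy quantifier still starts at the first <a it finds, so <a.*?href can span earlier tags. The following is only a sketch of one way to keep the match inside a single tag and to fold the "not hidden" test into the same pattern with a negative lookahead. The function name hasVisibleLink and its parameters are hypothetical, and, like the original code, it only looks for an inline display:none on the a tag itself (not on parent elements or in stylesheets).

```php
<?php
// Hypothetical helper: true when $html contains a link to http://$host
// whose opening <a ...> tag has no inline "display:none".
function hasVisibleLink(string $html, string $host): bool
{
    $pattern = '~<a\b'                        // start of an <a> tag (not <abbr>)
             . '(?![^>]*display\s*:\s*none)'  // no display:none inside this tag
             . '[^>]*\bhref="http://'         // other attributes may come first
             . preg_quote($host, '~')         // literal host, dots escaped
             . '[^"]*"[^>]*>.*?</a>~is';      // rest of the tag, link text, closing tag
    return preg_match($pattern, $html) === 1;
}
```

With this sketch, hasVisibleLink($content, 'mywebsite.com') gives the two outcomes asked for: true for a visible link, false for "not found or hidden". For anything beyond simple markup, an HTML parser is the safer tool.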
I have the following title format on my website:
It's no use going back to yesterday, because at that time I was... Lewis Carroll
It is always: The phrase… (author).
I want to delete everything after the ellipsis (…), leaving only the sentence as the title. I thought of creating a function in PHP that would split the title into parts, put them in an array, and then work through each part, looking for the only pattern I have in the title, which is the ellipsis, and then delete everything after it. But when I do that, position X of my array contains the following:
was...
Position 8 of the array holds the word together with the ellipsis, and I don't know how to find a pattern that deletes the author from the title; my only pattern was the ellipsis. Any ideas?
<?php
$a = get_the_title(155571);
$search = '... ';
if(preg_match("/{$search}/i", $a)) {
    echo 'true';
}
?>
I tried with the code above and found the ellipsis, but I needed to bring it into an array to delete the part I need. I tried something like this:
<?php
define('WP_USE_THEMES', false);
require('./wp-blog-header.php');
global $wpdb;
$title_array = explode(' ', get_the_title(155571));
$search = '... ';
if (array_key_exists("/{$search}/i",$title_array)) {
    echo "true";
}
?>
I started doing it this way, but it doesn't work, any ideas?
Thanks,
If you use a regex, you need to escape the search string, as preg_quote() would do, because a dot is a metacharacter in a pattern.
But in your simple case I would not use a regex at all; I would just search for the three dots, starting from the end of the string.
Note: if the ellipsis is added by the browser rather than being part of the string, there is no way to detect it in PHP.
$title = 'The phrase... (author).';
echo getPlainTitle($title);

// Cut the title at the last occurrence of three dots, if there is one.
function getPlainTitle(string $title): string {
    $rpos = strrpos($title, '...');
    return ($rpos === false) ? $title : substr($title, 0, $rpos);
}
will output
The phrase
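The preg_quote() remark above can be sketched as follows. This is a minimal illustration, not the asker's final code; containsLiteral is a hypothetical name.

```php
<?php
// Hypothetical helper: does $title contain the literal text $search?
function containsLiteral(string $title, string $search): bool
{
    // preg_quote() escapes regex metacharacters such as "."; the second
    // argument additionally escapes the delimiter used below.
    $pattern = '/' . preg_quote($search, '/') . '/i';
    return preg_match($pattern, $title) === 1;
}

var_dump(containsLiteral('The phrase... (author).', '... ')); // bool(true)
var_dump(containsLiteral('The phrase (author).', '... '));    // bool(false)
```

Without preg_quote(), the second call would report a match, because an unescaped "... " means "any three characters followed by a space".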
First of all, since you're working with regular expressions, you need to remember that . has a special meaning there: it means "any character". So /... / just means "any three characters followed by a space", which isn't what you want. To match a literal . you need to escape it as \.
Secondly, rather than searching or splitting, you could achieve what you want by replacing part of the string. For instance, you could find everything after the ellipsis, and replace it with an empty string. To do that you want a pattern of "dot dot dot followed by anything", where "anything" is spelled .*, so \.\.\..*
$title = preg_replace('/\.\.\..*/', '', $title);
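The question shows the ellipsis both as three dots (...) and as the single character … (U+2026). As a hedged sketch, assuming the titles are UTF-8, the replacement above can be extended to cover both forms; stripAfterEllipsis is a hypothetical name.

```php
<?php
// Hypothetical helper: cut a title at the first ellipsis, whether it is
// written as three ASCII dots or as the single character U+2026.
function stripAfterEllipsis(string $title): string
{
    // \x{2026} requires the u modifier; s lets "." also match newlines.
    $result = preg_replace('/\s*(?:\.{3}|\x{2026}).*$/us', '', $title);

    // preg_replace() returns null on error (for example invalid UTF-8),
    // in which case the title is returned unchanged.
    return $result ?? $title;
}
```

Unlike the strrpos() version, this cuts at the first ellipsis rather than the last, which only matters if a title contains more than one.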
After doing some research I can't seem to find a solution to my problem. I have a list of bad words and I want to be able to see if a user left a comment containing any of those words. I have tried different regular expressions with no success. By the way, I'm no regex guru.
Let's say I have the word $word = 'bi' on my list, and a comment that says $comment = 'he is bi'. I use preg_match($pattern, $comment), where $pattern has been:
1) #$word#i
2) /\s+($word)/\s+/i
3) /\b($word)/\b/i
With this code:
if (preg_match($pattern, $commentdata['comment_content'])) {
    echo 'spam';
}
else {
    echo 'true';
}
I get:
1) spam (this is also the case for words like "combination", which I don't want to block)
2)true
3)true
How can I make a pattern that matches only the whole word, and not the word inside another word?
This does the job; you were close to the solution:
preg_match("~\b$word\b~i", $comment);
For particular cases like 'bi-directional', where the word is joined to another by a hyphen,
you can use this instead:
preg_match("~(?<![a-z]-)\b$word\b(?!-[a-z])~i", $comment);
$pattern = "/\b{$word}\b/i";
or
$pattern = "/(\b{$word}\b)/i";
will do the job. Note that there must be no extra / before the second \b, since that would end the pattern early.
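Since the question mentions a whole list of bad words, here is a minimal sketch of how the word-boundary pattern could be built from an array. The function name containsBadWord and the sample list are assumptions for illustration only.

```php
<?php
// Hypothetical helper: true when $comment contains any of $badWords
// as a whole word (case-insensitive).
function containsBadWord(string $comment, array $badWords): bool
{
    if ($badWords === []) {
        return false; // an empty alternation would match everywhere
    }

    // Escape each word so characters like "." or "+" are taken literally.
    $alternatives = array_map(function ($word) {
        return preg_quote($word, '~');
    }, $badWords);

    $pattern = '~\b(?:' . implode('|', $alternatives) . ')\b~i';
    return preg_match($pattern, $comment) === 1;
}

var_dump(containsBadWord('he is bi', ['bi']));           // bool(true)
var_dump(containsBadWord('a nice combination', ['bi'])); // bool(false)
```

A hyphenated word such as 'bi-directional' still matches here, because the hyphen counts as a word boundary; the lookaround variant shown earlier is what excludes that case.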
I have a regex that validates a specific URL, but it is not really working. I want to validate URLs like this: https : // example.co.nz/#![RANDOM_KEYS_HERE].
I want to accept only https. Most importantly, the user's input needs to start with https : // example.co.nz/#! but after the #! the user can put anything they like.
Here is the code:
I know the code is a mess; I only have basic knowledge of regex.
#^https://example+\.[co\.nz][a-z0-9-_.]+\.[a-z]{2,4}#i
If anyone could help me to do it, it would be great! thanks!
Erm... not even close. Your regex reads as follows:
Starting from the beginning of the string...
Match literally https://exampl
Match one or more e
Match a literal .
Match exactly one of these characters: c, n, o, z or .
Match one or more of these: a-z, 0-9, -, _ or .
Match a literal .
Match between 2 and 4 letters
This is nothing like what you're looking for. After all, I don't think you want this to pass:
https://exampleeeeeeeeeeee.complete.and.total.failure.-_-.lol
Instead, try this:
(^https://example\.co\.nz/#!(.*))
This regex reads as follows:
Starting from the beginning of the string...
Match literally https://example.co.nz/#!
Capture everything thereafter
Try this out:
^https:\/\/example\.co\.nz\/\#\!(.*)$
The parentheses at the end will do a sub-expression match which should allow you to pull out the ID.
if (preg_match('/^https:\/\/example\.co\.nz\/\#\!(.*)$/', $searchString, $matches)) {
$id = $matches[1];
}
if (preg_match('%^https://example\.co\.nz/#!(.+)$%i', $subject)) {
# Successful match
} else {
# Match attempt failed
}
Or you can get your [RANDOM_KEYS_HERE] part with this one (the captured part is in $regs[1]; $regs[0] holds the whole URL):
if (preg_match('%^https://example\.co\.nz/#!(.+)$%i', $subject, $regs)) {
    $result = $regs[1];
} else {
    $result = "";
}
You don't need a regexp here. You just need to find out whether the string starts with a given substring. Check this:
if (strpos($url, 'https://example.co.nz/#!') === 0)
{
    echo 'url is OK';
}
else
{
    echo 'url is wrong';
}
http://www.php.net/manual/en/function.strpos.php
I hope this helps.
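Building on the prefix check above, a small sketch that also returns the part after #! without a regex. The function name extractKey is hypothetical, and the comparison here is case-sensitive, unlike the regex answers that use the i flag.

```php
<?php
// Hypothetical helper: returns everything after "https://example.co.nz/#!",
// or null when the URL does not start with that prefix.
function extractKey(string $url): ?string
{
    $prefix = 'https://example.co.nz/#!';

    // strncmp() compares only the first strlen($prefix) bytes.
    if (strncmp($url, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    return substr($url, strlen($prefix));
}

var_dump(extractKey('https://example.co.nz/#!abc123')); // string(6) "abc123"
var_dump(extractKey('http://example.co.nz/#!abc123'));  // NULL
```

An empty string is returned when nothing follows #!, so the caller can decide whether an empty key is acceptable.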
I want a regex solution to allow only
http://www.imdb.com/title/ttANYNumberWOrdetc/ links
Otherwise, show an error: Incorrect link
I am not very good with regex.
I just created this pattern:
preg_match('/http:\/\/www.imdb.com\/title\/(.*)\//is', 'http://www.imdb.com/title/tt0087469/', $result);
It shows me the correct result, but I think I missed something.
Thanks,
How about something like this: http://(?:www\.)?imdb\.com/title/tt[^/]+/
Example:
<?php
if ( preg_match('#^http://(?:www\.)?imdb\.com/title/tt[^/]+/$#', 'http://www.imdb.com/title/tt0448303/') )
    echo 'Matches' . PHP_EOL;
Explanation:
The regular expression matches a string that starts with http://, followed by either imdb.com or www.imdb.com, then /title/tt, then one or more characters other than /, and that ends with a /.
The # is the delimiter, the ^ indicates the beginning of the string and the $ the end.
This should work:
if (preg_match('#^https?://www\.imdb\.com/title/tt([a-zA-Z0-9]+)/$#', 'http://www.imdb.com/title/tt0364845/', $matches)) {
    echo 'yay';
} else {
    echo 'nay';
}
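To show why the dots should be escaped, here is a small sketch that wraps the check in a function and prints the error the question asks for. The name isImdbTitleUrl is hypothetical, and accepting both http and https, with or without www, is an assumption rather than a stated requirement.

```php
<?php
// Hypothetical helper: true for URLs of the form
// http(s)://(www.)imdb.com/title/tt<something>/
function isImdbTitleUrl(string $url): bool
{
    // \z anchors at the very end of the string, so a trailing newline
    // is rejected as well.
    return preg_match('#^https?://(?:www\.)?imdb\.com/title/tt[^/]+/\z#', $url) === 1;
}

$url = 'http://www.imdb.com/title/tt0087469/';
echo isImdbTitleUrl($url) ? 'OK' : 'Incorrect link', PHP_EOL;
```

With an unescaped dot, a host such as www.imdbxcom would also pass, because "." matches any character.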
This is my code:
$string = '« PreviousNext »';
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
$string = preg_replace('#(<a).*?(nextlink)#s', '', $string);
echo $string;
I am trying to remove the last link:
Next »';
My current output:
">Next »</a>
It removes everything from the start.
I want it to remove only the one with strpos, is this possible with preg_replace and how?
Thanks.
This is quite a tricky question to solve.
First off, the .*? will not match the way you are expecting it to.
It starts from the left, finds the first match for <a, then searches until it finds nextlink, which essentially picks up the entire string.
For that regex to work as you wanted, it would need to match from the right-hand side first and work backwards through the string, finding the smallest (non-greedy) match.
I couldn't see any modifiers that would do this,
so I opted for a callback on each link that checks for and removes any link with nextlink in it.
<?php
$string = '« PreviousNext »';
echo "RAW: $string\r\n\r\n";
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
echo "SRC: $string\r\n\r\n";
$string = preg_replace_callback(
// the string has been through htmlspecialchars(), so match the escaped tags
'#&lt;a.+?&lt;/a&gt;#',
'remove_nextlink',
$string
);
function remove_nextlink($matches) {
    // if you want to see each line as it works, uncomment this
    // echo "L: $matches[0]\r\n\r\n";
    if (strpos($matches[0], 'nextlink') === FALSE) {
        return $matches[0]; // doesn't contain nextlink, put original string back
    } else {
        return ''; // contains nextlink, replace with blank
    }
}
echo "PROCESSED: $string\r\n\r\n";
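The same callback idea can be sketched on unescaped HTML. The sample markup below is an assumption, since the original links are not shown in full; the href values, the class names prevlink and nextlink, and the function name removeNextLink are all hypothetical.

```php
<?php
// Assumed sample markup (hypothetical): two pager links.
$html = '<a href="/page/1" class="prevlink">&laquo; Previous</a>'
      . '<a href="/page/3" class="nextlink">Next &raquo;</a>';

// Hypothetical helper: drop every <a>...</a> that mentions "nextlink".
function removeNextLink(string $html): string
{
    $result = preg_replace_callback(
        '#<a\b[^>]*>.*?</a>#is', // one complete link per match
        function (array $m): string {
            // Keep links without "nextlink", replace the others with nothing.
            return strpos($m[0], 'nextlink') === false ? $m[0] : '';
        },
        $html
    );

    // preg_replace_callback() returns null on error; keep the input then.
    return $result ?? $html;
}

echo removeNextLink($html), PHP_EOL;
// prints: <a href="/page/1" class="prevlink">&laquo; Previous</a>
```

Because [^>]* cannot cross a tag boundary, each match covers exactly one link, which avoids the "removes everything from the start" problem.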
Note: This is not a direct answer, but a suggestion to another approach.
I was told once: if you can do it any other way, stay away from regex. I don't, though; it's my white whale. Have you heard of phpQuery? It's jQuery implemented in PHP and very powerful. It would be able to do what you want very easily. I know it's not regex, but perhaps it's of use to you.
If you really want to go ahead, I can recommend http://gskinner.com/RegExr/ . I think it's a great tool.