Regular expression to extract href values from an HTML page in PHP

I want to get every link of the form /farsi_persian/*/subtitle-*.aspx from an HTML page.
I tried several regular expressions in PHP, but none of them matched.
Can you help me?
<span class="r0" title="Rating 0 out of 10">Farsi/Persian</span> Arrow - Third Season subtitle<br><small> Arrow.S03E03.HDTV.480p.x264-LOL </small>

Try this (the dot before aspx is escaped so it matches a literal "."):
/farsi_persian/[^/]+/subtitle-[^.]+\.aspx

Try preg_match_all(). This should work, assuming there are always digits after subtitle-:
preg_match_all('/\/farsi_persian\/[\w-]+\/subtitle-\d+\.aspx/', $str, $matches);

I tried this, but var_dump() shows NULL:
$myPattern = "/farsi_persian/[^/]+/subtitle-[^.]+.aspx";
preg_match_all($myPattern,$myText,$match);
var_dump($match);
This one worked:
$myPattern = "/farsi_persian\/(.*.)\/subtitle-\d*.aspx/";
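A minimal sketch of the approach from the answers above, assuming the links appear as href attributes. The sample markup and its href values are invented for illustration and are not taken from the original page:

```php
<?php
// Hypothetical sample markup; the href values are made up for this sketch.
$html = '<a href="/farsi_persian/arrow-third-season/subtitle-1012345.aspx">Farsi/Persian</a>'
      . '<a href="/english/arrow-third-season/subtitle-1012346.aspx">English</a>';

// Using "~" as the delimiter avoids escaping every "/" in the path.
// [^/"]+ matches one path segment; \d+ assumes a numeric subtitle id.
$pattern = '~/farsi_persian/[^/"]+/subtitle-\d+\.aspx~';

$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches.
print_r($matches[0]);
```

A pattern passed to preg_match_all() always needs delimiters. Without them, PHP reads the first character as the delimiter, which is why the undelimited pattern above fails.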


Regex to select a URL except when = is directly in front of it

I'm trying to use a regex to find and replace all URLs in a forum system. This works but it also selects anything that is within bbcode. This shouldn't be happening.
My code is as follows:
<?php
function make_links_clickable($text){
return preg_replace('!(([^=](f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text);
}
//$text = "https://www.mcgamerzone.com<br>http://www.mcgamerzone.com/help/support<br>Just text<br>http://www.google.com/<br><b>More text</b>";
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Unparsed text:</b><br>";
echo $text;
echo "<br><br>";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
?>
Every URL inside BBCode directly follows an = character, so I don't want to match anything that is preceded by =.
This basically works, but the match includes one extra character in front of the URL that should be selected.
I'm not very familiar with regex. The final output of my code is this:
<b>Unparsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa<br>
<br>
<b>Parsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa
You can match and skip [url=...] like this:
\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)
That way, you will only match the URLs outside the [url=...] tag.
IDEONE demo:
function make_links_clickable($text){
return preg_replace('~\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)~iu', '$1', $text);
}
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
You can use a negative lookbehind (?<!=) instead of your negated class. It asserts that the text about to be matched is not preceded by =, and it does so without consuming a character, so no extra character ends up in the match.
Example
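A minimal sketch of the lookbehind variant. The function name make_links_clickable_lookbehind and the anchor markup in the replacement are assumptions for illustration, and the character class is reduced to ASCII for brevity:

```php
<?php
// Hypothetical helper: wraps bare URLs in an anchor tag, skipping any URL
// that directly follows "=" (as in [url=...]).
function make_links_clickable_lookbehind(string $text): string
{
    // (?<!=) is zero-width: it checks the previous character without
    // making it part of the match.
    return preg_replace(
        '~(?<!=)((?:f|ht)tps?://[-a-zA-Z0-9()#:%_+.\~?&;/=]+)~i',
        '<a href="$1">$1</a>', // assumed replacement markup
        $text
    );
}

$text = '[url=https://a.example/x]here[/url] https://b.example/y aaa';
echo make_links_clickable_lookbehind($text);
```

Only the second URL is wrapped. The one inside [url=...] is left untouched because the lookbehind fails at its first character.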

php regex to get middle of string

I convert an HTML page to plain text in order to find a numeric value.
In the resulting text, I need to find a string like this one:
C) Debiti33.197.431,90I - Di finanziamento
I need the number 33.197.431,90 (this number changes on every parsing request).
Is there any regex to achieve this? For example:
STARTS WITH 'C) Debiti' ENDS WITH 'I - Di finanziamento' GETS the middle string that can be whatever.
Whenever I try, I get empty results...don't know that much about regex.
Can you please help me?
Thank you very much.
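You could try the regex below. \K drops everything matched so far from the reported match, and the lookahead checks the trailing text without consuming it:
^C\) Debiti\K.*?(?=I - Di finanziamento$)
The PHP code would be: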
<?php
$mystring = "C) Debiti33.197.431,90I - Di finanziamento";
// \K resets the start of the match, so $m[0] holds only the number.
$regex = '~^C\) Debiti\K.*?(?=I - Di finanziamento$)~';
if (preg_match($regex, $mystring, $m)) {
    $yourmatch = $m[0];
    echo $yourmatch; // => 33.197.431,90
}
?>
This should work. It uses the character class [\d.,]+ rather than a lazy .*? (read the section "Want to Be Lazy? Think Twice"):
(?<=\bC\) Debiti)[\d.,]+(?=I - Di finanziamento\b)
Sample code:
$re = "/(?<=\\bC\\) Debiti)[\\d.,]+(?=I - Di finanziamento\\b)/i";
$str = "C) Debiti33.197.431,90I - Di finanziamento";
preg_match($re, $str, $matches);
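As an alternative to \K or lookarounds, a plain capture group does the same job. The sketch below also converts the captured text to a float, assuming the number always uses dots for thousands and a comma for decimals. Both function names are hypothetical:

```php
<?php
// Returns the raw number between the two markers, or null if they are absent.
function extract_debiti(string $text): ?string
{
    if (preg_match('~C\) Debiti([\d.,]+)I - Di finanziamento~', $text, $m)) {
        return $m[1]; // group 1 is the text between the markers
    }
    return null;
}

// Converts "33.197.431,90" (dot = thousands, comma = decimals) to a float.
function italian_number_to_float(string $raw): float
{
    // Remove the thousands dots first, then turn the decimal comma into a dot.
    return (float) str_replace(['.', ','], ['', '.'], $raw);
}

$raw = extract_debiti("C) Debiti33.197.431,90I - Di finanziamento");
$amount = $raw !== null ? italian_number_to_float($raw) : null;

var_dump($raw, $amount);
```

This pattern is not anchored with ^ and $, so it also finds the string when it is embedded in a longer text.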

PHP regular expression help?

I have a problem writing a regular expression that matches only a div with the class name "classBig1" that has a single anchor link as its child.
Here is my code but it doesn't work:
preg_match_all ("/<div class=\"headline9\"><a[\s]+[^>]*?href[\s]?=[\s\"\']+".
"(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a></div>/",
$var, &$matches);
//example HTML: <div class="classBig1">Go Index99</div>
If the HTML is as well formed as your example then the following regex is enough to solve your problem:
<div class="classBig1"><a .*?</div>
The full PHP code would be:
preg_match_all('%<div class="classBig1"><a .*?</div>%', $html,
$result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
$match = $result[0][$i];
}
I guess you put the wrong class name in your code, so I assume it should be "classBig1". Please take a look at the pattern below.
I believe:
You just want the DIVs that have the class "classBig1".
These DIVs should contain only one A tag.
If so, feel free to use this pattern. It worked for me on sample HTML. The quantifier is lazy so that one match cannot span several DIVs.
Pattern:
"/<div class=\"classBig1\"><a (.*?)<\/a><\/div>/"
Hope it helps.
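If the markup is less regular than the example, a parser-based approach may be more robust than a regex. This is a different technique from the answers above: a minimal sketch using DOMDocument and DOMXPath, with invented sample HTML and href values:

```php
<?php
// Hypothetical sample markup for illustration only.
$html = '<div class="classBig1"><a href="index99.html">Go Index99</a></div>'
      . '<div class="classBig1">No link here</div>'
      . '<div class="other"><a href="other.html">Other</a></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
// Select the anchor of every div whose class is exactly "classBig1"
// and that has exactly one <a> child.
foreach ($xpath->query('//div[@class="classBig1"][count(a)=1]/a') as $a) {
    $links[] = ['href' => $a->getAttribute('href'), 'text' => $a->textContent];
}

print_r($links);
```

The @class test is an exact string comparison, so a div with several classes would need a contains()-style check instead.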

How can I find the rest of a word from a string within it in PHP?

Let's say I have a page I want to scrape for words with "ice" in them, how can I do this easily? I see a lot of scrapers breaking things down into source code, but I don't need this. I just need something that searches through the plain text on the webpage.
Edit: I basically need something to search for .jpeg and find the entire file name. (it is in plain text on the website, not hidden in a tag)
Anything that matches the following is a word with ice in it:
/(\w*)ice(\w*)/i
(Note that \w also matches 0-9 and _. If you only want letters, the following might give better results: /\b[a-z]*ice[a-z]*\b/i)
UPDATE
To match file names (must not contain whitespace):
/\S+\.jpeg/i
Example:
<?php
$str = 'Picture of me: 238484534.jpeg and someone else img-of-someone.jpeg here';
$cnt = preg_match_all('/\S+\.jpeg/i', $str, $matches);
print_r($matches);
1. Do you want to search the text inside the HTML tags too, such as attributes and names?
2. Or only the visible part of the web page?
For #1: the solutions are simple and already given in the other answers.
For #2:
Use PHP's DOMDocument class, then extract and search only the text content.
Documentation here:
http://php.net/manual/en/class.domdocument.php
see this for example:
PHP DOMDocument stripping HTML tags
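A minimal sketch of option #2, assuming the file names appear as plain text inside the body. The sample markup is invented, and the file name pattern is an assumption about which characters a name may contain:

```php
<?php
// Hypothetical page: one file name is hidden in an attribute, two are visible.
$html = '<html><body><p>Files: <a href="hidden.jpeg">test1.jpeg</a> and test2.jpeg</p></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

// textContent returns the text nodes only, so attribute values are ignored.
$body = $doc->getElementsByTagName('body')->item(0);
$text = $body !== null ? $body->textContent : '';

preg_match_all('/[\w-]+\.jpeg/i', $text, $matches);
print_r($matches[0]);
```

The name in the href attribute is not found, because only the text between the tags is searched.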
Some regex use will be needed for this. Below I use PCRE http://www.php.net/manual/en/ref.pcre.php and the function preg_match_all http://www.php.net/manual/en/function.preg-match-all.php
<?php
$html = <<<EOF
<html>
<head>
<title>Test</title>
</head>
<body>List of files:
<ul>
<li>test1.jpeg</li>
<li>test2.jpeg</li>
</ul>
</body>
</html>
EOF;
$matches = array();
// The pattern has no capture group, so the full matches are in $matches[0].
$count = preg_match_all('/[0-9a-zA-Z_-]+\.jpeg/', $html, $matches);
if ($count > 0) {
    foreach ($matches[0] as $filename) {
        print "Filename: {$filename}\n";
    }
}
?>
try this:
preg_match_all('/\w*ice\w*/', 'abc icecream lice', $matches);
print_r($matches);

How to remove a text from a variable? (php)

I have a variable $link_item that is printed with echo and produces strings like
<span class="name">Google</span>http://google.com
How can I remove "<span class="name">Google</span>" from the string?
It should output just "http://google.com".
I heard it can be done with a regex; please help.
Without regex:
echo substr($link_item, stripos($link_item, 'http:'));
But this only works if the first part (i.e. <span class="name">Google</span>) never contains http:. If you can assure this: here you go :)
Reference: substr, stripos
Update:
As @Gordon points out in his comment, my code does the same thing strstr() already does. I am putting it here in case someone does not read the comments:
echo strstr($link_item, 'http://');
$string = '<span class="name">Google</span>http://google.com';
$pieces = explode("</span>",$string);
//In case there is more than one span before the URL
echo $pieces[count($pieces) -1];
Solved:
$contents = '<span class="name">Google</span>http://google.com';
$new_text = preg_replace('/<span[^>]*>([\s\S]*?)<\/span[^>]*>/', '', $contents);
echo $new_text;
// outputs -> http://google.com
Don't use a regex. Use an HTML parser to extract only the text you want from it.
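A minimal sketch of the parser-based approach, assuming the unwanted part is always wrapped in span elements. The function name strip_spans is hypothetical:

```php
<?php
// Hypothetical helper: removes every <span> element and returns the
// remaining text of the fragment.
function strip_spans(string $fragment): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc->loadHTML($fragment);
    libxml_clear_errors();

    // Copy the live node list to an array first, because removing nodes
    // while iterating over the live list would skip elements.
    foreach (iterator_to_array($doc->getElementsByTagName('span')) as $span) {
        $span->parentNode->removeChild($span);
    }

    return $doc->documentElement !== null
        ? trim($doc->documentElement->textContent)
        : '';
}

$link_item = '<span class="name">Google</span>http://google.com';
echo strip_spans($link_item);
```

For a string this simple, strstr() or the regex answers are shorter. The parser version is mainly useful when the span may contain nested tags or unusual attributes.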
I came up with this myself:
$link_item_url = preg_replace('#<span[^>]*?>.*?</span>#si', '', $link_item);
This removes every <span ...>...</span> element from the variable $link_item.
Thanks, everyone.
