Regular Expression not match href strings [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I have the following regex:
$regex = '<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>';
How can I improve this so that it does NOT match links whose href contains the word "files" or "resize":
link or

Yes, parsing is the much better way to do this. Maybe someone will find this helpful:
$inhalt = new DOMDocument;
$inhalt->loadHTML($content->draw()[0][0]);
foreach ($inhalt->getElementsByTagName('a') as $node) {
    if ($node->hasAttribute('href')) {
        // Rewrite only links whose href contains neither "files" nor "resize".
        if (preg_match("/(files|resize)/", $node->getAttribute('href')) == 0) {
            $node->setAttribute('href', 'mobile.php?uri=http://www.example.com' . str_replace("..", "", $node->getAttribute('href')));
        }
    }
}
// Output the whole document with the rewritten links.
echo $inhalt->saveHTML();

You can use this regex to get all href strings:
<a[^>]*href=[\"\'](.*?)[\"\'][^>]*>(.*?)</a>
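The regex above collects every href but does not exclude anything. One way to skip hrefs containing "files" or "resize" is a negative lookahead placed right after the opening quote. The following is only a sketch: the sample markup is made up, the pattern assumes quoted href values, and it is not meant to handle arbitrary HTML (a DOM parser remains the more robust choice).

```php
<?php
// Sample markup invented for illustration only.
$html = '<a href="files/a.pdf">A</a> '
      . '<a href="page.html">B</a> '
      . '<a href="img/resize/x.jpg">C</a>';

// (["\'])                         opening quote (required here, unlike the original pattern)
// (?![^"\']*(?:files|resize))     reject the link if the quoted value contains either word
// ([^"\']*)\1                     capture the href value up to the matching closing quote
$regex = '~<a\s[^>]*\bhref=(["\'])(?![^"\']*(?:files|resize))([^"\']*)\1[^>]*>(.*?)</a>~si';

preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

$links = [];
foreach ($matches as $m) {
    $links[] = ['href' => $m[2], 'text' => $m[3]];
}

print_r($links);
```

The quote is made mandatory on purpose: with an optional quote the engine can backtrack to an empty quote match and slip past the lookahead.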

Related

How to extract title from php file get contents? [duplicate]

This question already has answers here:
Fastest way to retrieve a <title> in PHP
(7 answers)
Closed 3 years ago.
<?php
$content = file_get_contents('example.com');
// This would return HTML: <head>.....
// <title>Example.com</title>
I want to extract Example.com from the title:
$title = pick('<title>', '</title>', $content);
echo $title;
And it would show Example.com

You can use stripos to find the title tags and substr to cut out the text between them.
I add 7 (the length of "<title>") to the position to skip the opening tag.
$html = file_get_contents('example.com');
$pos = stripos($html, "<title>") + 7;
echo substr($html, $pos, stripos($html, "</title>") - $pos);
Example:
https://3v4l.org/qvC40
This assumes there is only one title tag on the page. If there are more, it will get the first one.
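The same stripos/substr idea can be wrapped in a helper shaped like the pick() call from the question. This is a minimal sketch: pick() is the asker's hypothetical function name, not a PHP built-in, and returning null when a tag is missing is an assumption about the desired behavior.

```php
<?php
// Hypothetical helper named after the asker's pick(); not part of PHP.
// Returns the text between the first $open and the next $close, or null.
function pick(string $open, string $close, string $html): ?string
{
    $start = stripos($html, $open);
    if ($start === false) {
        return null; // opening tag not found
    }
    $start += strlen($open); // skip past the opening tag

    $end = stripos($html, $close, $start);
    if ($end === false) {
        return null; // closing tag not found
    }

    return substr($html, $start, $end - $start);
}

// Inline sample markup stands in for file_get_contents() here.
$content = '<html><head><title>Example.com</title></head><body></body></html>';
$title = pick('<title>', '</title>', $content);
echo $title, PHP_EOL;
```

Checking for false before adding the offset avoids silently reading from position 7 when the page has no title tag.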

You can use file_get_contents() instead of $string.
$string = "<title>MY TITLE</title>";
$pattern = "/<title>(.*?)<\/title>/";
preg_match($pattern, $string, $matches);
echo "RESULT : ".$matches[1];

Try using PHP's SimpleXML parser to read the title node. Note that this only works if the page is well-formed XML (for example XHTML).
$xml = simplexml_load_string(file_get_contents('example.com'));
echo $xml->head->title;
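For pages that are not well-formed XML, DOMDocument's HTML loader is usually more forgiving. A minimal sketch, with inline sample markup standing in for the fetched page:

```php
<?php
// Sample markup invented for illustration; in practice this would be the fetched page.
$html = '<!DOCTYPE html><html><head><title>Example.com</title></head>'
      . '<body><p>Hi</p></body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// item(0) is null when the document has no <title> element.
$node = $doc->getElementsByTagName('title')->item(0);
$title = $node !== null ? trim($node->textContent) : null;

echo $title, PHP_EOL;
```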

php preg_match add href .html [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 5 years ago.
I would like to change all hrefs in a website.
xyz …
to:
<a href="sitename.html">xyz …</a>
Thanks for help!

You could do it the following way:
$hrefs = 'xyz …
xyz …';
$r = '/(?<=href=").*?(?=">)/';
$sitename = 'sitename.html';
$result = preg_replace($r, $sitename, $hrefs);
echo $result;

$content = file_get_contents('page.html');
$new_content = str_replace('<a href="', '<a href="sitename.html', $content);
echo $new_content;

get href value by class name regex and php [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I need to get the href value of a link by its class name, like this.
Can someone give me a regex to find this class name and get the href from there?
Thanks a lot.

You can do this using DOMDocument and XPath:
$str = ''; // the HTML markup containing the link goes here
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
// Select every <a> element whose class attribute is exactly "boldOrange".
$links = $xpath->query('//a[@class="boldOrange"]');
foreach ($links as $link) {
    $url = $link->getAttribute('href');
    print $url . PHP_EOL;
}
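A test on @class="boldOrange" only matches when that is the whole class attribute. If the link may carry several classes, a common XPath 1.0 idiom is to pad the attribute with spaces and search for the padded class name. The sketch below uses made-up sample markup; only the class name boldOrange comes from the answer above.

```php
<?php
// Sample markup invented for illustration only.
$html = '<div><a class="boldOrange" href="/first.html">First</a>'
      . '<a class="plain" href="/second.html">Second</a>'
      . '<a class="big boldOrange" href="/third.html">Third</a></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Matches "boldOrange" as a whole word inside a space-separated class list.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " boldOrange ")]';

$urls = [];
foreach ($xpath->query($query) as $link) {
    $urls[] = $link->getAttribute('href');
}

print_r($urls);
```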

php regex for parsing stock symbols in html code [duplicate]

This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 3 years ago.
I have the following PHP regex code. I want to extract the stock symbol from some HTML output.
The stock symbol I want to extract appears as /q?s=XXXX, where XXXX (the stock symbol) can be 1 to 5 characters long.
if (preg_match_all('~(?<=q\?s=)[-A-Z.]{1,5}~', $html, $out)) {
    $out[0] = array_unique($out[0]);
} else {
    echo "FAIL";
}
HTML code below (case 1 and case 2, which I applied this to):
Case #1 (does *not* work)
Bellicum Pharmaceuticals, Inc.
Case #2 (does work correctly)
NYLD
I am looking for suggestions on how I can update my PHP regex code to make it work for both case 1 and case 2. Thanks.

Instead of using regex, make effective use of DOM and XPath to do this for you.
$doc = new DOMDocument;
@$doc->loadHTML($html); // load the HTML data, suppressing parser warnings
$xpath = new DOMXPath($doc);
// Select links whose href starts with "/q?s=".
$links = $xpath->query('//a[substring(@href, 1, 5) = "/q?s="]');
$results = [];
foreach ($links as $link) {
    $results[] = str_replace('/q?s=', '', $link->getAttribute('href'));
}
print_r($results);
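The DOM approach can be tried end to end on a small sample. In this sketch the markup and the symbol ABCD are invented for illustration (the original HTML of the two cases is not shown here), and starts-with() is used as an equivalent of the substring() test.

```php
<?php
// Made-up sample markup; the hrefs and the ABCD symbol are assumptions.
$html = '<table><tr>'
      . '<td><a href="/q?s=ABCD">Some Company, Inc.</a></td>'
      . '<td><a href="/q?s=NYLD">NYLD</a></td>'
      . '<td><a href="/q?s=NYLD">NYLD</a></td>'
      . '<td><a href="/news">News</a></td>'
      . '</tr></table>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$prefix = '/q?s=';

$symbols = [];
foreach ($xpath->query('//a[starts-with(@href, "/q?s=")]') as $link) {
    // Drop the prefix; the link text is ignored, so it may be a name or a symbol.
    $symbols[] = substr($link->getAttribute('href'), strlen($prefix));
}

// Remove duplicates and reindex from 0.
$symbols = array_values(array_unique($symbols));

print_r($symbols);
```

Because the symbol is read from the href attribute rather than the link text, it does not matter whether the anchor displays a company name or the symbol itself.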

The answer seems nice, but it seems like a lot of work and code to maintain, no?
if (preg_match_all('/q\?s=(\S{1,5})\"/', $html, $match)) {
    $symbols = array_unique($match[1]);
}
Or even shorter: '/q\?s=(\S+)\"/'

Regular expressions, how to get all that is between tags? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML/XML with PHP?
Grabbing the href attribute of an A element
How can you get everything between the tags <td class="detail"></td>? I use PHP.

Why don't you use DOMXPath?
$doc = new DOMDocument();
$doc->loadHTML($yourHtml); // $yourHtml holds the HTML string
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query("//td[@class='detail']");
foreach ($nodeList as $node) {
    echo $node->nodeValue;
}
Parsing HTML with regex isn't a good choice.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.
