This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Grabbing the href attribute of an A element
Hi, I have this string in PHP:
<iframe frameborder="0" width="320" height="179" src="http://www.dailymotion.com/embed/video/xinpy5?width=320&wmode=transparent"></iframe><br />Le buzz Pippa Middleton agace la Reine ! <i>par direct8</i>
I would like to extract the URL from the anchor's href attribute using preg_match or other PHP functions.
Don't use regexes to parse HTML. Use the PHP DOM:
$DOM = new DOMDocument;
$DOM->loadHTML($str); // Your string
//get all anchors
$anchors = $DOM->getElementsByTagName('a');
//display all hrefs
for ($i = 0; $i < $anchors->length; $i++)
echo $anchors->item($i)->getAttribute('href') . "<br />";
You can check if the node has a href using hasAttribute() first if necessary.
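As a minimal sketch of the hasAttribute() check mentioned above: the sample string in $str and the array name $hrefs are assumptions made for illustration, not part of the original question.

```php
<?php
// Hypothetical sample input; any HTML fragment containing anchors would do.
$str = '<p><a href="http://example.com/a">A</a> <a name="top">no href</a> <a href="/b">B</a></p>';

$dom = new DOMDocument();
// Collect parser warnings internally instead of printing them for imperfect markup.
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();

$hrefs = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    // Skip anchors that carry no href attribute at all.
    if ($anchor->hasAttribute('href')) {
        $hrefs[] = $anchor->getAttribute('href');
    }
}

echo implode(PHP_EOL, $hrefs), PHP_EOL;
```

Without the check, getAttribute('href') returns an empty string for the second anchor, so the guard mainly keeps empty entries out of the result.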
You can use
if (preg_match('#<a\s*[^>]*href="([^"]+)"#i', $string, $matches))
echo $matches[1]; // group 1 holds the href value; $matches[0] would be the whole match
Try this regex:
(?<=href=\")[\w://\.\-]+
This question already has answers here:
Fastest way to retrieve a <title> in PHP
(7 answers)
Closed 3 years ago.
<?php
$content=file_get_contents('example.com');
// it would return html <head>.....
<title>Example.com</title>
I want to extract example.com from title
$title=pick('<title>','</title>',$content);
Echo $title;
And it would show Example.com
You can use substr to substring the HTML content and stripos to find the title tags.
I add 7 to the position to remove the tag.
$html = file_get_contents('example.com');
$pos = stripos($html, "<title>")+7;
echo substr($html, $pos, stripos($html, "</title>")-$pos);
Example:
https://3v4l.org/qvC40
This assumes there is only one title tag on the page. If there are more, it will return the first one.
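The substr/stripos approach can be wrapped in a helper shaped like the pick() call from the question. This is only a sketch: the function name extractBetween and the sample HTML are hypothetical, and the nullable type hints assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: returns the text between $start and $end,
// or null when either marker is missing.
function extractBetween(string $start, string $end, string $html): ?string
{
    $from = stripos($html, $start);
    if ($from === false) {
        return null;
    }
    // Move past the opening marker itself.
    $from += strlen($start);

    // Search for the closing marker only after the opening one.
    $to = stripos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Hypothetical sample page.
$html = '<html><head><title>Example.com</title></head><body></body></html>';
$title = extractBetween('<title>', '</title>', $html);
echo $title ?? '(no title found)', PHP_EOL;
```

Using strlen($start) instead of a hard-coded 7 keeps the helper usable for other tags, and the explicit false checks avoid returning garbage when the page has no title.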
You can use file_get_contents() instead of $string.
$string = "<title>MY TITLE</title>";
$pattern = "/<title>(.*?)<\/title>/";
preg_match($pattern, $string, $matches);
echo "RESULT : ".$matches[1];
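Try using PHP's SimpleXML parser to read the title node. Note that simplexml_load_string() only succeeds when the page is well-formed XML (for example XHTML); on ordinary HTML it usually fails and returns false.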
$xml = simplexml_load_string(file_get_contents('example.com'));
echo $xml->head->title;
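SimpleXML expects well-formed XML, which many real pages are not. As an alternative sketch, DOMDocument's HTML parser tolerates sloppy markup. The sample markup in $html is an assumption for illustration.

```php
<?php
// Hypothetical sample page with markup that is not well-formed XML.
$html = '<html><head><title>Example.com</title></head><body><p>Unclosed paragraph<br></body></html>';

$dom = new DOMDocument();
// Keep libxml warnings about the sloppy markup from being printed.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Take the first <title> element, if there is one.
$titles = $dom->getElementsByTagName('title');
$title = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;

echo $title ?? '(no title found)', PHP_EOL;
```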
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 7 years ago.
I need to get the href value from an element with a class name like that.
Can someone give me a regex to find this class name and get the href from it?
Thanks a lot.
You can do this using DOMDocument and XPath:
$str = '';
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc); // pass the document created above
$links = $xpath->query('//a[@class="boldOrange"]'); // XPath selects attributes with @
foreach ($links as $link) {
$url = $link->getAttribute('href');
print $url . PHP_EOL;
}
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
How to parse and process HTML with PHP?
How can I convert ereg expressions to preg in PHP?
here is an example
echo "<div id='spaced' class='romaji'><span class='spaced orig word'>neko</span><span class='space'>";
Please ignore the "echos"; it's the only way I could get the HTML to show.
I need a regular expression that can select whatever is between the
echo "<span class='spaced orig word'>";
tag and its ending tag
echo "</span>";
i tried
$pattern = "span class='spaced orig word'>(.+?)</s";
preg_match_all ($pattern, $jp_page, $result_ro);
if ($result_ro[1])
$results[] = implode(' ', $result_ro[1]);
else
return null; // Failed to retrieve Hiragana, so abort
and some other things, but I can't get it right. I get nothing most of the time because I don't really know what I'm doing with regular expressions.
I'm currently getting a warning with this code:
Warning: preg_match_all(): Delimiter must not be alphanumeric or backslash
THE PONY HE COMES!
Instead, try using a DOM parser:
$dom = new DOMDocument();
$dom->loadHTML($jp_page);
$xpath = new DOMXPath($dom);
$spans = $xpath->query("//span[@class='spaced orig word']"); // XPath selects attributes with @
$results = "";
foreach($spans as $span) {
$results .= " ".$span->textContent; // append each span's text instead of overwriting
}
$results = trim($results);
return $results;
Your pattern has no delimiters.
Try this regex:
<?php
$pattern = '#<span.*?>(.*?)</span>#';
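The warning in the question comes from the missing delimiters: PCRE patterns in PHP must be wrapped in a delimiter such as # or /. A minimal sketch of the question's own pattern with delimiters added follows. The second span ("inu") in the sample markup is made up for illustration.

```php
<?php
// Sample markup modelled on the question; the second word is hypothetical.
$jp_page = "<div id='spaced' class='romaji'>"
    . "<span class='spaced orig word'>neko</span><span class='space'> </span>"
    . "<span class='spaced orig word'>inu</span></div>";

// Same pattern as in the question, now wrapped in # delimiters.
$pattern = "#<span class='spaced orig word'>(.+?)</span>#";
preg_match_all($pattern, $jp_page, $result_ro);

// $result_ro[1] holds the text captured by the first group of every match.
$joined = implode(' ', $result_ro[1]);
echo $joined, PHP_EOL; // prints "neko inu"
```

Choosing # rather than / as the delimiter means the slash in </span> does not need escaping.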
This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
PHP : Parser asp page
I have this tag in an ASP page:
<a class='Lp' href="javascript:prodotto('Prodotto.asp?C=3')">AMARETTI VICENZI GR. 200</a>
How can I parse this ASP page to get the text AMARETTI VICENZI GR. 200?
This is the code I use, but it doesn't work:
<?php
$page = file_get_contents('http://www.prontospesa.it/Home/prodotti.asp?c=12');
preg_match_all('#(.*?)#is', $page, $matches);
$count = count($matches[1]);
for($i = 0; $i < $count; $i++){
echo $matches[2][$i];
}
?>
Your regular expression (in preg_match_all) is wrong. It should be #<a class='Lp' href="(.*?)">(.*?)</a>#is, since the class attribute comes first, not last, and is wrapped in single quotes, not double quotes.
You should strongly consider using DOMDocument and DOMXPath to parse your document instead of regular expressions.
DOMDocument/DOMXPath Example:
<?php
// ...
$doc = new DOMDocument;
$doc->loadHTML($html); // $html is the content of the website you're trying to parse.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//a[@class="Lp"]'); // XPath selects attributes with @
foreach ( $nodes as $node )
echo $node->textContent . PHP_EOL;
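An exact comparison on the class attribute only matches when the attribute value is exactly "Lp". If the page might put several classes on one link, a common XPath 1.0 idiom is to match the class as a whitespace-separated token. The sketch below assumes sample markup of my own; only the first link comes from the question.

```php
<?php
// Sample markup; only the first anchor is taken from the question.
$html = <<<'HTML'
<div>
<a class='Lp' href="javascript:prodotto('Prodotto.asp?C=3')">AMARETTI VICENZI GR. 200</a>
<a class="Lp featured" href="#">SECOND ITEM</a>
<a class="LpOther" href="#">SKIPPED</a>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Pad the class list with spaces so " Lp " matches only the whole token.
$nodes = $xpath->query('//a[contains(concat(" ", normalize-space(@class), " "), " Lp ")]');

$names = [];
foreach ($nodes as $node) {
    $names[] = trim($node->textContent);
}
echo implode(PHP_EOL, $names), PHP_EOL;
```

The DOM parser also does not care whether the attribute uses single or double quotes, which is the detail that tripped up the regular expression.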
You have to modify the regular expression a little based on the HTML code of the page you are getting the content from:
'#<a class=\'Lp\' href="(.*?)">(.*?)</a>#is'
Note that the class is first and it is surrounded by single quotes not double. I tested and it works for me.
This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Best methods to parse HTML with PHP
I understand I should be using an HTML parser like PHP's DOMDocument (http://docs.php.net/manual/en/domdocument.loadhtml.php) or TagSoup.
How would I use DOMDocument to extract the text between specific tags, for example h1, h2, h3, p and table? It seems I can only do this for one tag at a time with getElementsByTagName.
Is there a better HTML parser for such a task? Or how would I loop through the DOMDocument?
You are correct: use DOMDocument, since regex is not a good idea for parsing HTML.
getElementsByTagName gives you a DOMNodeList that you can iterate over to get the text of all the found elements. So, your code could look something like:
$document = new \DOMDocument();
$document->loadHTML($html);
$tags = array ('h1', 'h2', 'h3', 'h4', 'p');
$texts = array ();
foreach($tags as $tag)
{
$elementList = $document->getElementsByTagName($tag);
foreach($elementList as $element)
{
$texts[$element->tagName][] = $element->textContent;
}
}
return $texts;
Note that you should probably have some error handling in there, and you will also lose the context of the texts, but you can probably edit this code as you see fit.
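One way to address both points, sketched under assumptions: a single XPath query over all the wanted tags walks the document once, so the texts come back in page order, and libxml_use_internal_errors() plus a check on loadHTML() gives basic error handling. The sample markup and the result layout (tag name and text pairs) are my own choices, not part of the answer above.

```php
<?php
// Hypothetical sample content.
$html = '<h1>Title</h1><p>Intro</p><h2>Section</h2><p>Body</p>';

$document = new DOMDocument();
// Collect parse errors internally instead of emitting warnings.
libxml_use_internal_errors(true);
if (!$document->loadHTML($html)) {
    throw new RuntimeException('Could not parse the HTML');
}
libxml_clear_errors();

$xpath = new DOMXPath($document);
// One location path over every wanted tag, visited in document order.
$nodes = $xpath->query('//*[self::h1 or self::h2 or self::h3 or self::h4 or self::p]');

$texts = [];
foreach ($nodes as $node) {
    // Keep the tag name next to the text so the context is not lost.
    $texts[] = [$node->nodeName, trim($node->textContent)];
}

foreach ($texts as [$tag, $text]) {
    echo $tag, ': ', $text, PHP_EOL;
}
```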
You can do so with a regex.
preg_match_all('#<h1>([^<]*)</h1>#Usi', $html_string, $matches);
foreach ($matches[1] as $match) // $matches[1] holds the captured heading texts
{
// do something with $match
}
I am not sure what your source is, so I added a call to fetch the content via the URL.
$file = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($file);
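$body = $doc->getElementsByTagName('body')->item(0); // getElementsByTagName returns a DOMNodeList, so take its first node
$items = $body->getElementsByTagName('h1'); // named $items to match the loops below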
I am not sure of this part:
for ($i = 0; $i < $items->length; $i++) {
echo $items->item($i)->nodeValue . "\n";
}
Or:
foreach ($items as $item) {
echo $item->nodeValue . "\n";
}
Here is more info on nodeValue: http://docs.php.net/manual/en/function.domnode-node-value.php
Hope it helps!