This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I have a PHP regex to find the link tag and extract the CSS address from an HTML page:
'/<link.*?href\s*=\s*["\']([^"\']+)[^>]*>.*?\/>/i'
but it doesn't work well. Can you help me modify this code?
Perhaps
'/<link .*?(href=[\'|"](.*)?[\'|"]|\/?\>)/i'
Then you can access the link through capture group 2 ($2).
Not that this is better than the other answer, however just in case you want to see it, I've altered your regex such that it should work as intended:
'/<link.*?href\s*=\s*["\']([^"\']+?)[\'"]/i'
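As a quick check of the altered pattern, here is a minimal sketch that runs it through preg_match_all. The sample markup and the variable names ($page, $matches) are made up for illustration. Note that the pattern captures the href of every link tag, not only stylesheets.

```php
<?php
// Hypothetical sample markup; real pages will vary.
$page = <<<'HTML'
<link rel="stylesheet" href="/css/main.css">
<link href='print.css' rel="stylesheet" />
HTML;

// The pattern from the answer above; capture group 1 holds the href value.
$pattern = '/<link.*?href\s*=\s*["\']([^"\']+?)[\'"]/i';

preg_match_all($pattern, $page, $matches);

// $matches[1] is the list of captured href values.
foreach ($matches[1] as $href) {
    echo $href, PHP_EOL;
}
```

If only stylesheets are wanted, the rel attribute still has to be checked separately, which is one reason a parser tends to be less fragile than a regex here.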
Finding the hrefs of all stylesheets with a regex can be tricky. You should consider using a PHP HTML parser to get this information instead.
You can read this article to get more information and then try this code.
// Retrieve all links and print their HREFs
foreach($html->find('link') as $e)
echo $e->href . '<br>';
// Retrieve all script tags and print their SRCs
foreach($html->find('script') as $e)
echo $e->src . '<br>';
PS: Remember, a script tag may not have a src attribute, in which case an empty string is printed.
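If you would rather not pull in a third-party library, roughly the same thing can be sketched with PHP's bundled DOM extension (DOMDocument). This is a different parser from the one used above, and the function name stylesheetHrefs is just a placeholder for illustration.

```php
<?php
// Sketch: collect the href of every <link> whose rel includes "stylesheet".
// Uses the bundled DOM extension, not Simple HTML DOM.
function stylesheetHrefs(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated tokens, e.g. "alternate stylesheet".
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('stylesheet', $rels, true) && $link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// Hypothetical sample input.
$sample = '<html><head>'
    . '<link rel="stylesheet" href="/css/main.css">'
    . '<link rel="icon" href="/favicon.ico">'
    . '<link href="print.css" rel="alternate stylesheet">'
    . '</head><body></body></html>';

print_r(stylesheetHrefs($sample));
```

Checking hasAttribute('href') avoids the empty-string case mentioned in the PS.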
This question already has answers here:
how to use dom php parser
(4 answers)
Closed 9 years ago.
<?php
$html = file_get_contents('http://xpool.xram.co/index.cgi');
echo $html;
?>
I want to get the information inside one tag on a remote web site using PHP, and only that tag.
I found this small snippet, which is great for retrieving the entire page source. However, I want only a small section of it. How can I filter out all the other tags and get only the one tag I need?
I'd suggest using a PHP DOM parser. (http://simplehtmldom.sourceforge.net/manual.htm)
require_once ('simple_html_dom.php');
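$html = file_get_html('http://xpool.xram.co/index.cgi'); // file_get_html() returns a parser object; file_get_contents() would return a plain string with no find() method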
$p = $html->find('p'); // Find all p tags.
$specific_class = $html->find('.classname'); // Find elements with classname as class.
$element_id = $html->find('#element'); // Find element with the id element
Read the docs, there are tons of other options available.
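For comparison, the three lookups above (by tag, by class, by id) can also be sketched with PHP's built-in DOMDocument and DOMXPath, which need no extra download. This is a different API from Simple HTML DOM, and the sample markup below is invented for the example.

```php
<?php
// Hypothetical markup standing in for the downloaded page.
$source = '<div id="element"><p class="classname intro">First</p><p>Second</p></div>';

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
$doc->loadHTML($source);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($doc);

// All <p> tags.
$paragraphs = $xpath->query('//p');

// Elements whose class list contains the token "classname".
$byClass = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " classname ")]'
);

// The element with id="element".
$byId = $xpath->query('//*[@id="element"]');

foreach ($paragraphs as $p) {
    echo trim($p->textContent), PHP_EOL;
}
```

The class test pads the attribute with spaces so that "classname" matches as a whole token and not as part of a longer class name.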
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Getting title and meta tags from external website
(21 answers)
Closed 9 years ago.
Hi, I am working on a project in which I need another website's meta content.
Here is an example.
This is the remote website's meta content:
<meta content="Kültür Sanat Edebiyat Portalı. Geniş Türkçe şiir ve şair arşivi. Yarışmalar, şiir etkinlikleri, sanat haberleri. Kitaplar ile ilgili geniş ve detaylı tanıtımlar. Resim tiyatro sergi." name="description">
I tried PHP's file_get_contents:
<?php
$homepage = file_get_contents('http://antoloji.com/');
echo $homepage;
?>
but I could not find a way to take only the meta content (the description part).
Thank you for your advice.
PHP has a really useful function for this, get_meta_tags, which parses the meta tags of a website's source.
<?php
// Assuming the above tags are at www.example.com
$tags = get_meta_tags('http://www.example.com/');
// Notice how the keys are all lowercase now, and
// how . was replaced by _ in the key.
echo $tags['author']; // name
echo $tags['keywords']; // php documentation
echo $tags['description']; // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>
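Since a given page may not define every tag, it is worth checking that a key exists before echoing it. Here is a minimal sketch of that; it writes a made-up page to a temporary file only so the example runs without network access (get_meta_tags accepts a local filename as well as a URL).

```php
<?php
// Hypothetical page used only for this example.
$page = <<<'HTML'
<html><head>
<meta name="description" content="A demo page">
<meta name="geo.position" content="49.33;-86.59">
</head><body></body></html>
HTML;

// Write the page to a temporary file so get_meta_tags() has something to read.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $page);

$tags = get_meta_tags($file);
unlink($file);

// Not every page defines every tag, so fall back to null when a key is missing.
$description = $tags['description'] ?? null;
$keywords = $tags['keywords'] ?? null;

echo $description === null ? 'No description meta tag found' : $description, PHP_EOL;
echo $keywords === null ? 'No keywords meta tag found' : $keywords, PHP_EOL;
```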
This question already has answers here:
How do you parse and process HTML/XML in PHP?
(31 answers)
Closed 9 years ago.
I want to extract some HTML from page Y.
For example, the site is:
<head>xxxx</head>
<body>....<div id="ineedthis_code"> </div> ...</body>
Is it possible to do this with file_get_contents?
I need only that div, nothing else.
Without using a special library (which is the best way in most cases), you can use the explode function:
$content = file_get_contents($url);
$first_step = explode( '<div id="YOUR ID HERE">' , $content ); // So you will get two array elements
$second_step = explode("</div>" , $first_step[1] ); // "1" depends, if you have more elements with this id (theoretical)
echo $second_step[0]; // You will get the first element with the content within the DIV :)
Please note, this is only an example without error handling. It also works only in this special case, not if the HTML structure changes (a nested div inside the target, for instance, cuts the result short). Even a single extra space can break this code, so you are better off using a parsing library ;-)
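To illustrate the parser route, here is a rough sketch using PHP's bundled DOMDocument and DOMXPath to pull out the contents of one div by id. The function name innerHtmlById and the sample page are placeholders invented for this example, and in real use $page would come from file_get_contents($url).

```php
<?php
// Sketch: return the inner HTML of the first <div> with the given id,
// or null when no such div exists.
function innerHtmlById(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $id contains no double quote characters.
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // Serialise each child of the div to get its inner HTML.
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

// Hypothetical page with a nested div, which would defeat the explode() approach.
$page = '<html><head><title>x</title></head><body><p>before</p>'
    . '<div id="ineedthis_code"><div>inner</div><p>tail</p></div>'
    . '<p>after</p></body></html>';

echo innerHtmlById($page, 'ineedthis_code'), PHP_EOL;
```

Because the parser tracks nesting, the inner div's closing tag does not end the match early.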
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to parse and process HTML with PHP?
I'm trying to scrape a page with PHP using file_get_contents().
This page has some JSON wrapped in a bit of HTML. I'd like to strip out this HTML to be able to use json_decode() on the scraped string so I can deal with the JSON separately.
Is there any clean way to do that? A quick search didn't really lead to anything.
Thanks
Parsing or stripping HTML content is always tricky, because the (common?) regex solutions can break when the HTML markup is malformed, and they are painfully slow as well. I would suggest using this little HTML DOM parser class:
http://simplehtmldom.sourceforge.net/
edited & added from subcomment:
Okay, this is a bad one, because the inline JavaScript is not properly wrapped in CDATA tags. Otherwise something like this might work:
$html = new simple_html_dom();
$html->load_file('your-external-file');
foreach($html->find("script") as $obj) {
if(isset($obj->innertext) && strpos($obj->innertext, 'window._jscalls') !== false) // strpos() returns 0 for a match at offset 0, so compare against false
echo $obj->innertext;
}
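Once you have the script text, you still need to cut the JSON out of it before json_decode() will accept it. Here is a rough sketch of one way to do that, using PHP's bundled DOMDocument instead of Simple HTML DOM. It assumes the JSON is a single object literal assigned inside the script, so that everything from the first { to the last } is valid JSON; the function name extractJson and the sample page are invented for the example.

```php
<?php
// Sketch: find the first <script> containing $marker and decode the JSON
// object embedded in it. Returns null when nothing usable is found.
function extractJson(string $html, string $marker): ?array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('script') as $script) {
        $code = $script->textContent;
        if (strpos($code, $marker) === false) {
            continue;
        }
        // Assumption: the JSON runs from the first "{" to the last "}".
        $start = strpos($code, '{');
        $end = strrpos($code, '}');
        if ($start === false || $end === false || $end < $start) {
            continue;
        }
        $data = json_decode(substr($code, $start, $end - $start + 1), true);
        if (is_array($data)) {
            return $data;
        }
    }
    return null;
}

// Hypothetical page; the real markup may differ.
$page = '<html><body><script>window._jscalls = {"a": 1, "b": ["x", "y"]};</script></body></html>';

var_dump(extractJson($page, 'window._jscalls'));
```

If the script holds more than one object literal, or JavaScript that is not strict JSON, this slicing will not be enough and the extraction has to be adapted to the actual page.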
This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Regex to change format of all img src attributes
Hi,
I want to replace the image path in my content DB field.
I have the following:
preg_replace("/src='(?:[^'\/]*\/)*([^']+)'/g","src='newPath/$2'",$content);
which is working fine for
src="/path/path/image.jpg"
but fails on
src="http://www.mydomain.com/path/path/image.jpg"
Any help to bypass this problem?
Don't use regular expressions for this. Use an HTML parser like Simple HTML DOM.
$html = file_get_html('http://www.example.com/sourcepage.html');
foreach($html->find('img') as $element)
{
$new_src = "Do stuff with new src here";
$element->src = $new_src;
}
echo $html; // Output new code
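For the specific case in the question (keep the file name, swap the directory), here is a sketch of what the "do stuff" step could look like. It uses PHP's bundled DOMDocument instead of Simple HTML DOM so that it runs without extra files. The function name rewriteImagePaths and the sample content are made up for illustration.

```php
<?php
// Sketch: point every <img> at $newPath while keeping the original file name.
// Handles both relative paths and absolute URLs, because parse_url() isolates
// the path component before basename() takes the file name.
function rewriteImagePaths(string $content, string $newPath): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $path = parse_url($img->getAttribute('src'), PHP_URL_PATH);
        if (!is_string($path) || $path === '') {
            continue; // no usable src on this image
        }
        $img->setAttribute('src', rtrim($newPath, '/') . '/' . basename($path));
    }

    // Note: saveHTML() returns a full document (doctype, <html>, <body>),
    // not just the fragment that was passed in.
    return $doc->saveHTML();
}

// Hypothetical content field with one relative and one absolute src.
$content = '<p><img src="/path/path/image.jpg">'
    . '<img src="http://www.mydomain.com/path/path/logo.png"></p>';

echo rewriteImagePaths($content, 'newPath');
```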