PHP file_get_contents - php

I'm wondering whether I can use file_get_contents to get only specific elements of the desired page, for instance only the content within the <title></title> tags or the <meta></meta> tags, and so forth. If that isn't possible with file_get_contents, what can I use to achieve it?
Thanks!

You can use the explode() function:
$content = file_get_contents("http://www.demo.com");
$explodedContent = explode("<title>", $content);
$explodedExplodedContent = explode("</title>", $explodedContent[1]);
echo $explodedExplodedContent[0]; // title of that page.
You can wrap it in a function:
function contentBetween($beginStr, $endStr, $contentURL) {
    $content = file_get_contents($contentURL);
    $explodedContent = explode($beginStr, $content);
    // Index 1 only exists if $beginStr was found in the page
    if (!isset($explodedContent[1])) {
        return false;
    }
    $explodedExplodedContent = explode($endStr, $explodedContent[1]);
    return $explodedExplodedContent[0];
}
echo contentBetween("<title>", "</title>", "http://www.demo.com"); // usage.
See more about the explode() function: http://php.net/manual/tr/function.explode.php
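For the <meta> part of the question, splitting on strings gets awkward because the tags carry attributes. A minimal sketch using PHP's built-in DOMDocument instead might look like this. The helper name extractHeadInfo and the sample markup are assumptions made for illustration; in practice the markup would come from file_get_contents($url).

```php
<?php
// Hypothetical helper: returns the <title> text and a name => content map of <meta> tags.
function extractHeadInfo(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $dom->getElementsByTagName('title')->item(0);
    $title = $titleNode !== null ? trim($titleNode->textContent) : null;

    $meta = [];
    foreach ($dom->getElementsByTagName('meta') as $tag) {
        $name = $tag->getAttribute('name');
        if ($name !== '') {
            $meta[$name] = $tag->getAttribute('content');
        }
    }

    return ['title' => $title, 'meta' => $meta];
}

// Sample markup standing in for file_get_contents($url).
$html = '<html><head><title>Demo page</title>'
      . '<meta name="description" content="A short summary"></head><body></body></html>';

$info = extractHeadInfo($html);
echo $info['title'], "\n";
echo $info['meta']['description'], "\n";
```

Only <meta> tags with a name attribute are collected here; tags such as <meta charset> or Open Graph property tags would need their own handling.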

Related

Trying to grab value from html page but getting template back not the value - php

I am making a price crawler for a project but am running into a bit of an issue. I am using the below code to extract values from an html page:
$content = file_get_contents($_POST['url']);
$resultsArray = array();
$sqlresult = array();
$priceElement = explode( '<div>value I want to extract</div>' , $content );
Now when I use this to get certain elements I only get back
Finance: {{value * value2}}
I want to get the actual value that would be displayed on the screen, e.g.
Finance: 7.96
The other php methods I have tried are:
curl
file_get_html (using the simple_html_dom library)
None of these work either :( Any ideas what I can do?
You passed <div>value I want to extract</div> as the delimiter, which means PHP splits your string into an array wherever that text occurs.
In the following code we use the , character as the delimiter:
<?php
$string = "apple,banana,lemon";
$array = explode(',', $string);
echo $array[1];
?>
The output should be this:
banana
In your example you passed the value you want to extract as the delimiter, which is why this happens. The delimiter needs to be the text that separates the string you want from the parts you don't need at the moment.
For example:
<?php
$string = "iDontNeedThis-dontExtractNow-value I want to extract-dontNeedEither";
$priceElement = explode('-', $string);
echo "<div>".$priceElement[2]."</div>";
?>
The code should output this to your HTML page:
<div>value I want to extract</div>
And it will appear on your page like this:
value I want to extract
If you don't need to keep the whole array in a variable, you can store just one element of it instead:
$priceElement = explode('-', $string)[2];
echo $priceElement;
This stores only value I want to extract, so you won't have to deal with arrays later on.

PHP replace {replace_me} with <?php include ?> in output buffer

I have a file like this
**buffer.php**
<?php
ob_start();
?>
<h1>Welcome</h1>
{replace_me_with_working_php_include}
<h2>I got a problem..</h2>
<?php
ob_end_flush();
Everything inside the buffer is generated dynamically from data in the database, and inserting PHP into the database is not an option.
The issue is that I have my output buffer and I want to replace '{replace}' with a working PHP include, which pulls in a file that also contains some HTML/PHP.
So my actual question is: how do I replace a string with working PHP code in an output buffer?
I hope you can help; I have spent way too much time on this.
Best regards - user2453885
EDIT - 25/11/14
I know WordPress and Joomla use something similar: you can write {rate} in your post, and it is replaced with a rating system (some rating plugin). This is the secret knowledge I desire.
You can use preg_replace_callback and let the callback include the file you want and return its output. Or you could replace the placeholders with textual includes, save the result as a file, and include that file (in effect compiling the template).
For simple text you could do explode (though it's probably not the most efficient for large blocks of text):
function StringSwap($text = "", $rootdir = "", $begin = "{", $end = "}") {
    // Collected blocks, each either ['txt' => ...] or ['file' => ...]
    $new = array();
    // Explode beginning
    $go = explode($begin, $text);
    // Loop through the array
    if (is_array($go)) {
        foreach ($go as $value) {
            // Split ends if available
            $value = explode($end, $value);
            // If there is an end, key 0 should be the replacement
            if (count($value) > 1) {
                // Check if the file exists based on your root
                if (is_file($rootdir . $value[0])) {
                    // If it is a real file, mark it and remove it
                    $new[]['file'] = $rootdir . $value[0];
                    unset($value[0]);
                }
                // All others set as text
                $new[]['txt'] = implode($value);
            } else {
                // No end marker found, so not a file: just assign as text
                $new[]['txt'] = $value;
            }
        }
    }
    // Loop through new array and handle each block as text or include
    foreach ($new as $block) {
        if (isset($block['txt'])) {
            echo (is_array($block['txt'])) ? implode(" ", $block['txt']) : $block['txt'] . " ";
        } elseif (isset($block['file'])) {
            include_once($block['file']);
        }
    }
}
// To use, drop your text in here as a string
// You need to set a root directory so it can map properly
StringSwap($text);
I might be misunderstanding something here, but something simple like this might work?
<?php
# Main page (retrieved from the database or wherever into a variable - output buffer example shown)
ob_start();
?>
<h1>Welcome</h1>
{replace_me_with_working_php_include}
<h2>I got a problem..</h2>
<?php
$main = ob_get_clean();
# Replacement
ob_start();
include 'whatever.php';
$replacement = ob_get_clean();
echo str_replace('{replace_me_with_working_php_include}', $replacement, $main);
You can also use a return statement from within an include file if you wish to remove the output buffer from that task too.
Good luck!
Thank you all for the lovely input.
I will try to answer my own question as clearly as I can.
Problem: I first thought I wanted to run a PHP function or include inside a buffer. That, however, is not what I needed, and it is not what buffers are intended for.
Solution: a callback function that returns my desired content. Using preg_replace_callback(), I could find the text I wanted to replace in my buffer and replace it with whatever the callback function returned.
The callback then included the necessary files/classes and called their functions to produce the content.
Tell me if anything is unclear or if you want me to elaborate on my solution.
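The callback approach described above could be sketched roughly as follows. The $renderers map, the fillPlaceholders name, and the {name} placeholder pattern are assumptions for illustration, not the poster's actual code; in the real setup each renderer would include a file and return its captured output.

```php
<?php
// Hypothetical map of placeholder names to renderers. A real renderer could do
// ob_start(); include 'some-file.php'; return ob_get_clean();
$renderers = [
    'rate' => function (): string {
        return '<div class="rating">5 stars</div>';
    },
];

// Replace every {name} placeholder with the output of its renderer.
function fillPlaceholders(string $buffer, array $renderers): string
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        function (array $m) use ($renderers): string {
            $name = $m[1];
            // Unknown placeholders are left untouched.
            return isset($renderers[$name]) ? $renderers[$name]() : $m[0];
        },
        $buffer
    );
}

// Capture the page in an output buffer, then post-process it.
ob_start();
echo "<h1>Welcome</h1>\n{rate}\n<h2>Done</h2>\n{unknown}";
$page = fillPlaceholders(ob_get_clean(), $renderers);
echo $page;
```

Keeping the renderers in a whitelist means content from the database can only trigger code that was registered explicitly, which is why no PHP has to be stored in the database.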

Get second item with preg_match in php

I am extracting the text between two tags with PHP (from an HTML page).
A sample of the code I use is this:
function GDes($url) {
$fp = file_get_contents($url);
if (!$fp) return false;
$res = preg_match("/<description>(.*)<\/description>/siU", $fp, $title_matches);
if (!$res) return false;
$description = preg_replace('/\s+/', ' ', $title_matches[1]);
$description = trim($description);
return $description;
}
It returns the text between the description tags, but my problem is that if the page has two description tags, it returns the first one, which I don't need.
I need to get the second one.
For example, if my HTML is this:
<description>No need to this</description>
<description>I NEED THIS ONE</description>
I need the function above to return the second description tag.
What changes does the function need?
Use preg_match_all instead. It will create an array with all matches.
You can keep your code as is, just replace preg_match with preg_match_all.
Then you have to use $title_matches[1][1] instead of $title_matches[1] in your preg_replace call, since the $title_matches is now a multidimensional array.
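Applied to the function from the question, the change might look like the sketch below. The name nthDescription and the $index parameter are assumptions added for illustration, and the function takes the markup as a string so it can be tried without a network request.

```php
<?php
// Hypothetical variant of GDes(): returns the description at a 0-based index,
// or false when there is no such match.
function nthDescription(string $markup, int $index)
{
    $count = preg_match_all('/<description>(.*)<\/description>/siU', $markup, $matches);
    // $matches[1] holds the captured text of every match, in document order.
    if (!$count || !isset($matches[1][$index])) {
        return false;
    }
    // Collapse runs of whitespace, as in the original function.
    return trim(preg_replace('/\s+/', ' ', $matches[1][$index]));
}

$markup = "<description>No need to this</description>\n"
        . "<description>I NEED   THIS ONE</description>";

var_dump(nthDescription($markup, 1));
```

Checking isset($matches[1][$index]) matters because a page with only one description tag would otherwise trigger an undefined-index warning.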

Strip directory structure in HTML

I have a PHP application that reads in a bit of HTML. In this HTML there may be an img tag. What I want to do is strip the directory structure from the src of the image tag e.g.
<img src="dir1/dir2/dir3/image1.jpg">
to
<img src="image1.jpg">
Anyone have any pointers?
Thanks,
Mark
As a suggestion: rather than using regex, you may be better off using something like the SimpleXML class to traverse the HTML. That way you can find the img tags and their src attributes and change them easily, instead of trying to parse a whole document with regex. Once you have a src value, explode it on the "/" delimiter and use the last element of the resulting array as the new src attribute.
PHP.net's SimpleXML Manual: http://php.net/manual/en/book.simplexml.php
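A sketch of that idea is shown below. It swaps in PHP's built-in DOMDocument rather than SimpleXML, since SimpleXML expects well-formed XML, and uses basename() in place of explode. The helper name stripImageDirectories and the wrapper <div> are assumptions for illustration.

```php
<?php
// Hypothetical helper: rewrites every <img src> in an HTML fragment to its bare file name.
function stripImageDirectories(string $fragment): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so its nodes can be located again after parsing.
    $dom->loadHTML('<div>' . $fragment . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            // basename() keeps only the part after the last "/".
            $img->setAttribute('src', basename($img->getAttribute('src')));
        }
    }

    // The wrapper is the first <div> in document order; serialize its children only.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = stripImageDirectories('<p>Logo: <img src="dir1/dir2/dir3/image1.jpg"></p>');
echo $result, "\n";
```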
This is a tutorial on how to change all links in an HTML document: Scraping Links From HTML.
With a slight modification of the example, this could do it:
<?php
require('FluentDOM/FluentDOM.php');
$html = '<img src="dir1/dir2/dir3/image1.jpg">';
$fd = FluentDOM($html, 'html')->find('//img[@src]')->each(
    function ($node) {
        $item = FluentDOM($node);
        // Overwrite src with just the file name
        $item->attr('src', basename($item->attr('src')));
    }
);
$fd->contentType = 'xml';
header('Content-type: text/xml');
echo $fd;
?>
If you want to try this with a regular expression, this could work on the src value:
$subject = "dir1/dir2/dir3/image1.jpg";
$pattern = '/^.*\//';
$result = preg_replace($pattern, '', $subject);

Modification to a code to merge two parts of it with similar characteristics

Below is a link crawler that collects the URLs of a page down to a given depth. At the end of it I added a regular expression to match all the email addresses on the URL that was just crawled. As you can see in the second part, it calls file_get_contents on the same page it has just downloaded, which doubles the execution time, bandwidth, etc.
The question is: how can I merge those two parts so that the second reuses the first download instead of fetching the page again? Thank you.
function crawler($url, $depth = 2) {
    $dom = new DOMDocument('1.0');
    if (!$parts || !@$dom->loadHTMLFile($url)) {
        return;
    }
    .
    .
    .
    //this is where the second part starts
    $text = file_get_contents($url);
    $res = preg_match_all("/[a-z0-9]+([_\\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i", $text, $matches);
}
Replace:
$text = file_get_contents($url);
with:
$text = $dom->saveHTML();
http://www.php.net/manual/en/domdocument.savehtml.php
Alternatively, in the first part of your function, you could save the HTML into a variable using file_get_contents, then pass it to $dom->loadHTML. That way you can then reuse the variable with your regex.
http://www.php.net/manual/en/domdocument.loadhtml.php
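The second suggestion, fetching once and reusing the string, could be sketched like this. The helper name linksAndEmails is an assumption, the email pattern is a simplified form of the one in the question, and the sample markup stands in for a single file_get_contents($url) call.

```php
<?php
// Hypothetical helper: extracts link targets and email addresses from one HTML string,
// so the page only has to be downloaded once.
function linksAndEmails(string $html): array
{
    $dom = new DOMDocument('1.0');
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html); // parse the already-downloaded markup
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = $anchor->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }

    // Run the regex on the same string instead of fetching the URL again.
    preg_match_all(
        '/[a-z0-9]+(?:[_.-][a-z0-9]+)*@[a-z0-9]+(?:[.-][a-z0-9]+)*\.[a-z]{2,}/i',
        $html,
        $matches
    );

    return [
        'links'  => $links,
        'emails' => array_values(array_unique($matches[0])),
    ];
}

// In the crawler this would be: $html = file_get_contents($url);
$html = '<html><body><a href="/about">About</a> Contact: info@example.com</body></html>';
$result = linksAndEmails($html);
print_r($result);
```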
