php preg_replace relative to absolute in url css - php

I am trying to find a regex that converts relative URLs inside CSS url() values to absolute ones. So far this is what I have:
$domain = "http://example.com/";
$html = "url(1.css), url(' 1.css'), url( \"1.css\")";
$rep['/url(\s*)\((\s*)"(\s*)(?!https?:\/\/)(?!data:)(?!#)/i'] = 'url("'.$domain;
$rep["/url(\s*)\((\s*)'(\s*)(?!https?:\/\/)(?!data:)(?!#)/i"] = "url('".$domain;
$html = preg_replace(
array_keys($rep),
array_values($rep),
$html
);
echo $html;
Current Output:
url(1.css), url('http://example.com/1.css'), url("http://example.com/1.css")
Desired Output:
url(http://example.com/1.css), url('http://example.com/1.css'), url("http://example.com/1.css")

Is this what you are expecting? The following should work.
Regex: /url[\s]*\([\s]*(?!'|\")(?!https?:\/\/)(?!data:)(?!#)/i
This regex matches url( only when it is not followed by a " or ', so it covers the unquoted case that the two existing patterns miss.
<?php
ini_set('display_errors', 1);
$domain = "http://example.com/";
$html = "url(1.css), url(' 1.css'), url( \"1.css\")";
$rep['/url[\s]*\([\s]*"[\s]*(?!https?:\/\/)(?!data:)(?!#)/i'] = 'url("'.$domain;
$rep["/url[\s]*\([\s]*'[\s]*(?!https?:\/\/)(?!data:)(?!#)/i"] = "url('".$domain;
$rep["/url[\s]*\([\s]*(?!'|\")(?!https?:\/\/)(?!data:)(?!#)/i"] = "url(".$domain;
$html = preg_replace(
array_keys($rep),
array_values($rep),
$html
);
echo $html;
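The three patterns above can also be folded into a single callback. This is only a sketch, assuming PHP 7.1 or later and a base URL that ends with a slash; absolutize_css_urls is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: prefix relative url() targets with a base URL.
// Assumption: $base ends with "/" and stands for the site root.
function absolutize_css_urls(string $css, string $base): string
{
    // Group 1: optional opening quote; group 2: the URL itself; \1 closes the same quote.
    $pattern = '/url\(\s*+([\'"]?)\s*+(.*?)\s*\1\s*\)/i';

    $result = preg_replace_callback($pattern, function (array $m) use ($base): string {
        [, $quote, $target] = $m;
        // Leave absolute URLs, protocol-relative URLs, data URIs and fragments untouched.
        if ($target === '' || preg_match('~^(?:https?:)?//|^data:|^#~i', $target)) {
            return $m[0];
        }
        return 'url(' . $quote . $base . ltrim($target, '/') . $quote . ')';
    }, $css);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $css;
}

echo absolutize_css_urls("url(1.css), url(' 1.css'), url( \"1.css\")", "http://example.com/");
```

Because the callback sees the whole URL, the quote style is preserved and the skip rules live in one place instead of being repeated in every pattern.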

Related

How to make URL rewriting work on a read more link

I have a read more link that shows part of the text. When clicked, it shows the rest of the text on the next page. Here is my code:
<?php
$content = "";
$link = "linkreadmore.php";
$limit = 400;
function readMore($content,$link,$var,$id, $limit) {
$content = substr($content,0,$limit);
$content = substr($content,0,strrpos($content,' '));
$content = $content." <a href='$link?$var=$id'>More...</a>";
return $content;
?>
Now I want to rewrite the URL of linkreadmore.php, which lives in folder1/folder2/linkreadmore.php. Here is my rewrite rule:
RewriteRule ^/?more/([0-9]+)$ /socialM/polip/linkreadmore.php?var=$1 [L]
Then I rewrote the read more script like this:
<?php
$content = "";
$link = "more";
$limit = 400;
function readMore($content,$link,$var,$id, $limit) {
$content = substr($content,0,$limit);
$content = substr($content,0,strrpos($content,' '));
$content = $content." <a href='/$link/=$id'>More...</a>";
return $content;
?>
It shows file not found. What am I missing here?
You have an extra = in the URL. This line should work instead:
$content = $content." <a href='/$link/$id'>More...</a>";
You're also missing a closing } for your function, but if the code works, that's probably just a typo in the question.
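Putting both remarks together, here is a sketch of the complete function with the closing brace and the corrected link. The parameter types, the default limit, and the guards for short text and missing spaces are assumptions added for illustration.

```php
<?php
// Sketch of the corrected helper; /more/<id> matches the RewriteRule pattern ^/?more/([0-9]+)$.
function readMore(string $content, string $link, int $id, int $limit = 400): string
{
    // Short text needs no "More..." link (assumed behaviour, not in the question).
    if (strlen($content) <= $limit) {
        return $content;
    }
    $excerpt = substr($content, 0, $limit);
    // Cut back to the last full word, if there is a space at all.
    $lastSpace = strrpos($excerpt, ' ');
    if ($lastSpace !== false) {
        $excerpt = substr($excerpt, 0, $lastSpace);
    }
    return $excerpt . " <a href='/$link/$id'>More...</a>";
}

echo readMore('one two three four', 'more', 7, 10);
```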

(PHP) file_get_contents - Contents url after questionmark?

First of all, I'm new to PHP. I've got the following code snippet and it works pretty well.
<?php
$html = file_get_contents('http://www.example.com/');
preg_match_all( '|<img.*?src=[\'"](.*?)[\'"].*?>|i',$html, $matches );
foreach ($matches[1] as $match) {
echo '<img src="' . $match . '" />';
}
?>
Now I want to pass the URL for file_get_contents after a question mark in the script's address, like:
http://www.example.com/getfile.php?http://www.url.com/images/
How do I do this? Does it need cURL?
Solution:
Changing:
$html = file_get_contents('http://www.example.com/');
to
$html = file_get_contents($_GET["url"]);
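works, as long as the page is requested with a url parameter, e.g. getfile.php?url=http://www.url.com/images/ :)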
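Passing $_GET["url"] straight to file_get_contents lets any visitor make the script read arbitrary URLs or local paths. The following sketch shows a check that could run first, assuming only http and https pages should be fetched; requested_url is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the "url" query parameter only if it is an absolute http(s) URL.
function requested_url(array $query): ?string
{
    $url = $query['url'] ?? '';
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Reject file://, ftp:// and other schemes.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true) ? $url : null;
}

// Usage in getfile.php (not executed here):
// $url = requested_url($_GET);
// $html = $url !== null ? file_get_contents($url) : '';
```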

PHP Fatal error: Cannot use object of type simple_html_dom as array

I am working on a web scraping application using simple_html_dom. I need to extract all the images in a web page. The images can appear in the following places:
<img> tags
CSS inside a <style> tag on the same page
inline styles on a <div> or some other tag
I can scrape all the images by using the following code.
function download_images($html, $page_url , $local_url){
foreach($html->find('img') as $element) {
$img_url = $element->src;
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path;
download_file($img_url, $GLOBALS['website_local_root'].$img_path);
$element->src=$url_to_be_change;
}
$css_inline = $html->find("style");
$matches = array();
preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
foreach ( $matches as $match ) {
$img_url = trim( $match[1], "\"'" );
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
$html = str_replace($img_url , $url_to_be_change , $html );
}
return $html;
}
$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);
Please note that I am also modifying the HTML after downloading the images.
Downloading works fine, but when I try to save the HTML, it gives the following error:
PHP Fatal error: Cannot use object of type simple_html_dom as array
Important: it works perfectly fine if I do not use str_replace and the second loop.
Fatal error: Cannot use object of type simple_html_dom as array in /var/www/html/app/framework/cache/includes/simple_html_dom.php on line 1167
Guess №1
I see a possible mistake here:
$html = str_get_html($html);
It looks like you pass an object to the function str_get_html(), while it accepts a string as an argument. Let's fix that this way:
$html = str_get_html($html->plaintext);
We can only guess what is the content of the $html variable, that comes to this piece of code.
Guess №2
Or maybe we just need to use another variable in function download_images to make your code correct in both cases:
function download_images($html, $page_url , $local_url){
foreach($html->find('img') as $element) {
$img_url = $element->src;
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
$element->src=$url_to_be_change;
}
$css_inline = $html->find("style");
// Start from the serialized page so that each replacement accumulates
// (save() called without a path returns the HTML as a string).
$result_html = $html->save();
$matches = array();
preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
foreach ( $matches as $match ) {
$img_url = trim( $match[1], "\"'" );
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
$result_html = str_replace($img_url , $url_to_be_change , $result_html );
}
return $result_html;
}
$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);
Explanation: if there are no matches (the $matches array is empty), the second loop never runs, so $html still holds the same value it had at the beginning of the function. This is a common mistake when the same variable is reused where the code needs two different variables.
As the error message states, you are dealing with an object where you should have an array.
You could try typecasting your object:
$array = (array) $yourObject;
That should solve it.
I had this error and solved it (in my case) by using return $html->save(); at the end of the function.
I can't explain why two instances with different variable names, scoped in different functions, caused this error. I guess this is how the "simple html dom" class works.
So, to be clear, try calling $html->save() before you do anything else afterwards.
I hope this information helps somebody :)
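One way to keep the DOM object and the HTML string apart, as the answers above suggest, is to do the url() rewriting on a plain string only. The sketch below uses no simple_html_dom calls at all; rewrite_style_urls and its $mapUrl callback are hypothetical names, and the input is assumed to be the page already serialized to a string (for example the return value of $html->save()).

```php
<?php
// Hypothetical helper: rewrite every url(...) found inside <style> blocks of an HTML string.
// $mapUrl receives the bare URL and returns its replacement (e.g. the local copy's path).
function rewrite_style_urls(string $html, callable $mapUrl): string
{
    $result = preg_replace_callback(
        '~(<style\b[^>]*>)(.*?)(</style>)~is',
        function (array $block) use ($mapUrl): string {
            $css = preg_replace_callback('/url\((.*?)\)/i', function (array $m) use ($mapUrl): string {
                // Strip surrounding whitespace and quotes before mapping.
                $url = trim($m[1], " \t\n\r\"'");
                return 'url("' . $mapUrl($url) . '")';
            }, $block[2]);
            return $block[1] . $css . $block[3];
        },
        $html
    );

    // Fall back to the input if the regex engine reports an error.
    return $result ?? $html;
}

echo rewrite_style_urls(
    '<style>a{background:url( "img/a.png" )}</style><p>url(x)</p>',
    function ($url) { return '/local/' . $url; }
);
```

The string that comes back can then be passed to str_get_html() once, so no variable ever holds an object in one branch and a string in another.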

Can we use str_replace on code fetched from a remote URL

I got the source code from a remote URL like this:
$f = file_get_contents("http://www.example.com/abc/");
$str=htmlspecialchars( $f );
echo $str;
In that code I want to replace any URL like
href="/m/offers/"
with
href="www.example.com/m/offers/"
For that I used
$newstr=str_replace('href="/m/offers/"','href="www.example.com/m/offers/"',$str);
echo $newstr;
but this does not replace anything. I want to know: can I use str_replace on code fetched from a remote URL? If yes, how? If no, is there another solution?
There will not be any " in your $str, because htmlspecialchars() will have converted them all to &quot; before the string reaches your str_replace.
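A minimal sketch of that point, assuming a literal string in place of the fetched page (the markup in $f is made up for illustration): replace on the raw markup first, then escape it for display.

```php
<?php
// Stand-in for file_get_contents(...); the markup is a made-up example.
$f = '<a href="/m/offers/">Offers</a>';

// Replacing after escaping finds nothing, because the quotes are already &quot;.
$escapedFirst = str_replace(
    'href="/m/offers/"',
    'href="www.example.com/m/offers/"',
    htmlspecialchars($f),
    $missed
);

// Replacing on the raw markup and escaping afterwards works.
$replaced = str_replace('href="/m/offers/"', 'href="www.example.com/m/offers/"', $f, $count);
$str = htmlspecialchars($replaced);

echo $str, "\n";
```

Here $missed ends up as 0 and $count as 1, which is the whole difference between the two orders.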
I start by assuming that all href attributes belong to <a> tags.
Since we cannot know whether all tags are written in the same way, instead of opting for regular expressions I will use a parser to make the extraction easier.
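<?php
use Symfony\Component\DomCrawler\Crawler;
// Requires symfony/dom-crawler and symfony/css-selector, installed via Composer.
require 'vendor/autoload.php';
$base = "http://www.example.com";
$url = $base . "/abc/";
$html = file_get_contents($url);
$crawler = new Crawler($html);
$links = array();
$raw_links = array();
$offers = array();
// Iterating a Crawler yields DOMElement nodes, so attributes are read with getAttribute().
foreach($crawler->filter('a') as $atag) {
    $raw_links[] = $raw_link = $atag->getAttribute('href');
    // Strip the base so absolute and relative links look the same.
    $links[] = $link = str_replace($base, '', $raw_link);
    if (strpos($link, 'm/offers') !== false) {
        $offers[] = $link;
    }
}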
Now you have all the raw links, the relative links, and the offers links.
I use the Symfony DomCrawler component for this.

Parsing image url from source code of the page

Here is my regex to get the image url on the page.
<?php
$url = $_POST['url'];
$data = file_get_contents($url);
$logo = get_logo($data);
function get_logo($html)
{
preg_match_all('/\bhttps?:\/\/\S+(?:png|jpg)\b/', $html, $matches);
//echo "mactch : $matches[0][0]";
return $matches[0][0];
}
?>
Is there anything missing in the regex? For some URLs it does not return an image URL even though the page contains images.
For example: http://www.milanart.in/
It does not return the image on that page.
Please, no DOM. I cannot use it.
<?php
$url = "http://www.milanart.in";
$data = file_get_contents($url);
$logo = get_logo($data);
function get_logo($html)
{
preg_match_all("/<img src=\"(.*?)\"/", $html, $matches);
return $matches[1][0];
}
echo 'logo path : '.$logo;
echo '<img src="'.$url.'/'.$logo.'" />';
?>
Use PHP's DOM classes to get all images:
Search for image files in CSS.....url(imagefilename.extension)
Search for image file in HTML ......
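For the HTML half of that suggestion, a minimal sketch with PHP's built-in DOMDocument might look like this. image_sources is a hypothetical helper name, and the sketch assumes the DOM extension is available; it returns src values exactly as written, so relative paths still need to be resolved against the page URL.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> in an HTML string.
function image_sources(string $html): array
{
    if (trim($html) === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return $sources;
}

print_r(image_sources('<html><body><img class="logo" src="images/logo.png"></body></html>'));
```

Unlike the /<img src="..."/ pattern, this does not depend on src being the first attribute or on the quote style used.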
