In PHP
I'm converting a large number of blog posts into something more mobile friendly. As these blog posts seem to have a lot of large images in them, I would like to use some regex or something to go through each article's HTML and replace any images that aren't currently linked with a link to that image. This will allow the mobile browser to display the article without any images, but with a link in place of where each image would be, thus reducing the page download size.
Alternatively, if anyone knows any PHP classes/functions that can make the job of formatting these posts easier, please suggest them.
Any help would be brilliant!
To parse HTML, use an HTML parser. Regex is not the correct tool for parsing HTML or XML documents. PHP's DOM extension offers the loadHTML() method; see http://in2.php.net/manual/en/domdocument.loadhtml.php. Use the DOM's methods to access and modify the img elements.
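For example, a minimal sketch (assuming $postHtml holds one post's HTML) that wraps each unlinked img in a link to its own src:
$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate real-world markup
$doc->loadHTML($postHtml);
libxml_clear_errors();
// Snapshot the node list first, since we modify the tree while iterating
foreach (iterator_to_array($doc->getElementsByTagName('img')) as $img) {
    if ($img->parentNode->nodeName === 'a') {
        continue; // already linked, leave it alone
    }
    $link = $doc->createElement('a', 'View image');
    $link->setAttribute('href', $img->getAttribute('src'));
    $img->parentNode->replaceChild($link, $img);
}
$mobileHtml = $doc->saveHTML();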
How about doing it in jQuery instead of PHP? That way, it will work across different blogging software.
You can do something like this:
$(document).ready(function () {
    $('#content img').each(function () {
        var imageUrl = $(this).attr('src');
        // Replace the image with a link pointing at it
        $(this).replaceWith('<a href="' + imageUrl + '">View image</a>');
    });
});
With a regex, something like this:
$c = 0;
while (preg_match('/<img src="(.*?)">/', $html, $match) && $c++ < 100) {
    $html = str_replace($match[0], '<a href="'.$match[1].'">Image '.$c.'</a>', $html);
}
You can also use preg_replace (saves the loop), but the loop allows for easy extensions, e.g. using the image functions to create thumbnails.
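For reference, a minimal sketch of the preg_replace variant (it drops the numbering; preg_replace_callback would bring it back):
$html = preg_replace('/<img src="(.*?)">/', '<a href="$1">Image</a>', $html);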
I'm building a theme on WordPress, and I need a conditional to check whether the post or page content has links to images (href = "....../fubar.jpg").
It would look something like this:
if (is_singular() && /* condition: content links to images */) {
    // This post/page has links to images
} else {
    // No link to image found
}
It would have to detect jpg, png and gif.
Is it possible to do this in WordPress?
Yes, WordPress is a PHP framework, and in PHP you can detect that.
What you could do is use an HTML parser like DOMDocument or Simple HTML DOM Parser to see if there are any links, and process them if you find any. You could do that by opening each link and checking the file type (or processing it as an image with, for example, getimagesize() and checking the result).
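A minimal sketch of that idea, assuming $content holds the post content (e.g. from get_the_content()):
function content_links_to_images($content) {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($content);
    libxml_clear_errors();
    // Look for anchors whose href ends in an image extension
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (preg_match('/\.(jpe?g|png|gif)$/i', $a->getAttribute('href'))) {
            return true;
        }
    }
    return false;
}
That slots straight into the conditional from the question: if (is_singular() && content_links_to_images(get_the_content())) { ... }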
Apparently you can use get_the_content() to return the content, then a library like Simple HTML DOM to find out if it has images:
if (str_get_html(get_the_content())->find('img', 0)) {
    // there's an image
} else {
    // there's not an image
}
So my school has a very annoying way to view my timetable.
You have to click through 5 links to get to it.
This is the link for my class (it updates weekly without the link changing):
https://webuntis.a12.nl/WebUntis/?school=roc%20a12#Timetable?type=1&departmentId=0&id=2147
I want to display the content from that page on my website, but with my own stylesheet.
I don't mean this:
<?php
$homepage = file_get_contents('http://www.example.com/');
echo $homepage;
?>
or an iframe....
I think this can be done better using jQuery and Ajax. You can get jQuery to load the target page, use selectors to strip out what you need, and then attach it to your document tree. You should then be able to style it any way you like.
I would recommend using the cURL library: http://www.php.net/manual/en/curl.examples.php
But you have to extract the part of the page you want to display, because you will get the whole HTML document.
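A minimal fetch could look like this (the URL is the timetable link from the question):
$ch = curl_init('https://webuntis.a12.nl/WebUntis/?school=roc%20a12');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow any redirects
$page = curl_exec($ch);
curl_close($ch);
// $page now holds the whole HTML document; extract the fragment you need next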
You'd probably read the whole page into a string variable (for example using file_get_contents as you mentioned) and then parse the content. Here you have some possibilities:
Regular expressions
Walking the DOM tree (e.g. using PHP's DOMDocument classes)
After that, you'd most likely replace all the style="..." or class="..." information with your own.
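As a rough sketch of the DOM option, assuming $page holds the fetched HTML, you could strip the existing class/style attributes so your own stylesheet takes over:
$doc = new DOMDocument();
libxml_use_internal_errors(true); // the page won't be valid XHTML
$doc->loadHTML($page);
libxml_clear_errors();
$xpath = new DOMXPath($doc);
// Drop all inline styling hooks so only your own CSS applies
foreach ($xpath->query('//*[@style or @class]') as $node) {
    $node->removeAttribute('style');
    $node->removeAttribute('class');
}
echo $doc->saveHTML();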
I'm trying to get all the CSS files of an HTML page from a URL.
I know that getting the HTML code is easy - just use the PHP function file_get_contents.
The question is: can I easily search inside the HTML at a URL and get the files or content of all related CSS files from there?
Note - I want to build an engine for fetching a lot of CSS files; this is why just reading the source by hand is not enough.
Thanks,
You could try using http://simplehtmldom.sourceforge.net/ for HTML parsing.
require_once 'SimpleHtmlDom/simple_html_dom.php';
// file_get_html() fetches via file_get_contents(), so the URL needs a scheme
$url = 'http://www.website-to-scan.com';
$website = file_get_html($url);
// You might need to tweak the selector based on the website you are scanning
// Example: some websites don't set the rel attribute
// others might use less instead of css
//
// Some other options:
// link[href] - Any link with a href attribute (might get favicons and other resources but should catch all the css files)
// link[href="*.css*"] - Might miss files that aren't .css extension but return valid css (e.g.: .less, .php, etc)
// link[type="text/css"] - Might miss stylesheets without this attribute set
foreach ($website->find('link[rel="stylesheet"]') as $stylesheet) {
    $stylesheet_url = $stylesheet->href;
    // Do something with the URL (see below)
}
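To actually download each stylesheet's content (the "do something" part), a single line inside the loop body would do; note that relative hrefs first need to be resolved against $url, which is not shown here:
$css_files[$stylesheet_url] = file_get_contents($stylesheet_url);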
You need to parse the HTML looking for link tags that reference CSS files. You can do that, for example, with preg_match, looking for a regex match.
A regex which would find such files might look like this:
/<link[^>]+href="([^"]+\.css[^"]*)"/i
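With preg_match_all the whole extraction becomes a couple of lines, though a parser remains the more robust choice:
preg_match_all('/<link[^>]+href="([^"]+\.css[^"]*)"/i', $html, $matches);
$css_urls = $matches[1]; // every matched stylesheet URL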
I need to create a PHP script.
The idea is very simple:
When I send a link to a blog post to this PHP script, the webpage is crawled and the first image and the page title are saved on my server.
Which PHP functions do I have to use for this crawler?
Use PHP Simple HTML DOM Parser
// Requires simple_html_dom.php from http://simplehtmldom.sourceforge.net/
require_once 'simple_html_dom.php';
// Create DOM from URL
$html = file_get_html('http://www.example.com/');
// Find all images
$images = array();
foreach ($html->find('img') as $element) {
    $images[] = $element->src;
}
The $images array now holds the image links of the given webpage, and you can store your desired image in the database.
Another HTML parser: HTMLSQL.
Features: it can fetch an external HTML file via an HTTP or FTP link and parse the content.
Well, you'll have to use quite a few functions :)
But I'm going to assume that you're asking specifically about finding the image, and say that you should use a DOM parser like Simple HTML DOM Parser, then cURL to grab the src of the first img element.
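A minimal sketch of that approach, assuming $url holds the blog post link sent to the script (the save path is just a placeholder):
require_once 'simple_html_dom.php';
$html = file_get_html($url);
// Page title
$title = trim($html->find('title', 0)->plaintext);
// First image on the page, if any
$first = $html->find('img', 0);
if ($first) {
    $ch = curl_init($first->src);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch);
    curl_close($ch);
    // basename() crudely derives a file name from the image URL
    file_put_contents('/path/to/images/' . basename($first->src), $data);
}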
I would use file_get_contents() and a regular expression to extract the first image tag's src attribute.
cURL or an HTML parser seems like overkill in this case, but you are welcome to check them out.
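A sketch of that regex approach:
$page = file_get_contents($url); // $url = the blog post link
if (preg_match('/<img[^>]+src="([^"]+)"/i', $page, $m)) {
    $first_image_src = $m[1]; // src attribute of the first image tag
}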
Have you ever noticed that FB scrapes the link you post on Facebook (status, message, etc.) live, right after you paste it into the link field, and displays various metadata: a thumbnail of the image, various images from the linked page, or a video thumbnail for a video-related link (like YouTube)?
Any ideas how one would copy this functionality? I'm thinking about a couple of Gearman workers, or even better, just JavaScript that does XHR requests and parses the content based on regexes or something similar... Any ideas? Any links? Has someone already tried to do the same and wrapped it up in a nice class? Anything? :)
Thanks!
FB scrapes the meta tags from the HTML.
I.e. when you enter a URL, FB displays the page title, followed by the URL (truncated), and then the contents of the <meta name="description"> element.
As for the selection of thumbnails, I think maybe FB chooses only those that exceed certain dimensions, i.e. skipping over button graphics, 1px spacers, etc.
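That size-based filtering guess could look something like this (the 50px threshold is an arbitrary assumption):
function is_thumbnail_candidate($imageUrl, $minWidth = 50, $minHeight = 50) {
    // getimagesize() accepts URLs when allow_url_fopen is enabled
    $size = @getimagesize($imageUrl);
    return $size && $size[0] >= $minWidth && $size[1] >= $minHeight;
}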
I've had a look at how FB does it, and it looks like the scraping is done on the server side.
Edit: I don't know exactly what you're looking for, but here's a PHP function for scraping the relevant data from pages.
It uses the simple HTML DOM library from http://simplehtmldom.sourceforge.net/
class ScrapedInfo
{
    public $url;
    public $title;
    public $description;
    public $imageUrls;
}

function scrapeUrl($url)
{
    $info = new ScrapedInfo();
    $info->url = $url;
    $html = file_get_html($info->url);

    // Grab the page title
    $info->title = trim($html->find('title', 0)->plaintext);

    // Grab the page description
    foreach ($html->find('meta') as $meta) {
        if ($meta->name == "description") {
            $info->description = trim($meta->content);
        }
    }

    // Grab the image URLs
    $imgArr = array();
    foreach ($html->find('img') as $element) {
        $rawUrl = $element->src;
        // Turn any relative URLs into absolute ones
        if (substr($rawUrl, 0, 4) != "http") {
            $imgArr[] = $url . $rawUrl;
        } else {
            $imgArr[] = $rawUrl;
        }
    }
    $info->imageUrls = $imgArr;

    return $info;
}
Facebook looks at various meta information in the HTML of the page that you paste into the link field. The title and description are two obvious ones, but a developer can also use <link rel="image_src" href="thumbnail.jpg" /> to provide a preferred screengrab; you could check for these things (see the sketch below). If this tag is missing, you could always use a website thumbnail generation service.
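A sketch of that check, assuming $html is a Simple HTML DOM object as in the scrapeUrl() function above:
$link = $html->find('link[rel=image_src]', 0);
$preferredThumb = $link ? $link->href : null; // null if the page gives no hint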
As I am developing a similar project, I can tell you it is not as easy as it seems: encoding issues, content rendered with JavaScript, and the sheer number of non-semantic websites are some of the big problems I encountered. Extracting video info and trying to get auto-play behavior is especially tricky, and sometimes impossible. You can see a demo at http://www.embedify.me - it is written in .NET, but it has a service interface so you can call it via JavaScript, and there is also a JavaScript API to get the same UI/behavior as on FB.