I am wondering if there is a way to get the name of a website from a URL. I know you can parse a URL to get a domain name, but since site names are not standardized as far as code is concerned, I am doubtful.
As an example of how this could be used, say I am linking to a New York Times article. I can have the title of the article link to the article page. Then I might want to have the source, "The New York Times", displayed next to the title of the article. It would be exceedingly convenient if I could have this automatically generated.
Just getting the page title wouldn't work because that would usually give you the article title or, if you were to link to some other type of page, you might get some string like "How to retrieve website names? - Stack Overflow." I would only want to get the "Stack Overflow" part of that.
Any ideas?
You could try the application-name property:
<meta name="application-name" content="The New York Times" />
also
<meta name="application-name" content="CNN"/>
Not every site will have this, but it's a good place to start; you can also check for Open Graph tags (http://ogp.me), etc.
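Open Graph's og:site_name is another common place where sites publish their name. A minimal sketch of checking for it with DOMDocument (the HTML string here is a made-up example; a real page would be fetched first):

```php
<?php
// Hypothetical HTML snippet standing in for a fetched page
$html = '<html><head>'
      . '<meta property="og:site_name" content="The New York Times"/>'
      . '</head><body></body></html>';

$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses parser warnings

$siteName = null;
foreach ($dom->getElementsByTagName('meta') as $meta) {
    if ($meta->getAttribute('property') === 'og:site_name') {
        $siteName = $meta->getAttribute('content');
        break;
    }
}
echo $siteName; // prints "The New York Times"
```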
You will need to parse the DOM tree using DOMDocument:
<?php
function GetTitle($url)
{
    $dom = new DOMDocument;
    @$dom->loadHTMLFile($url); // @ suppresses parser warnings

    // Try to get <meta name="application-name"> first
    foreach ($dom->getElementsByTagName("meta") as $meta)
    {
        $metaName = $meta->attributes->getNamedItem("name");
        if ($metaName !== NULL && strtolower($metaName->nodeValue) == "application-name")
        {
            $metaContent = $meta->attributes->getNamedItem("content");
            if ($metaContent !== NULL)
                return $metaContent->nodeValue;
        }
    }

    // Fall back to the page title
    foreach ($dom->getElementsByTagName("title") as $title)
        return $title->nodeValue;

    return NULL;
}

print(GetTitle("http://www.nytimes.com/"));
?>
First, GetTitle() looks for a <meta name="application-name"> tag. If none is found, it falls back to returning the page title instead.
Additionally, you should pass only the base URL. For example, if you have this URL: http://stackoverflow.com/questions/16185145/how-to-retrieve-website-names/16185654#16185654, you should strip everything except http://stackoverflow.com using parse_url:
$parsedUrl = parse_url($url);
GetTitle($parsedUrl["scheme"] . "://" . $parsedUrl["host"]);
If you want to parse the URL, you could use parse_url():
$parsedUrl = parse_url($url);
$host = $parsedUrl['host'];
echo $host;
This will give you an associative array where the host key is what you are looking for.
See: http://php.net/manual/en/function.parse-url.php
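For reference, parse_url() splits a URL into all of its components, not just the host. A quick sketch using the question's own example URL:

```php
<?php
$url = 'http://stackoverflow.com/questions/16185145/how-to-retrieve-website-names#16185654';
$parts = parse_url($url);
print_r($parts);
/* yields:
   [scheme]   => http
   [host]     => stackoverflow.com
   [path]     => /questions/16185145/how-to-retrieve-website-names
   [fragment] => 16185654
*/
```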
What you call the 'site name' is not part of the link; it is part of the HTML code returned when that link is fetched.
If you want to get the site title, you should retrieve the link's content using cURL and then parse the returned HTML to get the content of the <title> tag in the <head> section.
This will probably be more expensive than the benefit you would get.
I'm using Laravel to create a simple social network. Users can type @ in the post area to get a list of their friends and mention them. Every mention is a link like this (using zurb/tribute from GitHub):
<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>
Normal links other than mentions won't have type='mention'
Now when I get the post and insert it into the database, I need to get a list of the users mentioned in it. I'm looking for links that have type="mention", and if there are any I want to get the title of each one to insert into the notification system. What PHP code do I need to add in this if statement?
if(stristr(request('post'),' type="mention" ')){
}
Aside from using an AST (Abstract Syntax Tree), your best bet would be to use DOM on the PHP side, e.g.:
$string = '<a type="mention" href="/user/Jordan" title="Jordan">Jordan</a>';
$doc = new \DOMDocument();
$doc->loadHTML($string);
$xpath = new \DOMXPath($doc);
foreach ($xpath->query("//a[@type='mention']") as $a) {
    $href = $a->getAttribute('href');
    $title = $a->getAttribute('title');
    printf("Found mention of %s with href of %s\n", $title, $href);
}
However, I probably wouldn't send the A node back to the server. You should consider working out some way to make it a display-only feature implemented on the browser side, and simply send the "@jordan" string back to the server.
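If you do switch to sending plain text back, the mentioned usernames can be pulled out with a simple regex instead of DOM parsing. A sketch, assuming an @name convention (the sample post and names are invented):

```php
<?php
$post = 'Great photo @Jordan, you should show @sara_k too!';

// Capture every word following an @ sign
preg_match_all('/@(\w+)/', $post, $matches);
$mentioned = $matches[1];

print_r($mentioned); // ["Jordan", "sara_k"]
```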
I am new to drupal coding and still fairly new to PHP. I have gotten myself to a certain point, and am now stuck! The documentation has helped me a lot up to this point, but I find myself struggling to make it over this hurdle.
My Code:
<?php
// Pulls the referring page URL
$prev_page = $_SERVER['HTTP_REFERER'];

// Breaks the referrer into an array of path segments
$delimit = '/';
$splode = explode($delimit, $prev_page);
$chunked = array_slice($splode, 3, NULL);

// Iterates through the array to rebuild the path as a string
$path = '';
foreach ($chunked as $k => $v) {
    $path .= $v . "/";
}

// Find the node id from the alias
$node = drupal_get_normal_path($path);
echo $node;
?>
So I have gotten the referring page address down to just the path (i.e. about-us/tim rather than http://www.google.com/about-us/tim) to pass into drupal_get_normal_path.
When I put the actual URI into drupal_get_normal_path directly, I received the node information I expected, but when I use the variable as shown in the code block above, it returns the text stored in the variable instead of finding the node source.
Any help y'all can give is greatly appreciated!
Think this is fairly similar to this question here.
What you're doing wrong is assuming that that function returns a node. It doesn't; it just returns the internal path to that node. You still have to load the object (the node) referenced by that URL, and only then do you actually have the node id.
Basically, you can achieve it using the following (this code is slightly more efficient and compact than what you have, plus it works!):
$url = $_SERVER['HTTP_REFERER'];
$path = preg_replace('/\//', '', parse_url($url, PHP_URL_PATH), 1);
$org_path = drupal_lookup_path("source", $path);
$node = menu_get_object("node", 1, $org_path);
$nid = $node->nid;
print $nid;
If you actually want to load the node, you just do node_load($nid) after all this.
Hope this helps!
What I want to accomplish might be a little hardcore, but I want to know if it's possible:
The question:
My question is the same as PHP-Retrieve content from page, but I want to use it on multiple pages.
The situation:
I'm using a website about TV shows. All the TV shows have the same URL and then the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
On every show page, there's a line which tells you if the show is cancelled or still running. I want to retrieve that line to make an overview of the cancelled shows (the website only supports an overview of running shows, so I want to make an extra functionality).
The real question:
How can I tell DOM to retrieve all the shows and check for the status of the show?
(http://bierdopje.com/shows/*).
The Note:
I understand that this process may take a while because it has to read the whole website (or is that too much data?).
Use this code to fetch all the links from a single page:
include_once('simple_html_dom.php');
$html = file_get_html('http://www.couponrani.com/');
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
I use phpQuery to fetch data from a web page; it lets you query the DOM like jQuery.
For example, to get the list of all shows, you can do this :
<?php
require_once 'phpQuery/phpQuery/phpQuery.php';
$doc = phpQuery::newDocumentHTML(
file_get_contents('http://www.bierdopje.com/shows')
);
foreach (pq('.listing a') as $key => $a) {
$url = pq($a)->attr('href'); // will give "/shows/07-ghost"
$show = pq($a)->text(); // will give "07 Ghost"
}
Now you can process all shows individually: make a new phpQuery::newDocumentHTML for each show and, with a selector, extract the information you need.
Get the status of a show
$html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
$doc = phpQuery::newDocumentHTML($html);
$status = pq('.content>span:nth-child(6)')->text();
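If you'd rather avoid the phpQuery dependency, the same nth-child style lookup can be done with plain DOMDocument and DOMXPath. A sketch against a made-up fragment (the .content markup below is invented, not the site's real structure):

```php
<?php
// Invented fragment standing in for a fetched show page
$html = '<div class="content">'
      . '<span>Genre</span><span>Drama</span>'
      . '<span>Network</span><span>FOX</span>'
      . '<span>Status</span><span>Cancelled</span>'
      . '</div>';

$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses parser warnings

$xpath = new DOMXPath($dom);

// XPath equivalent of the ".content > span:nth-child(6)" selector
$nodes = $xpath->query('//div[@class="content"]/span[6]');
$status = $nodes->length ? $nodes->item(0)->textContent : null;

echo $status; // prints "Cancelled"
```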
I have a page on my site that fetches and displays news items from the database of another (legacy) site on the same server. Some of the items contain relative links that should be fixed so that they direct to the external site instead of causing 404 errors on the main site.
I first considered using the <base> tag on the fetched news items, but this changes the base URL of the whole page, breaking the relative links in the main navigation - and it feels pretty hackish too.
I'm currently thinking of creating a regex to find the relative URLs (they all start with /index.php?) and prepending them with the desired base URL. Are there any more elegant solutions to this? The site is built on Symfony 2 and uses jQuery.
Here is how I would tackle the problem:
function prepend_url ($prefix, $path) {
// Prepend $prefix to $path if $path is not a full URL
$parts = parse_url($path);
return empty($parts['scheme']) ? rtrim($prefix, '/').'/'.ltrim($path, '/') : $path;
}
// The URL scheme and domain name of the other site
$otherDomain = 'http://othersite.tld';
// Create a DOM object
$dom = new DOMDocument('1.0');
$dom->loadHTML($inHtml); // $inHtml is an HTML string obtained from the database
// Create an XPath object
$xpath = new DOMXPath($dom);
// Find candidate nodes
$nodesToInspect = $xpath->query('//*[@src or @href]');
// Loop candidate nodes and update attributes
foreach ($nodesToInspect as $node) {
if ($node->hasAttribute('src')) {
$node->setAttribute('src', prepend_url($otherDomain, $node->getAttribute('src')));
}
if ($node->hasAttribute('href')) {
$node->setAttribute('href', prepend_url($otherDomain, $node->getAttribute('href')));
}
}
// Find all nodes to export
$nodesToExport = $xpath->query('/html/body/*');
// Iterate and stringify them
$outHtml = '';
foreach ($nodesToExport as $node) {
$outHtml .= $node->C14N();
}
// $outHtml now contains the "fixed" HTML as a string
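To see the helper's behaviour in isolation, here is prepend_url() again with a couple of sample inputs (the domain and paths are placeholders):

```php
<?php
function prepend_url($prefix, $path) {
    // Prepend $prefix to $path only when $path has no scheme of its own
    $parts = parse_url($path);
    return empty($parts['scheme'])
        ? rtrim($prefix, '/') . '/' . ltrim($path, '/')
        : $path;
}

// Relative link gets the prefix
echo prepend_url('http://othersite.tld', '/index.php?id=3'), PHP_EOL;
// prints "http://othersite.tld/index.php?id=3"

// Absolute URLs pass through untouched
echo prepend_url('http://othersite.tld', 'http://example.com/page'), PHP_EOL;
// prints "http://example.com/page"
```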
You can override the base tag by putting http:// in front of the link. That is, give a full URL, not a relative URL.
Well, not actually a solution, but mostly a tip...
You could start playing around with the ExceptionController.
There, just for example, you could check for a 404 error and inspect the query string appended to the request:
$request = $this->container->get('request');
// ...
if (404 === $exception->getStatusCode()) {
    $query = $request->server->get('QUERY_STRING');
    // ...handle your logic
}
The other solution would be to define a special route with its own controller for this purpose, which would catch requests to index.php and do the redirects and so on. Just define index.php in the requirements of the route and move this route to the top of your routing.
Not the clearest answer ever, but at least I hope I gave you a direction...
Cheers ;)
I am trying to parse messages from a public Facebook fan page wall, but it returns a blank page.
$source = "http://www.facebook.com/?sk=wall&filter=2";
libxml_use_internal_errors(TRUE);
$dom = new DOMDocument();
$dom->loadHTML($source);
$xml = simplexml_import_dom($dom);
libxml_use_internal_errors(FALSE);
$message = $xml->xpath("//span[@class='messageBody']");
return (string)$message[0] . PHP_EOL;
The DOMDocument::loadHTML() method, which you are using, expects the HTML content as a parameter -- and not an URL.
Here, you are trying to interpret your URL as some HTML content -- and not what it links to.
Instead of using this method, you might want to try one that works on a file or remote content, such as DOMDocument::loadHTMLFile().
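To make the distinction concrete: loadHTML() takes markup, while loadHTMLFile() takes a path or URL. A sketch using an inline string so nothing is fetched (the markup is invented):

```php
<?php
$markup = '<html><body><span class="messageBody">Hello</span></body></html>';

$dom = new DOMDocument;
// Correct: feeding actual HTML content to loadHTML()
$dom->loadHTML($markup);

$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//span[@class='messageBody']");
echo $nodes->item(0)->textContent; // prints "Hello"

// For a remote page you would instead use:
// $dom->loadHTMLFile('http://example.com/');
// or: $dom->loadHTML(file_get_contents('http://example.com/'));
```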
This is not the right way to fetch data from Facebook, and it's clear that you want to avoid creating a Facebook Application.
But the good news is that you can still use the FQL, try the below query in the Graph API Explorer.
In the below query, we queried the stream table to get the Facebook Developers page's public feeds:
SELECT message
FROM stream
WHERE source_id=19292868552
AND is_hidden = 0
AND filter_key='owner'
It'll return all the "public" feeds of the page. Obviously you may need to retrieve more fields to create a meaningful result.
You need to provide a valid access_token to even access public posts. Read more here.
Yet another approach would be to use the JSON from the Graph API:
$posts = json_decode(
    file_get_contents('https://graph.facebook.com/swagbucks/posts')
);

foreach ($posts->data as $post) {
    echo $post->message, PHP_EOL;
}
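For completeness, here is the same decoding loop run against a hard-coded sample payload shaped like a Graph API posts response (the messages below are invented, not real API output):

```php
<?php
// Invented sample mimicking the shape of a /{page}/posts response
$json = '{"data":[{"id":"1","message":"First post"},'
      . '{"id":"2","message":"Second post"}]}';

$posts = json_decode($json);

foreach ($posts->data as $post) {
    echo $post->message, PHP_EOL;
}
// prints:
// First post
// Second post
```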