Checking up on a link exchange - php

I made a link exchange with another site. Three days later, that site removed my link.
Is there a simple PHP script to help me monitor link exchanges and notify me if my link has been removed?
I need it as simple as possible, not a whole ad-system manager.

If you know the URL of the webpage where your ad (link) should appear, you can use Simple HTML DOM Parser to collect all links on that page into an array, then use PHP's in_array() function to check whether your link is present. You can run this script daily using crontab.
// Create DOM from URL (requires simple_html_dom.php)
require 'simple_html_dom.php';
$html = file_get_html('http://www.example.com/');

// Find all links
$allLinks = array();
foreach ($html->find('a') as $element) {
    $allLinks[] = $element->href;
}

// Check whether your link is among them
$adLink = "http://www.mylink.com";
if (in_array($adLink, $allLinks)) {
    echo "My link exists.";
} else {
    echo "My link is removed.";
}
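Since the question asks for a notification, here is a small extension of the snippet above (my own sketch, not part of the original answer): run it from cron and have it e-mail you when the partner site drops your link. The URLs and the e-mail address are placeholders.
<?php
require 'simple_html_dom.php';

$partnerPage = 'http://www.example.com/';   // page that should carry your link (placeholder)
$adLink      = 'http://www.mylink.com';     // the link you expect to find (placeholder)

$allLinks = array();
foreach (file_get_html($partnerPage)->find('a') as $element) {
    $allLinks[] = $element->href;
}

if (!in_array($adLink, $allLinks)) {
    // notify yourself; swap mail() for whatever alerting you prefer
    mail('you@yourdomain.com', 'Link exchange alert',
         "Your link $adLink was not found on $partnerPage");
}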

Technically there's no way to know whether someone's website links to yours unless you get traffic directed from their website or you look at their website yourself.
Your best bet would be either:
A script that records every time their page requests your image. This is simple enough by mixing PHP and .htaccess:
.htaccess:
RewriteRule ^path/to/myImage\.jpg$ path/to/myScript.php [L]
myScript.php:
<?php
/* Record (in a database, a file, or however you like) that the image was requested */
header("Content-type: image/jpeg");
echo file_get_contents("path/to/myImage.jpg");
Or a script that fetches their website every X minutes/hours/days and searches the returned HTML for the link to your image. The challenge here is making the script run periodically; this can be done with crontab or similar.
myScript.php:
<?php
$html = file_get_contents("http://www.theirsite.com");
if (strpos($html, 'path/to/myImage.jpg') !== false) {
    /* Happiness */
} else {
    /* ALERT! */
}

Related

PHP Fast scraping

My goal is to collect headlines from different news outlets and then echo them on my page. I've tried using Simple HTML DOM and then running an if statement to check for keywords. It works, but it is very slow! The code is below. Is there a better way to go about this, and if so, how would it be written?
Thanks in advance.
<?php
require 'simple_html_dom.php';

// URL and selector
$syds = file_get_html('http://www.sydsvenskan.se/nyhetsdygnet');
$syds_key = 'a.newsday__title';

// Debug counter
$i = 0;

// Check for the keyword "a"/"A" in the headlines
foreach ($syds->find($syds_key) as $element) {
    if (strpos($element, 'a') !== false || strpos($element, 'A') !== false) {
        echo $element->href . '<br>';
        $i++;
    }
}
echo "<h1>$i were found</h1>";
?>
How slow are we talking?
1-2 seconds would be pretty good.
If you're using this for a website, I'd advise splitting the crawling and the display into two separate scripts, and caching the results of each crawl.
You could:
have a crawl.php file that runs periodically to update your links.
then have a webpage.php that reads the results of the last crawl and displays it however you need for your website.
This way:
Every time you refresh your webpage, it doesn't re-request info from the news site.
It matters less if the news site takes a while to respond.
Decouple crawling/display
You will want to decouple crawling and display completely.
Have a crawler.php that runs over all the news sites one at a time, saving the raw links to a file. It can run every 5-10 minutes to keep the news updated; be warned, anything under 1 minute and some news sites may get annoyed!
crawler.php
<?php
// Run this file from cli every 5-10 minutes;
// it doesn't matter if it takes 20-30 seconds
require 'simple_html_dom.php';

$html_output = ""; // use this to build up the HTML output

$sites = array(
    array('http://www.sydsvenskan.se/nyhetsdygnet', 'a.newsday__title'),
    /* more sites go here, like this: */
    // array('URL', 'SELECTOR')
);

// loop over each site
foreach ($sites as $site) {
    $url = $site[0];
    $key = $site[1];

    // fetch the site
    $syds = file_get_html($url);

    // loop over each link
    foreach ($syds->find($key) as $element) {
        // add the link to $html_output
        $html_output .= $element->href . "<br>\n";
    }
}

// save $html_output to a local file
file_put_contents("links.php", $html_output);
?>
display.php
<!-- other display/markup stuff here -->
<?php
// include the file of links produced by crawler.php
include("links.php");
?>
Still want faster?
If you want it any faster, I'd suggest looking into Node.js; it's much faster at TCP connections and HTML parsing.
The bottlenecks are:
blocking IO - you can switch to an asynchronous library like Guzzle (see the sketch below)
parsing - you can switch to a different parser for better parsing speed
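As a rough illustration of the blocking-IO point, here is a sketch (my own, not from the answer) of fetching several news sites concurrently with Guzzle's async API and then parsing them with Simple HTML DOM as before. It assumes Guzzle is installed via Composer; the second site entry is a hypothetical placeholder.
<?php
require 'vendor/autoload.php';   // Guzzle via Composer
require 'simple_html_dom.php';

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Utils;

$sites = array(
    'http://www.sydsvenskan.se/nyhetsdygnet' => 'a.newsday__title',
    // 'http://example.com/news' => 'a.headline', // hypothetical second site
);

$client = new Client(['timeout' => 10]);
$promises = array();
foreach ($sites as $url => $selector) {
    $promises[$url] = $client->getAsync($url); // all requests run concurrently
}

// wait for every request to settle, so one failure doesn't abort the rest
$results = Utils::settle($promises)->wait();

$html_output = '';
foreach ($results as $url => $result) {
    if ($result['state'] !== 'fulfilled') {
        continue; // skip sites that failed or timed out
    }
    $html = str_get_html((string) $result['value']->getBody());
    foreach ($html->find($sites[$url]) as $element) {
        $html_output .= $element->href . "<br>\n";
    }
}
file_put_contents('links.php', $html_output);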

How to handle ?_escaped_fragment_= for AJAX crawlers?

I'm struggling to make an AJAX-based website SEO-friendly. As recommended in tutorials on the web, I've added "pretty" href attributes to links, e.g. <a href="#!site=contact">contact</a>, and, in a div where content is loaded with AJAX by default, a PHP script for crawlers:
$files = glob('./pages/*.php');
foreach ($files as &$file) {
    $file = substr($file, 8, -4);
}

if (isset($_GET['site'])) {
    if (in_array($_GET['site'], $files)) {
        include("./pages/" . $_GET['site'] . ".php");
    }
}
I have a feeling that at the beginning I additionally need to cut the _escaped_fragment_= part from (...)/index.php?_escaped_fragment_=site=about, because otherwise the script won't be able to GET the site value from the URL. Am I right?
But anyway, how do I know that the crawler transforms pretty links (those with #!) into ugly links (containing ?_escaped_fragment_=)? I've been told that it happens automatically and I don't need to provide this mapping, but Fetch as Googlebot doesn't give me any information about what happens to the URL.
Googlebot will automatically query the ?_escaped_fragment_= URLs.
So from www.example.com/index.php#!site=about
Googlebot will query: www.example.com/index.php?_escaped_fragment_=site=about
On the PHP side you will get it as $_GET['_escaped_fragment_'] = "site=about".
If you want to get the value of "site", you need to do something like this:
if (isset($_GET['_escaped_fragment_'])) {
    $escaped = explode("=", $_GET['_escaped_fragment_']);
    if (isset($escaped[1]) && in_array($escaped[1], $files)) {
        include("./pages/" . $escaped[1] . ".php");
    }
}
Take a look at the documentation:
https://developers.google.com/webmasters/ajax-crawling/docs/specification
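If the fragment ever carries more than one key/value pair (e.g. site=about&page=2), parse_str() is a slightly more robust way to unpack it than explode(). A small sketch of that variation (my own, not part of the original answer):
if (isset($_GET['_escaped_fragment_'])) {
    // parse "site=about" (or "site=about&page=2") into an array
    parse_str($_GET['_escaped_fragment_'], $params);
    if (isset($params['site']) && in_array($params['site'], $files)) {
        include("./pages/" . $params['site'] . ".php");
    }
}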

using href to two links(php and a webpage) [closed]

while ($enreg = mysql_fetch_array($res)) {
    $link_d .= "<font color=\"red\">click here to download</font></td>";
}
I want to use the href so it leads to the download link, and also sends the id to a PHP file so I can count how many times the file has been downloaded.
How can we make one href point to multiple links?
You can't. A link can only point to one resource.
Instead, what you should do is have your PHP script redirect to the file. The link points at your PHP script, which increments the counter and then sets a Location: header (which automatically sets a 302 status code for redirection) whose value is the URL you want to redirect to.
Also, you should really use htmlspecialchars() around any variable data you put in an HTML context, to ensure you are generating valid HTML.
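A minimal sketch of that approach, assuming a downloads table with id, hits and url columns, a counter script named download.php, and an id column in the query result; all of these names are placeholders, not from the question.
<?php
// download.php - counts the hit, then redirects to the real file
// (mysql_* used to match the question's code and an open connection is assumed;
//  mysqli/PDO with prepared statements is preferable today)
$id = (int) $_GET['id'];

// increment the counter and look up the real URL
mysql_query("UPDATE downloads SET hits = hits + 1 WHERE id = $id");
$row = mysql_fetch_assoc(mysql_query("SELECT url FROM downloads WHERE id = $id"));

// 302 redirect to the actual file
header("Location: " . $row['url']);
exit;
The link in the loop would then point at that script, with htmlspecialchars() (or an integer cast) around anything variable, e.g.:
$link_d .= '<a style="color:red" href="download.php?id=' . (int) $enreg['id'] . '">click here to download</a>';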
Ideally you would have some checks to see whether it's a human downloading (web crawlers may trigger it; adding nofollow to the link will help, though). You could also use a database, but that gets more complicated. My preferred way would be to use Google Analytics events. But here is a simple PHP script that might fulfill your needs without the complexity of the other solutions.
First modify your links to go through a tracker script and to urlencode the target URL:
$link_d .= '<a style="color:red" href="tracker.php?url=' . urlencode($enreg['link']) . '" target="_blank">click here to download</a>';
}
Then create a script that will record downloads (tracker.php)
<?php
// keep stats in a file - you can change the path so it sits below the server root,
// or just use a secret name - it must be writeable by the server
$statsfile = 'stats.txt';

// only do something if there is a url
if (isset($_GET['url'])) {
    // decode it
    $url = urldecode($_GET['url']);

    // do whatever check you want here to see if it's a valid link -
    // you could add a regex for a URL, for example
    if (strpos($url, 'http') === 0) {
        // load current data into an array (empty array if the file doesn't exist yet)
        $lines = file_exists($statsfile)
            ? file($statsfile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
            : array();

        // parse the lines into something usable by php
        $stats = array();
        foreach ($lines as $line) {
            $bits = explode('|', $line);
            $stats[(string) $bits[0]] = (int) $bits[1];
        }

        // see if there is an entry already
        if (!isset($stats[$url])) {
            // no, so let's add it with a count of 1
            $stats[$url] = 1;
        } else {
            // yes, let's increment
            $stats[$url]++;
        }

        // build the string to write to the file, in a human readable format
        // (note: use a different loop variable so $url isn't overwritten before the redirect)
        $data = '';
        foreach ($stats as $statUrl => $count) {
            $data .= $statUrl . '|' . $count . "\n";
        }

        // and write to file
        file_put_contents($statsfile, $data);
    }

    // now redirect to the file
    header('Location: ' . $url);
}
You can't.
An anchor is meant to lead to one resource.
What you want to do is typically addressed by using an intermediate script that counts the hit and redirects to the resource, e.g.:
<a href="redirect.php?id=42">Click here to download</a>
redirect.php
// Increment for example, a database :
// UPDATE downloads SET hits = (hits + 1) WHERE id=42
// Get the URI
// SELECT uri FROM downloads WHERE id=42
// Redirect to the URI
// (You may also need to set Content-type header for file downloads)
header( "Location: $uri" );
You may optimize this by passing the URI as a second parameter so that you won't need to fetch it at redirect time:
<a href="redirect.php?id=42&uri=http://example.com/file.zip">Click here to download</a>
Another way of collecting this kind of statistic is to use the JavaScript tools provided by your statistics provider, like Google Analytics or Piwik, adding a listener to the click event.
It is less invasive for your code base, but won't let you easily reuse the collected data on your site (for example, if you want to show a "top downloads" list).
Create a file with a download script, for example download.php, and route all your downloads through it. Update your counter in this page and send the appropriate headers for the download.
e.g.:
the URL might be download.php?id=1&file=yourfile
in download.php
//get id and file
//database operation to update your count
//headers for download
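A rough sketch of what that download.php could look like, assuming the files live in a local downloads/ directory; the file layout and the counting step are placeholders, not from the answer.
<?php
// download.php?id=1&file=yourfile
$id   = (int) $_GET['id'];
$file = basename($_GET['file']);            // basename() keeps path traversal out
$path = __DIR__ . '/downloads/' . $file;    // assumed storage location

if (!is_file($path)) {
    header('HTTP/1.0 404 Not Found');
    exit;
}

// database operation to update your count would go here, keyed on $id

// headers for the download
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $file . '"');
header('Content-Length: ' . filesize($path));
readfile($path);
exit;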

PHP-Retrieve specific content from multiple pages of a website

What I want to accomplish might be a little hardcore, but I want to know if it's possible:
The question:
My question is the same as PHP-Retrieve content from page, but I want to use it on multiple pages.
The situation:
I'm working with a website about TV shows. All the TV shows have the same base URL followed by the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
On every show page, there's a line which tells you if the show is cancelled or still running. I want to retrieve that line to make an overview of the cancelled shows (the website only supports an overview of running shows, so I want to make an extra functionality).
The real question:
How can I tell DOM to retrieve all the shows and check for the status of the show?
(http://bierdopje.com/shows/*).
The Note:
I understand that this process may take a while because it is reading the whole website (or is it too much data?).
Use this code to fetch just the links from a single page:
include_once('simple_html_dom.php');
$html = file_get_html('http://www.couponrani.com/');

// Find all links
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}
I use phpQuery to fetch data from a web page; it works like jQuery on the DOM.
For example, to get the list of all shows, you can do this:
<?php
require_once 'phpQuery/phpQuery/phpQuery.php';

$doc = phpQuery::newDocumentHTML(
    file_get_contents('http://www.bierdopje.com/shows')
);

foreach (pq('.listing a') as $key => $a) {
    $url  = pq($a)->attr('href'); // will give "/shows/07-ghost"
    $show = pq($a)->text();       // will give "07 Ghost"
}
Now you can process all shows individually: make a new phpQuery::newDocumentHTML for each show and, with a selector, extract the information you need.
Get the status of a show
$html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
$doc = phpQuery::newDocumentHTML($html);
$status = pq('.content>span:nth-child(6)')->text();
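Putting the two snippets together, a rough sketch (my own, not from the answer) that walks the show listing and then fetches every show page to read its status. The CSS selectors are the ones used above and may need adjusting; fetching every page sequentially is slow, so cache the result rather than doing this on every request.
<?php
require_once 'phpQuery/phpQuery/phpQuery.php';

$listing = phpQuery::newDocumentHTML(
    file_get_contents('http://www.bierdopje.com/shows')
);

// collect the show names and URLs first
$shows = array();
foreach (pq('.listing a', $listing) as $a) {
    $shows[pq($a)->text()] = 'http://www.bierdopje.com' . pq($a)->attr('href');
}

// then fetch each show page and grab its status line
$statuses = array();
foreach ($shows as $name => $url) {
    $doc = phpQuery::newDocumentHTML(file_get_contents($url));
    $statuses[$name] = pq('.content>span:nth-child(6)', $doc)->text();
}

print_r($statuses); // e.g. array('07 Ghost' => '...status text...')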

how does youtube lock direct links to the browser?

Recently YouTube changed the way direct video download links work (the ones found in url_encoded_fmt_stream_map): there is a signature now, and links don't work unless the right signature is presented.
The signature is there as a 'sig' argument, so you can easily take it and construct the link, and it will work. However, ever since this signature appeared, the link is also somehow locked to the user's browser.
Meaning: if I probe "http://youtube.com/get_video_info" on the server side, construct the links with the signature, and then print one as a link, a blank page opens when the user clicks it. However, if I try to download the video on the server side, it works.
This means that the link is somehow locked and belongs to the user who opened "http://youtube.com/get_video_info".
The problem with this situation is that, in order to stream the videos, you first have to download them to your server.
Does anyone know how the links are locked to a specific user, and is there a way around it?
The idea is, for example, that you get the link on the server side and then feed it to some Flash player, instead of using the chromeless player.
Here is a code example in PHP:
<?php
$video_id = $_GET['id']; // youtube video id

// getting the video info
$content = file_get_contents("http://youtube.com/get_video_info?video_id=" . $video_id);
parse_str($content, $ytarr);

// getting the links
$links = explode(',', $ytarr['url_encoded_fmt_stream_map']);

// formats you would like to use
$formats = array(35, 34, 6, 5);

// loop through the links to find the one you need
foreach ($links as $link) {
    parse_str($link, $args);
    if (in_array($args['itag'], $formats)) {
        // right link found; since the links are in high-to-low quality order,
        // the match will be the one with the highest quality
        $video_url = $args['url'];

        // add the signature to the link
        if (isset($args['sig'])) {
            $video_url .= '&signature=' . $args['sig'];
        }

        /*
         * What follows is three ways of proceeding with the link;
         * note they are not supposed to work all together but one at a time
         */

        // download the video and output it to the browser
        #readfile($video_url); // this works fine
        exit;

        // show the video as a link
        echo '<a href="' . $video_url . '">link for ' . $args['itag'] . '</a>'; // this won't work
        exit;

        // redirect to the video
        header("Location: $video_url"); // this won't work
        exit;
    }
}
?>
