Replace Specifc Full Links Between href=" " Using PHP - php

I have tried searching through related answers but can't quite find something that is suitable for my specific needs. I have quite a few affiliate links within 1,000s of articles on one of my wordpress sites - which all start with the same url format and sub-domain structure:
http://affiliateprogram.affiliates.com/
However, after the initial url format, the query string appended changes for each individual url in order to send visitors to specific pages on the destination site.
I am looking for something that will scan a string of html code (the article body) for all href links that include the specific domain above and then replace THE WHOLE LINK (whatever the query string appended) with another standard link of my choice.
href="http://affiliateprogram.affiliates.com/?random=query_string&page=destination"
gets replaced with
href="http://www.mylink.com"
I would ideally like to do this via php as I have a basic grasp, but if you have any other suggestions I would appreciate all input.
Thanks in advance.

<?php
$html = 'href="http://affiliateprogram.affiliates.com/?random=query_string&page=destination"';
echo preg_replace('#http://affiliateprogram.affiliates.com/([^"]+)#is', 'http://www.mylink.com', $html);
?>
http://ideone.com/qaEEM

Use a regular expression such as:
href="(https?:\/\/affiliateprogram.affiliates.com\/[^"]*)"
$data =<<<EOT
bar
foo
<a name="zz" href="http://affiliateprogram.affiliates.com/?query=random&page=destination&string">baz</a>
EOT;
echo (
preg_replace (
'#href="(https?://affiliateprogram.affiliates.com/[^"]*)"#i',
'href="http://www.mylink.com"',
$data
)
);
output
bar
foo
<a name="zz" href="http://www.mylink.com">baz</a>

$a = '<a class="***" href="http://affiliateprogram.affiliates.com/?random=query_string&page=destination" attr="***">';
$b = preg_replace("/<a([^>]*)href=\"http:\/\/affiliateprogram\.affiliates\.com\/[^\"]*\"([^>]*)>/", "<a\\1href=\"http://www.mylink.com/\"\\2>", $a);
var_dump($b); // <a class="***" href="http://www.mylink.com/" attr="***">

That's quite simple, as you only need a single placeholder for the querystring. .*? would normally do, but you can make it more specific by matching anything that's not a double quote:
$html =
preg_replace('~ href="http://affiliateprogram\.affiliates\.com/[^"]*"~i',
' href="http://www.mylink.com"', $html);
People will probably come around and recomend a longwinded domdocument approach, but that's likely overkill for such a task.

Related

Dealing with online newspaper headline link in PHP

I have seen on most online newspaper websites that when i click on a headline link, e.g. two thieves caught red handed, it normally opens a url like this: www.example.co.uk/news/two-thieves-caught-red-handed.
How do I deal with this url in php code, so that I can only pick the last part in the url. e.g. two-thieves-caught-red-handed. After that I want to work with this string.
I know how to deal with GET parameters like "www.example.co.uk/news/headline=two thieves caught red handed".
But I do not want to do it that way. Could you show me another way.
You can use the combination of explode and end functions for that
for example:
<?php
$url = "www.example.co.uk/news/two-thieves-caught-red-handed";
$url = explode('/', $url);
$end = end($url);
echo "$end";
?>
The code will result
two-thieves-caught-red-handed
You have several options in php to get the current url. For a detailed overview look here.
One would be to use $_SERVER[REQUEST_URI] and the use a string manipulation function for extraction of the parts you need.
Maybe this thread will help you too.

Preg_Replace Change URL

I am trying to grab content from another one of my site which is working fine, apart from all the links are incorrect.
include_once('../simple_html_dom.php');
$page = file_get_html('http://www.website.com');
$ret = $page->find('div[id=header]');
echo $ret[0];
Is there anyway instead of all links showing link to have the full link? using preg replace.
$ret[0] = preg_replace('#(http://([\w-.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)#',
'http://fullwebsitellink.com$1', $ret[0]);
I guess it would be something like above but I dont understand?
Thanks
Your question doesn't really explain what is "incorrect" about the links, but I'm guessing you have something like this:
<div id="header">Home | Sitemap</div>
and you want to embed it in another site, where those links need to be fully-qualified with a domain name, like this:
<div id="header">Home | Sitemap</div>
Assuming this is the case, the replacement you want is so simple you don't even need a regex: find all href attributes beginning "/", and add the domain part (I'll use "http://example.com") to their beginning to make them absolute:
$scraped_html = str_replace('href="/', 'href="http://example.com/', $scraped_html);

Extract keywords from referrer URL

It seems Google's URLs are structured differently these days. So it is harder to extract the referring keyword from them. Here is an example:
http://www.google.co.uk/search?q=jquery+post+output+46&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#pq=jquery+post+output+46&hl=en&cp=30&gs_id=1v&xhr=t&q=jquery+post+output+php+not+running&pf=p&sclient=psy-ab&client=firefox-a&hs=8N5&rls=org.mozilla:en-US%3Aofficial&source=hp&pbx=1&oq=jquery+post+output+php+not+run&aq=0w&aqi=q-w1&aql=&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=bdeb326aa44b07c5&biw=1280&bih=875
The search I performed was actually "jquery post output php not running", so the first 'q=' does not contain the full search. The second one does. I'd like to write a script that always extracts the last 'q=', but I'm not sure if Google's URL's always have the full search last. Anyone had any experience with this.
You can accomplish this using parse_url(), parse_str(), and urldecode(), where $str is the refer string:
$fragment = parse_url($str, PHP_URL_FRAGMENT);
parse_str($fragment, $arr);
$query = urldecode($arr['q']); // jquery post output php not running

Rewrite Youtube URL

I have some YouTube URLs stored in a database that I need to rewrite.
They are stored in this format:
$http://youtu.be/IkZuQ-aTIs0
I need to have them re-written to look like this:
$http://youtube.com/v/IkZuQ-aTIs0
These values are stored as a variable $VideoType
I'm calling the variable like this:
$<?php if ($video['VideoType']){
$echo "<a rel=\"shadowbox;width=700;height=400;player=swf\" href=\"" . $video['VideoType'] . "\">View Video</a>";
$}?>
How do I rewrite them?
Thank you for the help.
You want to use the preg_replace function:
Something like:
$oldurl = 'youtu.be/blah';
$pattern = '/youtu.be/';
$replacement = 'youtube.com/v';
$newurl = preg_replace($pattern, $replacement, $string);
You can use a regular expression to do this for you. If you have ONLY youtube URLs stored in your database, then it would be sufficient to take the part after the last slash 'IkZuQaTIs0' and place it in the src attribute after 'http://www.youtube.com/'.
For this simple solution, do something like this:
<?php
if ($video['VideoType']) {
$last_slash_position = strrpos($video['VideoType'], "/");
$youtube_url_code = substr($video['VideoType'], $last_slash_position);
echo "<a rel=\"shadowbox;width=700;height=400;player=swf\"
href=\"http://www.youtube.com/".$youtube_url_code."\">
View Video</a>";
}
?>
I cannot test it at the moment, maybe you can try to experiment with the position of the last slash occurence etc. You can also have a look at the function definitions:
http://www.php.net/manual/en/function.substr.php
http://www.php.net/manual/en/function.strrpos.php
However, be aware of the performance. Build a script which prases your database and converts every URL or stores a short and a long URL in each entry. Because regular expressions in the view are never a good idea.
UPDATE: it would be even better to store ONLY the youtube video identifier / url code in the database for every entry, so in the example's case it would be IkZuQ-aTIs0.

Extracting site title from URL

I'm trying to find a away to extract a site title from a URL entered into a field in PHP. For example, if the user were to enter the URL http://www.nytimes.com/2009/11/05/sports/baseball/05series.html, I would want "New York Times" or "NY Times" or something along those lines.
I know it's fairly easy to extract the title of the WINDOW... for example, the URL I linked would have the title "Yankees 7, Phillies 3 - Back on Top....", but this is exactly what I don't want.
For clarification, this is for adding sources to a quote. I want to be able to add a source to quotes without a huge page URL and not just a link that says "Source".
Can anyone help me with this? Thanks in advance.
$source = parse_url('http://www.nytimes.com/....', PHP_URL_HOST); // www.nytimes.com
There is no such thing as a "site title" , you can get
the domain name (and then the owner name)
the page's title
I see you have the meta tag "cre" with the value "The New York Times" but you won't find it everywhere
You can do one thing : extract the domain name from the URL, and then get the first page's title
"http://www.nytimes.com/" will give you "The New York Times - Breaking News, World News & Multimedia"
Build a list of URL prefixes to site names, and check for each prefix in turn from longest to shortest.
You'd surely need a lookup table mapping domains (nytimes.com) to your titles "NY Times" in which case it would be easy to do.
If you want to have a method that will work on any link from any domain, then it is a bit harder as PHP in itself is not going to be able to work out what is a uniform title as it will vary from site to site.
You can explode the URL easily enough, but how then would you be able to dissect nytimes into "NY" and "TIMES".
You may be able to find a web service that allows you to feed in a domain and get back a site title, but I do not know of one.
You are best off simply quoting the domain, trimmed like "NYTIMES.COM" as the source, or "NYTIMES".
You would want to use file_get_contents() then run a match to check the text between any <title></title> tags - that then would be your title that you display.
Using parse_url wouldn't return the actual page title.
Something like:
<?php
$x = file_get_contents("http://google.com");
preg_match("/<title>(.+?)<\/title>/", $x, $match);
echo $match[1];
?>
Use the Simple HTML DOM Parser. Here is an example:
require "simple_html_dom.php";
$url = "http://www.google.com";
$html = file_get_html( $url );
list( $title ) = $html->find( 'title' );
echo strip_tags( $title ); // Output: "Google"

Categories