Extracting specific information out of string - php

I have a string that contains the address of a YouTube video, and I want to use it to display the video in a pop-up lightbox. In its current form the link does not work in the lightbox:
http://www.youtube.com/v/CD2LRROpph0?f=videos&c=TEST&app=youtube_gdata&version=3
My idea is to extract the video ID 'CD2LRROpph0' and append it to a regular YouTube URL, for example
http://www.youtube.com/watch?v=CD2LRROpph0
which I know works in the lightbox.
Any ideas on how to extract this ID from the string?

This one handles different protocols and different YouTube hosts (in case YouTube comes out with country-specific TLDs, for example). parse_url() separates the path from the query string, so the trailing parameters are dropped.
$urlTokens = parse_url($url);
$newUrl = $urlTokens['scheme'] . '://' . $urlTokens['host'] . '/watch?v=' . preg_replace('~^/v/~', '', $urlTokens['path']);

// Dots in the host are escaped so they match a literal "." only
$newUrl = preg_replace('#http://www\.youtube\.com/v/([a-z0-9_\-]{11}).*$#i',
'http://www.youtube.com/watch?v=$1', $string);

Or try parse_url(): it returns an array of the URL's components (scheme, host, path, query, and so on), and you can pick out the part you need.
http://www.php.net/manual/en/function.parse-url.php
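As a minimal sketch of the parse_url() route: the helper name youtubeWatchUrl() is hypothetical, and the code assumes the video ID is the path segment that follows /v/, as in the URL from the question. It returns null for anything that does not look like that.

```php
<?php
// Hypothetical helper: turn a /v/<id> URL into a /watch?v=<id> URL.
function youtubeWatchUrl(string $embedUrl): ?string
{
    $parts = parse_url($embedUrl);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return null; // not a URL we can work with
    }
    // Assumption: the ID is the segment right after /v/
    if (!preg_match('~^/v/([A-Za-z0-9_-]+)~', $parts['path'], $m)) {
        return null;
    }
    $scheme = $parts['scheme'] ?? 'http';
    return $scheme . '://' . $parts['host'] . '/watch?v=' . $m[1];
}

echo youtubeWatchUrl('http://www.youtube.com/v/CD2LRROpph0?f=videos&c=TEST&app=youtube_gdata&version=3'), PHP_EOL;
// prints http://www.youtube.com/watch?v=CD2LRROpph0
```

Keeping the scheme and host from the input means the same code works for https and for other YouTube hosts.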

Related

How to delete tracking code from links in PHP

Hi, I have a form in WordPress where users can submit a link to a product, but very often the links come with unnecessary baggage, like tracking codes. I would like to create a filter in WordPress that cleans the links so they consist of just a working link. If possible, I would also like to confirm that the cleaned link still works, or use a method that guarantees it will.
The main things I want to get rid of are utm_source and its contents, utm_medium and its contents, etc. Everything but the clean working link.
So for example, a link like this:
https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055&pdp=true&source=detail&utm_source=affiliate&utm_medium=affiliate&utm_campaign=pjdatafeed&publisherId=20648&clickId=2669312134#fo_c=745&fo_k=c0ebaf8359ca7853df8343e535533280&fo_s=pepperjam
Will end up like this:
https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055
I'd really appreciate if someone can lead me in the right direction.
Thanks!
You can do what you want with explode, parse_str and http_build_query. This code uses an array of unwanted parameters to decide what to delete from the query string:
$unwanted_params = array('utm_source', 'utm_medium', 'utm_campaign', 'clickId', 'publisherId', 'source', 'pdp', 'details', 'fo_k', 'fo_s');
$url = 'https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055&pdp=true&source=detail&utm_source=affiliate&utm_medium=affiliate&utm_campaign=pjdatafeed&publisherId=20648&clickId=2669312134#fo_c=745&fo_k=c0ebaf8359ca7853df8343e535533280&fo_s=pepperjam';
list($path, $query_string) = explode('?', $url, 2);
// parse the query string
parse_str($query_string, $params);
// delete unwanted parameters
foreach ($unwanted_params as $p) unset($params[$p]);
// rebuild the query
$query_string = http_build_query($params);
// reassemble the URL
$url = $path . '?' . $query_string;
echo $url;
Output:
https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055
You can do this in PHP itself. The function parse_url() (https://secure.php.net/manual/en/function.parse-url.php) splits a URL into its components, including the query string; pass that query string to parse_str() to get the parameters as an array. After parsing, filter the parameters and remove the unwanted ones. Finally, use http_build_query() (https://secure.php.net/manual/en/function.http-build-query.php) to rebuild the query string and reassemble the URL :)
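The parse_url() approach could look roughly like the sketch below. The function name stripTrackingParams(), the default list of unwanted names, and the rule "drop every parameter starting with utm_" are assumptions for illustration; adjust them to the trackers you actually see. The fragment (everything after #) is dropped on purpose, since in the example it only carries tracking data.

```php
<?php
// Hypothetical helper; the default $unwanted list and the "utm_" prefix rule are assumptions.
function stripTrackingParams(string $url, array $unwanted = ['clickId', 'publisherId', 'source', 'pdp']): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave anything we cannot parse untouched
    }
    // Turn the query string into an array of parameters
    parse_str($parts['query'] ?? '', $params);
    foreach (array_keys($params) as $name) {
        $name = (string) $name;
        if (strpos($name, 'utm_') === 0 || in_array($name, $unwanted, true)) {
            unset($params[$name]);
        }
    }
    // Reassemble scheme, host, optional port and path
    $clean = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '');
    if ($params) {
        $clean .= '?' . http_build_query($params);
    }
    return $clean; // the fragment is intentionally not appended
}

echo stripTrackingParams('https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055&pdp=true&source=detail&utm_source=affiliate&utm_medium=affiliate&utm_campaign=pjdatafeed&publisherId=20648&clickId=2669312134#fo_c=745&fo_k=c0ebaf8359ca7853df8343e535533280&fo_s=pepperjam'), PHP_EOL;
```

Removing parameters cannot guarantee that the link still works, because a site may depend on any of them; confirming that would take an actual HTTP request to the cleaned URL, which this sketch does not do.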

PHP: Properly convert addresses to clickable links in string

I need to automatically parse a string and, if a link to my site is present, replace the address with a clickable HTML link.
Supposing my site addresses are www.mysite.com + wap.mysite.com + m.mysite.com, I need to convert:
My pictures at m.mysite.com/user/id are great.
to:
My pictures at mysite.com/user/id are great.
The question is how to do this (with ereg_replace?) instead of using tons of lines of code.
Notice that the result must be a relative URL, so that the current protocol and subdomain is used for the target link. If the user is in the m subdomain of the HTTPS version, the target will be the m subdomain of the HTTPS protocol and so on. Only links to mysite.com must be linked, any other links must be treated as ordinary plain text. Thanks in advance!
First piece of advice, stay away from ereg, it's been deprecated for a long time. Second, you can probably google and experiment to concoct a preg expression that works well for you, so tweak what I have here to suit your needs.
I was able to put together a fairly simple regex pattern to search for the URLs.
preg_match('/m\.mysite\.com\S+/', $str, $matches);
Once you have the URLs, I'd suggest parse_url instead of regex.
And here is the code
$sSampleInput = 'My pictures at http://m.mysite.com/user/id are great.';
// Search for URLs but don't look for the scheme, we'll add that later.
// preg_match_all() collects every occurrence; preg_match() stops at the first.
preg_match_all('/m\.mysite\.com\S+/', $sSampleInput, $aMatches);
$aResults = array();
foreach ($aMatches[0] as $sUrl) {
    // Tack a scheme onto the front of the URL
    $sUrl = 'http://' . $sUrl;
    // Try validating the URL, requiring a path.
    // FILTER_FLAG_HOST_REQUIRED was removed in PHP 8; a host is always required.
    if (!filter_var($sUrl, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)) {
        trigger_error('Invalid URL: ' . $sUrl);
        continue;
    }
    // Escape both values before placing them in HTML
    $aResults[] =
        '<a href="' . htmlspecialchars(parse_url($sUrl, PHP_URL_PATH)) .
        '" target="_blank">' . htmlspecialchars($sUrl) . '</a>';
}
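To replace the addresses in place rather than only collect them, preg_replace_callback() can do the search and the substitution in one pass. This is a sketch under assumptions: the helper name linkifyMySite() is hypothetical, the host list is the three subdomains named in the question, and trailing punctuation such as a full stop is treated as not being part of the URL. The href holds only the path, so the browser keeps the current protocol and subdomain.

```php
<?php
// Hypothetical helper: the host list and the output markup are assumptions.
function linkifyMySite(string $text): string
{
    // Optional scheme, one of the three known hosts, then a path
    // whose last character must not be whitespace or punctuation.
    $pattern = '~(?<![\w.-])(?:https?://)?(?:www|wap|m)\.mysite\.com(/[^\s<>"]*[^\s<>".,;:!?)])~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $path  = htmlspecialchars($m[1], ENT_QUOTES); // relative target
        $label = htmlspecialchars($m[0], ENT_QUOTES); // text as the user typed it
        return '<a href="' . $path . '">' . $label . '</a>';
    }, $text);

    return $result ?? $text; // fall back to the input if the regex engine fails
}

echo linkifyMySite('My pictures at m.mysite.com/user/id are great.'), PHP_EOL;
// prints: My pictures at <a href="/user/id">m.mysite.com/user/id</a> are great.
```

Links to any other domain do not match the pattern and stay as plain text. A bare host with no path (for example m.mysite.com on its own) is also left alone in this sketch.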

Return id of video from metacafe using preg_match or something

I'm trying to write a plugin that fetches video details from various video websites simply by copying and pasting the URL from the browser.
But I'm having problems with Metacafe videos.
To fetch details for, say, this video URL
http://www.metacafe.com/watch/cb-V9EJSvnKvTcH/confronting_fear_in_virtual_reality/
I need to go to http://www.metacafe.com/api/item/cb-V9EJSvnKvTcH/ and parse the XML for this video ID.
I thought it would be easy, but the problem is that Metacafe uses only the video ID for the XML data, while the URL also contains the video name after the ID.
To embed the video only, I use
$video_grab_url = 'http://www.metacafe.com/watch/cb-V9EJSvnKvTcH/confronting_fear_in_virtual_reality/';
$embed_vid_url = parse_url( $video_grab_url );
$metacafe = mb_substr( $embed_vid_url['path'], 7, -1 );
With this I get the video ID together with the name, which is what the embed needs.
But as I said, for the other details I need to parse the XML data, and to build the URL of the XML data I need only the video ID without the name.
I'm not a PHP pro, so I got a little lost using preg_match.
How do I use preg_match to get only "cb-V9EJSvnKvTcH" so I can pull the XML data and parse info like duration, thumbnail, tags, etc.?
Try this one. Maybe this can help you.
(?<=watch/).*?(?=/)
sample PHP code,
<?php
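$subject = "http://www.metacafe.com/watch/cb-V9EJSvnKvTcH/confronting_fear_in_virtual_reality/";
$pattern = "#(?<=watch/).*?(?=/)#"; // edit: added delimiters
// No flags or offset are needed; $matches[0] holds the video ID
preg_match($pattern, $subject, $matches);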
print_r($matches);
?>
Or you can also use something like this:
$url = 'http://www.metacafe.com/watch/cb-V9EJSvnKvTcH/confronting_fear_in_virtual_reality/';
$path = parse_url($url, PHP_URL_PATH);
$pieces = explode('/', $path);
$video_id = $pieces[2];
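The explode() variant can be wrapped with a few checks so that an unexpected URL does not produce an undefined index. This is only a sketch: the helper name metacafeVideoId() is hypothetical, and it assumes URLs of the form /watch/<id>/<name>/ as in the question. The API address is the one quoted in the question.

```php
<?php
// Hypothetical helper; assumes paths shaped like /watch/<id>/<name>/.
function metacafeVideoId(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component at all
    }
    // trim() removes the leading and trailing slash before splitting
    $segments = explode('/', trim($path, '/'));
    if (count($segments) < 2 || $segments[0] !== 'watch' || $segments[1] === '') {
        return null;
    }
    return $segments[1];
}

$id = metacafeVideoId('http://www.metacafe.com/watch/cb-V9EJSvnKvTcH/confronting_fear_in_virtual_reality/');
if ($id !== null) {
    // Build the XML address mentioned in the question
    echo 'http://www.metacafe.com/api/item/' . $id . '/', PHP_EOL;
}
```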

Getting Actual links from Google News RSS with PHP

I want to parse Google News RSS with PHP, to get actual links of the content.
Google News RSS item link looks like this:
http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115
I need just the actual link, everything after &url= :
http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115
And how would one go about eliminating the "non-essential" part of the URL, in essence targeting everything starting with http://news.google.com and ending with &url= ?
http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=
I do a little regex, but this is out of my reach...
Thanks, fellas!
Regex is not necessarily the best approach here.
$query = parse_url($google_url, PHP_URL_QUERY);
parse_str($query, $parts);
$url = $parts['url'];
Here ya go:
$google_url = 'http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115';
preg_match('/&url=([^&]+)/', $google_url, $matches);
$url = $matches[1];
echo $url;
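A slightly more defensive take on the parse_url()/parse_str() approach is sketched below. The helper name googleNewsTarget() is hypothetical, and the code assumes the target always arrives in a parameter called url, as in the example. parse_str() also URL-decodes the value, which the regex version does not.

```php
<?php
// Hypothetical helper; assumes the target link is carried in the "url" parameter.
function googleNewsTarget(string $googleUrl): ?string
{
    $query = parse_url($googleUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null; // no query string present
    }
    parse_str($query, $parts);
    $target = $parts['url'] ?? null;
    // Reject missing, array-valued or malformed targets
    if (!is_string($target) || filter_var($target, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    return $target;
}

echo googleNewsTarget('http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNGkF58EwDE7aA742GfVP9aE8azmhg&url=http://www.reuters.com/article/2012/01/15/us-obama-mlk-idUSTRE80E0PD20120115'), PHP_EOL;
```

If the target URL itself contains an unencoded &, both this sketch and the regex cut it off at that point.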

Extracting site title from URL

I'm trying to find a way to extract a site title from a URL entered into a field in PHP. For example, if the user were to enter the URL http://www.nytimes.com/2009/11/05/sports/baseball/05series.html, I would want "New York Times" or "NY Times" or something along those lines.
I know it's fairly easy to extract the title of the WINDOW... for example, the URL I linked would have the title "Yankees 7, Phillies 3 - Back on Top....", but this is exactly what I don't want.
For clarification, this is for adding sources to a quote. I want to be able to add a source to quotes without a huge page URL and not just a link that says "Source".
Can anyone help me with this? Thanks in advance.
$source = parse_url('http://www.nytimes.com/....', PHP_URL_HOST); // www.nytimes.com
There is no such thing as a "site title". You can get:
- the domain name (and then the owner name)
- the page's title
I see the page has the meta tag "cre" with the value "The New York Times", but you won't find it everywhere.
You can do one thing: extract the domain name from the URL, and then get the front page's title.
"http://www.nytimes.com/" will give you "The New York Times - Breaking News, World News & Multimedia".
Build a list of URL prefixes to site names, and check for each prefix in turn from longest to shortest.
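One way to sketch the prefix list is shown below. The table entries and the helper name siteNameFor() are made up for illustration; the code assumes prefixes are written as host plus optional path, without a scheme and without a leading "www.".

```php
<?php
// Hypothetical lookup table; the entries are illustrative only.
$siteNames = [
    'nytimes.com'      => 'NY Times',
    'example.com'      => 'Example',
    'example.com/blog' => 'Example Blog',
];

// Hypothetical helper: returns the name for the longest matching prefix.
function siteNameFor(string $url, array $siteNames): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    $host = strtolower($host);
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4); // treat www.nytimes.com like nytimes.com
    }
    $key = $host . (string) parse_url($url, PHP_URL_PATH);

    // Sort prefixes from longest to shortest so the most specific one wins
    uksort($siteNames, function ($a, $b) {
        return strlen((string) $b) <=> strlen((string) $a);
    });
    foreach ($siteNames as $prefix => $name) {
        if (strpos($key, (string) $prefix) === 0) {
            return $name;
        }
    }
    return null;
}

echo siteNameFor('http://www.nytimes.com/2009/11/05/sports/baseball/05series.html', $siteNames), PHP_EOL;
// prints NY Times
```

A plain string prefix test is loose: a host that merely starts with the same characters would also match, so a real table may want to compare the host exactly and only prefix-match the path.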
You'd surely need a lookup table mapping domains (nytimes.com) to your titles "NY Times" in which case it would be easy to do.
If you want to have a method that will work on any link from any domain, then it is a bit harder as PHP in itself is not going to be able to work out what is a uniform title as it will vary from site to site.
You can explode the URL easily enough, but how would you then dissect "nytimes" into "NY" and "TIMES"?
You may be able to find a web service that allows you to feed in a domain and get back a site title, but I do not know of one.
You are best off simply quoting the domain, trimmed like "NYTIMES.COM" as the source, or "NYTIMES".
You would want to use file_get_contents() then run a match to check the text between any <title></title> tags - that then would be your title that you display.
Using parse_url wouldn't return the actual page title.
Something like:
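<?php
$x = file_get_contents("http://google.com");
// "i" ignores case and "s" lets "." span newlines inside the title;
// [^>]* allows attributes on the <title> tag
if ($x !== false && preg_match("/<title[^>]*>(.+?)<\/title>/is", $x, $match)) {
    echo trim($match[1]);
}
?>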
Use the Simple HTML DOM Parser. Here is an example:
require "simple_html_dom.php";
$url = "http://www.google.com";
$html = file_get_html( $url );
list( $title ) = $html->find( 'title' );
echo strip_tags( $title ); // Output: "Google"
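If you would rather not add a third-party library, PHP's bundled DOM extension can read the title as well. This is a sketch that assumes the DOM extension is enabled (it usually is); the helper name extractTitle() is hypothetical. It takes the HTML as a string, so fetching the page (for example with file_get_contents()) stays a separate step.

```php
<?php
// Hypothetical helper: returns the text of the first <title> element, or null.
function extractTitle(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

echo extractTitle('<html><head><title>Google</title></head><body></body></html>'), PHP_EOL;
// prints Google
```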
