extracting facebook photo id from a LONG url - php

I have searched this site for how to extract a Facebook photo ID from a URL that starts with photo.php?fbid=, but I have a longer URL and need to know how to get the photo ID from it.
Example 1: photo.php?fbid=10151987845617397 (the complete URL is stored in the $url variable, which I believe is checked using preg_match):
!preg_match("|^http(s)?://(www.)?facebook.com/photo.php(.*)?$|i", $url) || !$pid
The code above fetches the Facebook ID 10151987845617397 and puts it in the variable $pid.
If I have a long URL, how can I change the code?
Here is the URL:
Example2 : https://www.facebook.com/nokia/photos/a.338008237396.161268.36922302396/10151987845617397/?type=1&theater
In the URL above, 10151987845617397 is the photo ID that I need to capture and put in the variable $pid.
What changes do I need to make to the preg_match pattern?
In other words, to get the photo ID 10151987845617397 into the $pid variable:
For the URL facebook.com/photo.php?fbid=10151987845617397
the syntax is !preg_match("|^http(s)?://(www.)?facebook.com/photo.php(.*)?$|i", $url) || !$pid
So for the URL facebook.com/nokia/photos/a.338008237396.161268.36922302396/10151987845617397/?type=1&theater
what would the syntax be?
Please help.
Thanks

A simple and quite readable solution: use the entire URL as a regex and put () around each part you want to capture:
// $tmp[1] = "www." or nothing
// $tmp[2] = "user" (i.e. nokia)
// $tmp[3] = "photos"
// $tmp[4] = album id (a.338008237396.161268.36922302396)
// $tmp[5] = photo id from the long URL, as requested
function extract_id_from_album_url($url) {
    preg_match('/https?:\/\/(www\.)?facebook\.com\/([a-zA-Z0-9_\- ]*)\/([a-zA-Z0-9_\- ]*)\/([a-zA-Z0-9_\.\-]*)\/([a-zA-Z0-9_\-]*)(\/\?type=1&theater\/)?/i', $url, $tmp);
    return isset($tmp[5]) ? $tmp[5] : false;
}
Backslashes are needed so that the . is treated as a literal dot (and not as regex syntax). The question marks make parts of the URL optional. Using more regex syntax can make the pattern much shorter and more extensible, but also harder to read.
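If both URL formats from the question need to be handled, one possible approach is to let parse_url split the URL first and then look at the query string and the path separately. The following is only a sketch: the function name extract_photo_id is made up for illustration, and it assumes the photo ID is always numeric and is either the fbid parameter or the last path segment after /photos/<album>/.

```php
<?php
// Hypothetical helper (not from the original answer): returns the photo ID
// as a string, or null when the URL matches neither known format.
function extract_photo_id(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }

    // Short format: photo.php?fbid=<id>
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
        if (isset($query['fbid']) && is_string($query['fbid'])
            && preg_match('/^\d+\z/', $query['fbid'])) {
            return $query['fbid'];
        }
    }

    // Long format: /<page>/photos/<album>/<id>/
    if (isset($parts['path'])
        && preg_match('#/photos/[^/]+/(\d+)/?\z#', $parts['path'], $m)) {
        return $m[1];
    }

    return null;
}
```

With the two example URLs from the question, $pid = extract_photo_id($url); should give the same ID string in both cases, since the query string (?type=1&theater) is separated from the path before matching.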

Related

How to delete tracking code from links in PHP

Hi, I have a form in WordPress where users can submit a link to a product, but very often the links come with unnecessary baggage, like tracking codes. I would like to create a filter in WordPress that cleans the links so they consist of just a working link. If possible, I would also like to confirm that the link still works, or use a method that guarantees it will.
The main things I want to get rid of are utm_source and its contents, utm_medium and its contents, etc. Everything but the clean working link.
So for example, a link like this:
https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055&pdp=true&source=detail&utm_source=affiliate&utm_medium=affiliate&utm_campaign=pjdatafeed&publisherId=20648&clickId=2669312134#fo_c=745&fo_k=c0ebaf8359ca7853df8343e535533280&fo_s=pepperjam
Will end up like this:
https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055
I'd really appreciate if someone can lead me in the right direction.
Thanks!
You can do what you want with explode, parse_str and http_build_query. This code uses an array of unwanted parameters to decide what to delete from the query string:
$unwanted_params = array('utm_source', 'utm_medium', 'utm_campaign', 'clickId', 'publisherId', 'source', 'pdp', 'details', 'fo_k', 'fo_s');
$url = 'https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055&pdp=true&source=detail&utm_source=affiliate&utm_medium=affiliate&utm_campaign=pjdatafeed&publisherId=20648&clickId=2669312134#fo_c=745&fo_k=c0ebaf8359ca7853df8343e535533280&fo_s=pepperjam';
list($path, $query_string) = explode('?', $url, 2);
// parse the query string
parse_str($query_string, $params);
// delete unwanted parameters
foreach ($unwanted_params as $p) unset($params[$p]);
// rebuild the query
$query_string = http_build_query($params);
// reassemble the URL
$url = $path . '?' . $query_string;
echo $url;
Output:
https://www.serenaandlily.com/variationproduct?dwvar_m10055_size=Twin&dwvar_m10055_color=Chambray&pid=m10055
You can do this in PHP itself. There is a function called parse_url() (https://secure.php.net/manual/en/function.parse-url.php) which splits a URL into its components (scheme, host, path, query, fragment). Once you have the query component, parse_str() turns it into an array of parameters, which you can filter to remove the unwanted ones. Finally, use http_build_query() (https://secure.php.net/manual/en/function.http-build-query.php) to build the query string for the URL you return :)
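The parse_url() route described above could look roughly like the sketch below. The function name clean_url and the default list of unwanted parameters are assumptions chosen to fit the example link; it also assumes the fragment (everything after #) is never needed and, for brevity, it ignores ports and credentials in the URL.

```php
<?php
// Hypothetical helper: strips tracking parameters and the fragment.
// The default $unwanted list is an assumption based on the example URL.
function clean_url(string $url, array $unwanted = ['pdp', 'source', 'publisherId', 'clickId']): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // leave input we cannot parse unchanged
    }

    $params = [];
    if (isset($parts['query'])) {
        parse_str($parts['query'], $params);
    }

    foreach (array_keys($params) as $name) {
        $name = (string) $name;
        // Drop every utm_* parameter plus the explicitly listed ones.
        if (strpos($name, 'utm_') === 0 || in_array($name, $unwanted, true)) {
            unset($params[$name]);
        }
    }

    $clean = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '');
    if ($params) {
        $clean .= '?' . http_build_query($params, '', '&');
    }

    return $clean; // the fragment is intentionally not re-appended
}
```

Matching on the utm_ prefix means new utm_* parameters are removed without having to extend the list. Whether the cleaned link still works depends on the target site, so this does not guarantee a working link.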

How can I remove invalid querystring using php header location

I have this invalid link hard coded in software which I cannot modify.
http://www.16start.com/results.php?cof=GALT:#FFFFFF;GL:1;DIV:#FFFFFF;FORID:1&q=search
I would like to use php header location to redirect it to a valid URL which does not contain the querystring. I'd like to pass just the parameter q=.
I've tried
$q = $_GET['q'];
header ("Location: http://www.newURL.com/results.php?" . $q . "");
But it just passes the invalid query string to the new location, and also modifies it in a strange way.
This is the destination location I get, which is also invalid:
http://www.newURL.com/results.php?#FFFFFF;GL:1;DIV:#FFFFFF;FORID:1&q=search
That's because # is seen as the start of a fragment identifier, which confuses the parser.
You can take the easy way, as Stretch suggested, but be aware that it only works because q is the last query parameter in your URL. It might therefore be better to fix the URL and extract the query parameters in a safer way:
<?php
$url = "http://www.16start.com/results.php?cof=GALT:#FFFFFF;GL:1;DIV:#FFFFFF;FORID:1&q=search";
// Replace # with its percent-encoded (URL-encoded) form:
$url = str_replace('#', "%23", $url);
// Extract the query part from the URL
$query = parse_url($url, PHP_URL_QUERY);
// From here on you could prepend the new url
$newUrl = "http://www.newURL.com/results.php?" . $query;
var_dump($newUrl);
// Or you can even go further and convert the query part into an array
parse_str($query, $params);
var_dump($params);
?>
Output
string 'http://www.newURL.com/results.php?cof=GALT:%23FFFFFF;GL:1;DIV:%23FFFFFF;FORID:1&q=search' (length=88)
array
'cof' => string 'GALT:#FFFFFF;GL:1;DIV:#FFFFFF;FORID:1' (length=37)
'q' => string 'search' (length=6)
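Building on the parsed parameters, a redirect target that carries only q could be assembled as in the sketch below. The function name build_redirect_target is hypothetical, and the sketch assumes the broken URL is available to the script as a complete string.

```php
<?php
// Hypothetical helper: keeps only the q parameter of a URL whose query
// string contains raw # characters, and appends it to $base.
function build_redirect_target(string $brokenUrl, string $base): string
{
    // Percent-encode # so the rest of the query is not treated as a fragment.
    $fixed = str_replace('#', '%23', $brokenUrl);
    $query = (string) parse_url($fixed, PHP_URL_QUERY);

    parse_str($query, $params);
    $q = (isset($params['q']) && is_string($params['q'])) ? $params['q'] : '';

    // http_build_query() takes care of URL-encoding the value again.
    return $base . '?' . http_build_query(['q' => $q], '', '&');
}
```

The result would then be sent with header('Location: ' . $target); followed by exit; so that no further output is produced.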
Update
After your comments, it seems that the URL is not available as a string in your script and you want to get it from the browser.
The bad news is that PHP will not receive the fragment part (everything after the #), because the browser does not send it to the server. You can verify this in the network tab of your browser's developer tools (F12).
In this case, you'll have to host a page at http://www.16start.com/results.php that contains some client-side JavaScript for parsing the fragment and redirecting the user.
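One way could be to use strstr() to get everything from q= onward (including the q= itself). Note that this has to be applied to the full URL string, here assumed to be in $url, because $_GET['q'] only holds the value and never contains "q=".
So:
$q = strstr($url, 'q=');
Give that a whirl.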

URL Validation/Sanitization with Regular Expressions

I'm a little out of my depth here but believe I am now on the right track. I want to take user-supplied URLs and store them in a database so that the links can then be used on a user profile page.
The links I'm hoping users will supply will be for social media sites, Facebook and the like. Whilst looking for a way to safely store user-supplied URLs I found this page http://electrokami.com/coding/use-php-to-format-and-validate-a-url-with-these-easy-functions/. The code works but seems to remove nearly everything. If I use "www.example.com/user.php?u=borris" it just returns "example.com is valid".
Then I found out about regular expressions and found this line of code
/(?:https?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/
from this site https://gist.github.com/marcgg/733592 and another stack overflow post Check if a string contains a url and get contents of url php.
I tried to merge the code together so that I get something that would validate the link for a facebook profile or page. I don't want to get profile info, pics etc but my code's not right either, so rather than getting deeper into stuff I don't fully understand yet I thought asking for help was best.
Below is the code I mashed together which gave me the error "Warning: preg_match_all() [function.preg-match-all]: Compilation failed: unmatched parentheses at offset 29... on line 9"
<?php
// get url to check from the page parameter 'url'
// or use default http://example.com
$text = isset($_GET['url'])
? $_GET['url']
: "http://www.vwrx-project.co.uk/user.php?u=borris";
$reg_exurl = "/(?:http|https|ftp|ftps)?:\/\/)?(?:www\.)?facebook\.com\/(?:(?:\w)*#!\/)?(?:pages\/)?(?:[\w\-]*\/)*([\w\-\.]*)/";
preg_match_all($reg_exurl, $text, $matches);
$usedPatterns = array();
$url = '';
foreach($matches[0] as $pattern){
if(!array_key_exists($pattern, $usedPatterns)){
$usedPatterns[$pattern] = true;
$url = $pattern;
}
}
?>
--------------------------------------------------------- Additional ------------------------------------------------------------
I took a fresh look at the answer Dave provided me with today and felt I could work with it, it makes more sense to me from a code perspective as I can follow the process etc.
I got a system I'm partly happy with. If I supply a link http://www.facebook.com/#!/lilbugga which is a typical link from facebook (when clicking on your username/profile pic from your wall) I can get the result http://www.facebook.com/lilbugga which shows as valid.
What it can't handle is a Facebook link that isn't in a vanity/SEO-friendly format, such as https://www.facebook.com/profile.php?id=4. If I allow my code to accept ? and =, then I suspect I'm leaving my website/database open to attack, which I don't want.
What's the best option now? This is the code I have:
<?php
$dirty_url = "http://www.facebook.com/profile.php?id=4"; //user supplied link
//clean url leaving alphanumerics : / . only - required to remove facebook link format with /#!/
$clean_url = preg_replace('#[^a-z0-9:/.]#i', '', $dirty_url);
$parsed_url = parse_url($clean_url); //parse url to get breakdown of components
$safe_host = $parsed_url['host']; // safe host direct from parse_url
// str_replace to switch any // to a / inside the returned path - required due to preg_replace process above
echo $safe_path = str_replace("//", "/", ($parsed_url['path']));
if ($parsed_url['host'] == 'www.facebook.com') {
echo "Facebook";
} else {
echo " :( invalid url";
}
?>
Not sure exactly what you are trying to accomplish, but it sounds like you could use parse_url for this:
<?php
$parsed_url = parse_url($_GET['url']);
//assume it's "http://www.vwrx-project.co.uk/user.php?u=borris"
print_r($parsed_url);
/*
Array
(
[scheme] => http
[host] => www.vwrx-project.co.uk
[path] => /user.php
[query] => u=borris
)
*/
if ($parsed_url['host'] == 'www.facebook.com') {
//do stuff
}
?>
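For the profile.php?id=4 case raised in the question, one option is to parse the URL, check each component against a whitelist, and rebuild the link from the validated pieces rather than stripping characters. The sketch below assumes only two link shapes are wanted (a vanity name, or profile.php with a numeric id); the function name normalize_facebook_url is made up for illustration.

```php
<?php
// Hypothetical helper: returns a rebuilt profile URL, or null when the
// input is not a recognised Facebook profile link.
function normalize_facebook_url(string $input): ?string
{
    $input = trim($input);
    if (!preg_match('#^https?://#i', $input)) {
        $input = 'http://' . $input; // allow links given without a scheme
    }
    // Old-style links of the form facebook.com/#!/name
    $input = str_replace('/#!/', '/', $input);

    $parts = parse_url($input);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $host = strtolower($parts['host']);
    if ($host !== 'facebook.com' && $host !== 'www.facebook.com') {
        return null;
    }

    $path = $parts['path'] ?? '';
    if ($path === '/profile.php') {
        parse_str($parts['query'] ?? '', $query);
        if (isset($query['id']) && is_string($query['id'])
            && preg_match('/^\d+\z/', $query['id'])) {
            return 'https://www.facebook.com/profile.php?id=' . $query['id'];
        }
        return null;
    }

    // Vanity name: letters, digits, dots and hyphens only.
    if (preg_match('#^/([A-Za-z0-9.\-]+)/?\z#', $path, $m)) {
        return 'https://www.facebook.com/' . $m[1];
    }
    return null;
}
```

Because the stored value is rebuilt from a fixed prefix plus a validated name or numeric id, the ? and = only ever appear in a known position. Storing the result safely is a separate concern and is usually handled with prepared statements.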
I have taken a regex pattern from HERE.
Run it and read the parts you need from the matched groups.
(?:http|https|ftp|ftps(?:\/\/)?)?(?:www.|[-;:&=\+\$,\w]+#)([A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??((?:[-\+=&;%#.\w_]*)#?(?:[\w]*)?))
Input:
www.example.com/user.php?u=borris
http://www.vwrx-project.co.uk/user.php?u=borris
Output:
MATCH 1
1. [4-15] `example.com`
2. [15-33] `/user.php?u=borris`
3. [25-33] `u=borris`
MATCH 2
1. [45-63] `vwrx-project.co.uk`
2. [63-81] `/user.php?u=borris`
3. [73-81] `u=borris`

Remove certain part of string in PHP [duplicate]

This question already has answers here:
Get domain name (not subdomain) in php
(18 answers)
Closed 10 years ago.
I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of strings? Maybe it would search for http and, when found, remove everything up to and including the //. Then it would search for www and, if found, remove everything up to the first dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'd be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
    $parts = parse_url($url);
    // Prepend a scheme only when none is present, so https:// URLs stay intact
    if (!isset($parts['scheme'])) {
        $url = 'http://' . $url;
    }
    $parts2 = parse_url($url);
    $host = $parts2['host'];
    $remove = explode('.', $host);
    $result = $remove[0];
    if ($result == 'www') {
        $result = $remove[1];
    }
    return $result;
}
It's not perfect, at least where subdomains are concerned, but I think it's possible to do something about it. Maybe add a second if statement at the end to check the length of the array: if it has more than two items, choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that will be three items long, but I don't want to return co). I'll try to work on it a little bit and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
You can use the parse_url function from PHP which returns exactly what you want - see
Use the parse_url function in PHP to get the host (e.g. domain.com), then replace .com with an empty string.
I am a little rusty on my regular expressions, but this should work.
$url='http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); // returns www.en.wikipedia.org
$domain = preg_replace('/\.(com|org)$/', '', $domain); // strips a trailing .com or .org
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for info on regular expressions. It's a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace: preg_match searches for a match based on your pattern, and preg_replace replaces the matches with your replacement and returns the new string.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
// preg_replace returns the new string, so the result must be assigned back.
// It leaves the string unchanged when nothing matches, so no preg_match check is needed.
$url = preg_replace("/^http:\/\//i", "", $url);
$url = preg_replace("/^www\./i", "", $url);
$url = preg_replace("/\.com/i", "", $url);
$url = preg_replace("/\/*$/", "", $url);
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get chance but basically it splits the URL into pieces by the full stop and then returns the second item which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com but for normal urls it would suffice.
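The subdomain and .co.uk problems mentioned in this thread come from not knowing how many trailing labels belong to the suffix. A minimal sketch of one way around that is shown below; the function name get_domain_label and the short hard-coded list of two-part suffixes are assumptions for illustration only. A complete solution would need the full Public Suffix List rather than a hand-written list.

```php
<?php
// Hypothetical helper: returns the registrable label of a host
// (e.g. "stackoverflow"), or null when no host can be found.
// $twoPartSuffixes is a deliberately tiny, assumed list.
function get_domain_label(string $url, array $twoPartSuffixes = ['co.uk', 'org.uk', 'com.au']): ?string
{
    // parse_url() only reports a host when a scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.\-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $labels = explode('.', strtolower($host));
    $count = count($labels);
    if ($count < 2) {
        return $labels[0]; // e.g. "localhost"
    }

    // Decide whether the suffix is one label (.com) or two (.co.uk).
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $suffixLength = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;

    // The wanted label sits directly in front of the suffix.
    $index = $count - $suffixLength - 1;
    return $index >= 0 ? $labels[$index] : null;
}
```

Counting from the right instead of from the left means www and other subdomains never need special handling.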

take facebook page url and store id and slug separately

I'm developing a web app where users enter their facebook page url either in this format:
http://www.facebook.com/pages/Graffiti/119622954518
or
http://www.facebook.com/thefirkinandfox
With php - how do I detect which format automatically, then split (explode?) the parts (the slug and the id or just the slug if the second version).
There is sometimes query data at the end of the url when viewing your own facebook page as an administrator, how do I detect and remove that? I think the answer will be regex of some kind - but I've really only used this to make sure an input is email and still didn't understand it that well... thanks in advance.
Possible entries may or may not include http:// at the beginning... I'd like to account for this...
If you want to use one regexp, try this:
$url = 'www.facebook.com/pages/Graffiti/119622954518';
if (preg_match('#^(https?://)?(www\.)?facebook\.com/((pages/([^/?]+)/(\d+))|([^/?]+))#', $url, $matches)) {
    // Groups that did not take part in the match can come back as empty strings,
    // so test with !empty() rather than isset().
    $slug = !empty($matches[5]) ? $matches[5] : (!empty($matches[7]) ? $matches[7] : null);
    $id = !empty($matches[6]) ? $matches[6] : null;
}
Two parts:
^http://www\.facebook\.com/pages/([^/]+)/([^/?]+)(?:\?.*)?$
If the first one doesn't match, use this:
^http://www\.facebook\.com/([^/?]+)(?:\?.*)?$
The "explosion" you mention is simply the values of the capturing groups.
So the code might look something like this:
$subject = "my string";
if (preg_match('#^http://www\.facebook\.com/pages/([^/]+)/([^/?]+)(?:\?.*)?$#', $subject, $groups))
    print($groups[1] . ' ' . $groups[2]); // slug and id
else if (preg_match('#^http://www\.facebook\.com/([^/?]+)(?:\?.*)?$#', $subject, $groups))
    print($groups[1]); // slug only
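As an alternative to a single large pattern, the URL can be split with parse_url and the path inspected segment by segment, which also deals with the optional http:// and any trailing query data. This is a sketch under assumptions: the function name parse_facebook_page_url and the returned array shape are invented for illustration, and only the two formats from the question are recognised.

```php
<?php
// Hypothetical helper: returns ['slug' => ..., 'id' => ...] (id is null for
// the short format), or null when the URL is not a recognised page URL.
function parse_facebook_page_url(string $url): ?array
{
    $url = trim($url);
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url; // entries may omit the scheme
    }

    $host = parse_url($url, PHP_URL_HOST);
    $path = parse_url($url, PHP_URL_PATH); // query data is not part of the path
    if (!is_string($host) || !is_string($path)) {
        return null;
    }
    if (!in_array(strtolower($host), ['facebook.com', 'www.facebook.com'], true)) {
        return null;
    }

    // Split the path and drop empty segments caused by leading/trailing slashes.
    $segments = array_values(array_filter(explode('/', $path), function ($s) {
        return $s !== '';
    }));

    // Format 1: /pages/<slug>/<numeric id>
    if (count($segments) === 3 && $segments[0] === 'pages'
        && preg_match('/^\d+\z/', $segments[2])) {
        return ['slug' => $segments[1], 'id' => $segments[2]];
    }
    // Format 2: /<slug>
    if (count($segments) === 1) {
        return ['slug' => $segments[0], 'id' => null];
    }
    return null;
}
```

The caller can then tell the two formats apart by checking whether the id entry is null.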
