Extracting URLs from a JSON-like string - php

I need to extract the first URL from some content. The content may be like this:
({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});
or may contain only a single link:
({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});
Currently I have:
$pattern = "/\:\[\{url\:\"(.*)\"\,name/";
preg_match_all($pattern, $htmlContent, $matches);
$URL = $matches[1][0];
However, it works only if there is a single link, so I need a regex that works for both cases.

You can use this regex:
$pattern = "/url\:\"([^\"]+)\"/";
Worked for me :)
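A minimal sketch of how that pattern might be used to get only the first URL. The original pattern fails on multiple links because `.*` is greedy and runs to the last `",name` in the string; `[^\"]+` stops at the first closing quote. The function name firstUrl is made up for this illustration, and preg_match (rather than preg_match_all) is used because only the first hit is needed.

```php
<?php
// Hypothetical helper: returns the first url:"..." value, or null if none is found.
function firstUrl(string $content): ?string
{
    // [^"]+ cannot cross a closing quote, so the match ends at the first URL.
    if (preg_match('/url:"([^"]+)"/', $content, $m) === 1) {
        return $m[1];
    }
    return null;
}

$two = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';
$one = '({items:[{url:"http://portlandor.ebayclassifieds.com/",name:"Portland (OR)"}],error:null});';

echo firstUrl($two), "\n"; // http://cincinnati.ebayclassifieds.com/
echo firstUrl($one), "\n"; // http://portlandor.ebayclassifieds.com/
```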

Hopefully this should work for you
<?php
$str = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});'; //The string you want to extract the 1st URL from
$match = ""; //Define the match variable
preg_match("%(((ht|f)tp(s?))\://)?(www.|[a-zA-Z].)[a-zA-Z0-9\-\.]+\.(com|edu|gov|mil|net|org|biz|info|name|museum|us|ca|uk)(\:[0-9]+)*(/($|[a-zA-Z0-9\.\,\;\?\'\\\+&\%\$#\=~_\-]+))*%",$str,$match); //I Googled for the best Regular expression for URLs and found the one included in the preg_match
echo $match[0]; //Return the first item in the array (the first URL returned)
?>
This is the website that I found the regular expression on: http://regexlib.com/Search.aspx?k=URL
As the others have said, json_decode should work for you as well.

That smells like JSON to me. Try using http://php.net/json_decode

Looks like JSON to me, visit http://php.net/manual/en/book.json.php and use json_decode().
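One caveat: the sample in the question is not strict JSON (it is wrapped in parentheses and the keys are unquoted), so json_decode() returns null on it as-is. The sketch below strips the wrapper and quotes the keys first. The key-quoting regex is an assumption that only holds for input shaped like the sample; it would break if a string value contained something like `,word:`.

```php
<?php
$raw = '({items:[{url:"http://cincinnati.ebayclassifieds.com/",name:"Cincinnati"},{url:"http://dayton.ebayclassifieds.com/",name:"Dayton"}],error:null});';

// The raw string is rejected by the JSON parser.
var_dump(json_decode($raw)); // NULL

// Remove the surrounding "(" and ");".
$trimmed = trim($raw, '();');

// Quote bare keys: a word followed by ":" directly after "{" or ",".
// Fragile by design; fine for the sample, not a general parser.
$quoted = preg_replace('/([{,])(\w+):/', '$1"$2":', $trimmed);

$data = json_decode($quoted, true);
$firstUrl = $data['items'][0]['url'];

echo $firstUrl, "\n"; // http://cincinnati.ebayclassifieds.com/
```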

Related

Get the last part of a URL dynamically

I found a way to get the last part of the url, I just don't know if there's an even better way since I want it to be dynamic.
This is the way I did it:
$url = $_SERVER['REQUEST_URI'];
$categoryName = basename($url);
The last part of the URL in this case is always a category (e.g. Horror) that's in my database, so the URL will always look like this:
http://localhost:8888/blog/public/index.php/categories/Horror
or
http://localhost:8888/blog/public/index.php/categories/Fantasy
I think you got my point.
Well, the question is, is there a better way or is mine okay? Especially when looking at the
$_SERVER['REQUEST_URI']
Use explode() to split URL by / delimiter and use end() to get last item of array.
$url = "http://localhost:8888/blog/public/index.php/categories/Horror";
$parts = explode("/", $url); // end() takes its argument by reference, so store the array first
$categoryName = end($parts);
// Horror
You can always use a simple regex to get it.
$re = '#.*/(.*)#m';
$str = 'http://localhost:8888/blog/public/index.php/categories/Horror';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo $matches[0][1];
//outputs `Horror`
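If you are using Laravel, take the last element of Request::segments(); assign the array to a variable before calling end(), because end() takes its argument by reference.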
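Since the question asks specifically about $_SERVER['REQUEST_URI']: that value includes the query string, so basename() alone would return something like `Horror?page=2` if a query were ever appended. A small sketch that takes only the path first; the function name lastPathSegment and the `?page=2` example are invented for illustration.

```php
<?php
// Hypothetical helper: last path segment of a request URI, ignoring any query string.
function lastPathSegment(string $uri): string
{
    // PHP_URL_PATH drops "?query" and "#fragment".
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // rtrim() tolerates a trailing slash such as ".../categories/Horror/".
    return rawurldecode(basename(rtrim($path, '/')));
}

echo lastPathSegment('/blog/public/index.php/categories/Horror'), "\n";         // Horror
echo lastPathSegment('/blog/public/index.php/categories/Fantasy/?page=2'), "\n"; // Fantasy
```

In a real request this would be called as lastPathSegment($_SERVER['REQUEST_URI']).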

Regex for pulling video ID from Rumble URL

Argh-- regular expressions make me crazy, I've just spent 20 minutes trying to get this to fly and I'm having no luck. And I know someone here will be able to pop this out in like 2 seconds! :-)
Here's a sample source URL: https://rumble.com/v30sqt-oreo-ice-cream-cake.html
I want to extract the "v30sqt" characters. Actually, I want to extract any characters after "rumble.com/" and before the first dash. It might be alphanumeric, it might be all letters, it might be longer than 6 characters, etc. That's the video ID.
This is for php preg_match.
You can simply use parse_url instead of a regex, along with the explode and current functions, like so:
$url = "https://rumble.com/v30sqt-oreo-ice-cream-cake.html";
$parsed_arr = explode("-",ltrim(parse_url($url, PHP_URL_PATH),"/"));
echo current($parsed_arr);
or
echo $parsed_arr[0];
Try this one; it should work for you:
/(?<=rumble\.com\/)[^-]+/
Go for:
<?php
$url = "https://rumble.com/v30sqt-oreo-ice-cream-cake.html";
$regex = '~rumble\.com/(?P<video>[^-]+)~';
if (preg_match($regex, $url, $match)) {
echo $match['video'];
# v30sqt
}
?>
With a demo on ideone.com.

In php substr(INFORMATICS&SYSTEMS-58600,-5) is returning 'ATICS'

I know this type of question has been asked before, but none of the answers solves my problem.
I want to capture the numeric part of the string INFORMATICS&SYSTEMS-58600, i.e. 58600.
I am trying substr(INFORMATICS&SYSTEMS-58600,-5), which returns ATICS, a substring of the first part of the string (INFORMATICS), but I want the last part.
This happens wherever & appears.
I know it's a very basic mistake, but I can't figure out what it is. Please help me out.
$str = 'INFORMATICS&SYSTEMS-58600';
preg_match_all('!\d+!', $str, $matches);
print_r($matches);
See also: Extract numbers from a string
Actually, PHP's substr is working fine.
I was passing this text as a URL query parameter in an AJAX call, i.e. get_data.php?dept='informatics& system', so anything after the & was treated as a second parameter.
I found a nice answer on how to pass AJAX parameters in a URL in encoded form.
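The effect can be reproduced in PHP alone. This is only an illustration of the cause: parse_str() stands in for what PHP does when it fills $_GET, and http_build_query() shows one way to encode the value when the URL is built in PHP. In the original setup the encoding would have to happen in the JavaScript that issues the AJAX request.

```php
<?php
$dept = 'INFORMATICS&SYSTEMS-58600';

// Unencoded: the "&" splits the query string into two parameters.
parse_str('dept=' . $dept, $broken);
echo $broken['dept'], "\n";             // INFORMATICS
echo substr($broken['dept'], -5), "\n"; // ATICS

// Encoded: "&" becomes "%26", so the whole value stays in one parameter.
$query = http_build_query(['dept' => $dept]);
echo $query, "\n";                      // dept=INFORMATICS%26SYSTEMS-58600

parse_str($query, $fixed);
echo substr($fixed['dept'], -5), "\n";  // 58600
```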
The regex in this code matches a number at the end of a string.
<?php
$str = "INFORMATICS&SYSTEMS-58600";
$matches = array();
preg_match("/\d+$/", $str, $matches);
foreach($matches as $match) {
echo $match;
}
?>
Output:
58600

PHP- Parsing words from a string without spaces?

My webpage has a variable, $currentPage. This is a string of the php token name of the page I'm currently on.
Example: All categories under the user section have names such as:
uAdminNew, uAdminEdit, etc.
I would like for a way to parse out the uAdmin and just determine what is the last word (New and Edit) and call upon functions from there.
I have my navigation system working through these names, therefore I can't change the names or I would to make it easier to parse. Such as adding delimiters.
Is this something only Regex can solve or is there a simpler solution I'm missing? If this is Regex could you explain or provide a link as to how I would go about using it to test against a specific list of strings? I'm very new to it.
For example:
$str = 'uAdminEdit';
$ar = preg_match('/([A-Z][^A-Z]+$)/', $str, $m);
echo $m[1]; // Edit
Does the pagename always start with uAdmin? If so, you could split the string by "uAdmin" with explode():
$page = 'uAdminEdit';
echo explode('uAdmin', $page)[1]; //Output: Edit
Or simply remove "uAdmin" with str_replace():
$page = 'uAdminEdit';
echo str_replace('uAdmin', '', $page); //Output: Edit
If you just want the section after uAdmin, use a regex capture group:
$page = 'uAdminEdit';
preg_match('/uAdmin(.*)/', $page, $matches);
echo $matches[1]; // Edit
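For the part of the question about testing against a specific list of strings, here is a rough sketch that does not depend on the uAdmin prefix. It splits the name before every capital letter, takes the last piece, and checks it against a whitelist. The function name lastWord and the $allowed list are assumptions made up for this example.

```php
<?php
// Hypothetical helper: last capitalised word of a camelCase page name.
function lastWord(string $pageName): string
{
    // The lookahead (?=[A-Z]) splits *before* each capital without consuming it.
    $parts = preg_split('/(?=[A-Z])/', $pageName, -1, PREG_SPLIT_NO_EMPTY);
    return $parts ? end($parts) : '';
}

$allowed = ['New', 'Edit']; // assumed list of known actions

$action = lastWord('uAdminEdit');
if (in_array($action, $allowed, true)) {
    echo "Action: $action\n"; // Action: Edit
}
```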

preg_replace how change part of uri

I am trying to change all the links of a html with php preg_replace.
All the uris have the following form
http://example.com/page/58977?forum=60534#comment-60534
I want to change it to:
http://example.com/60534
which means removing everything after "page" and before "comment-", including these two strings.
I tried the following:
$result = preg_replace("/^.page.*.comment-.$/", "", $html);
but my regex syntax seems to be incorrect, as it returns the HTML unchanged.
Could you please help me with this?
The ^ is an anchor that only matches the start of the string, and $ only matches at the end. In order to match you should not anchor the regular expression:
$result = preg_replace("/page.*?comment-/", "", $html);
Note that this could match things that are not URLs. You may want to be more specific as to what will be replaced, for example you might want to only replace links starting with either http: or https: and that don't contain whitespace.
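A stricter variant along those lines might look like the sketch below. It assumes every link has exactly the shape shown in the question (`/page/<digits>?forum=<digits>#comment-<digits>`) and that the host contains no slash, whitespace, or quote characters; the function name shortenLinks is hypothetical.

```php
<?php
// Hypothetical helper: rewrites .../page/N?forum=M#comment-K to .../K.
function shortenLinks(string $html): string
{
    // $1 = scheme and host, $2 = the comment id that is kept.
    $pattern = '~(https?://[^/\s"\']+)/page/\d+\?forum=\d+#comment-(\d+)~';
    return preg_replace($pattern, '$1/$2', $html);
}

$html = '<a href="http://example.com/page/58977?forum=60534#comment-60534">link</a> on the next page';
echo shortenLinks($html), "\n";
// <a href="http://example.com/60534">link</a> on the next page
```

Unlike the unanchored `page.*?comment-` pattern, this leaves ordinary text containing the word "page" untouched.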
You probably simply need this: http://php.net/manual/en/function.parse-url.php
This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
An alternative without regular expressions, using parse_url():
<?php
$url = 'http://example.com/page/58977?forum=60534#comment-60534';
$array = parse_url($url);
parse_str(isset($array['query']) ? $array['query'] : '', $query); // query string -> array
$http = isset($array['scheme']) ? $array['scheme'].'://' : ''; // the scheme may be absent
echo $http.$array['host'].'/'.$query['forum'];
?>
Demo: http://codepad.org/xB3kO588
