I am trying to return html from a url that has two sections which need to have wildcards. I am pretty sure regex is the way to go (unless there are alternatives I am not aware of?)... but I can't seem to get it right. I don't have any knowledge of regex and was hoping someone could please help me with this.
The wildcards need to support any number of alphanumeric characters, including underscores and as many special characters as possible.
All of these are possible scenarios:
https://www.example.com/type/1062483_name
https://www.example.com/type/name_ii
https://www.example.com/type/name
I've tried using all the regex tools online but I can't seem to be able to match it. Here is what my code looks like at the moment:
$url = $baseurl . $type . $wildcard1 . $name . $wildcard2;
function get_http_response_code($url) {
$headers = get_headers($url);
return substr($headers[0], 9, 3);
}
if(get_http_response_code($url) == "200"){
$html = file_get_contents($url);
}
I need (get_http_response_code($url) == "200") to return true.
Any advice on how to make this work would be amazing. Thanks!
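As a rough sketch of where a regex could fit: a pattern can only check whether a URL string you already have follows the expected shape. It cannot ask the server which URLs exist, so get_http_response_code() still needs a concrete URL. The helper name matchesWildcardUrl and the choice of allowed wildcard characters (word characters and hyphens, joined to the name by an underscore) are assumptions made for illustration.

```php
<?php
// Hypothetical helper: does $url look like <base><type>[prefix_]<name>[_suffix]?
function matchesWildcardUrl(string $url, string $base, string $type, string $name): bool
{
    $pattern = '~^' . preg_quote($base . $type, '~')
        . '(?:[\w-]+_)?'          // optional wildcard before the name
        . preg_quote($name, '~')
        . '(?:_[\w-]+)?$~';       // optional wildcard after the name

    return preg_match($pattern, $url) === 1;
}

$base = 'https://www.example.com/';
$type = 'type/';
$candidates = [
    'https://www.example.com/type/1062483_name',
    'https://www.example.com/type/name_ii',
    'https://www.example.com/type/name',
];
foreach ($candidates as $candidate) {
    echo $candidate, ' => ';
    var_dump(matchesWildcardUrl($candidate, $base, $type, 'name'));
}
```

If the real URLs come from a listing page, the same pattern could be used to pick the matching link out of that page before requesting it.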
As a novice PHP learner, I'm using the Code-Canyon Premium URL Shortener script and have done two days of research. Unfortunately, I am unable to resolve my issue.
The URL shortener script urlencodes the API URL that it sends to the script. In doing this, it replaces the & symbols with &amp;, causing the URL to not work correctly on the final destination page.
I have tried to use preg_replace, str_replace and also tried to use urldecode on the destination page but none of these seem to work. Here is my current script:
$makeshort = "http://mywebsite.com/email/quote.php?quoteid=$visitor&customertype=fhbs";
$mkshrt = str_replace("/&/","%26",$makeshort);
$short = "http://shorturl.com/api?&api=REMOVED&format=text&url=".urlencode($mkshrt);
// Using Plain Text Response
$api_url = $short;
$res = @file_get_contents($api_url);
if($res)
$shorturl = $res;
$shorty = json_decode($shorturl);
$shorturl = $shorty->{'short'};
echo $shorturl;
Note: Where you see &format=text in the API URL, I have tried it both with and without &format=text; however, this makes no difference whatsoever.
I am hoping that there is a simple and quick way to resolve this issue, as I am only passing over 2 variables and it's the second variable that is being displayed like this:
mywebsite.com/email/quote.php?quoteid=01234567890&amp;customertype=fhbs
So the customertype variable is the one being messed up, due to the &amp; entity.
I sincerely hope someone with the expertise can advise me on the best approach, or even a simple way to resolve this issue, as I really am at my wits' end! My knowledge is not good enough to research the exact key phrases that would point me in the right direction.
Thanks for your time in reading this and I hope someone would be kind enough to help me out here.
I know the feeling, as I myself am just coming to terms with coding and developing.
I personally would solve this in one of two ways. If you have already tried htmlspecialchars or htmlentities along with urldecode, then the simplest and quickest way to achieve this would be to read the URL string, replace the &amp; entity with & using str_replace, and then do either a meta refresh of the page or a header Location redirect.
Here is what I mean, with a brief example. I must stress that some extra security may be needed and that this is ONLY a quick fix, not a secure, stable and permanent one, though you could play with this and maybe work something out for your own circumstances.
$url = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'];
// Redirect only when the requested URL still contains the &amp; entity
if (strstr($url, "&amp;")) {
    $url = str_replace('&amp;', '&', $url);
    echo "<meta http-equiv='refresh' content='0;URL=$url'>";
    exit;
}
Alternative way with header location:
$url = "http://" . $_SERVER['SERVER_NAME'] . $_SERVER['REQUEST_URI'];
// Redirect only when the requested URL still contains the &amp; entity
if (strstr($url, "&amp;")) {
    $url = str_replace('&amp;', '&', $url);
    header("Location: $url");
    exit();
}
This will remove every &amp; entity from the URL and replace it with a plain &.
You can also play around with this to remove even more from the url string and replace things like / or forbidden words.
An example of the output will look like this:
Original url causing the problems:
http://mywebsite.com/email/quote.php?quoteid=1234567890&amp;customertype=fhbs
New url after the script has executed and refreshed the page:
http://mywebsite.com/email/quote.php?quoteid=1234567890&customertype=fhbs
As you can see from the two URLs above, the &amp; entity breaks the query string and everything after it is not read correctly. Once this script executes and refreshes the page, the URL will look like the second one, so it works for what you require.
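The replacement step can also be tried on its own, without a redirect. The sketch below is only an illustration under assumptions: the helper name decodeQueryEntities is made up, and it uses PHP's html_entity_decode rather than str_replace so that other entities are decoded as well. parse_str is used to confirm that customertype is read as a separate parameter afterwards.

```php
<?php
// Hypothetical helper: turn HTML entities such as &amp; back into plain characters.
function decodeQueryEntities(string $url): string
{
    return html_entity_decode($url);
}

$broken = 'http://mywebsite.com/email/quote.php?quoteid=1234567890&amp;customertype=fhbs';
$fixed  = decodeQueryEntities($broken);

// Parse the query string to check that both parameters are now separate.
parse_str((string) parse_url($fixed, PHP_URL_QUERY), $params);

echo $fixed, PHP_EOL;
print_r($params);
```

Decoding the URL before it is passed to urlencode(), rather than on the destination page, may avoid the need for a redirect at all.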
NOTE: THIS IS NOT A SECURE WAY OF DOING THINGS AND MAY NOT BE IDEAL FOR YOUR CIRCUMSTANCES. THIS IS JUST AN IDEA, AND I HOPE IT HELPS!
Thanks.
I'm reading source code of an online shop website, and on each product page I need to find a JSON string which shows product SKUs and their quantity.
Here are 2 samples:
'{"sku-SV023435_B_M":7,"sku-SV023435_BL_M":10,"sku-SV023435_PU_M":11}'
The sample above shows 3 SKUs.
'{"sku-11430_B_S":"20","sku-11430_B_M":"17","sku-11430_B_L":"30","sku-11430_B_XS":"13","sku-11430_BL_S":"7","sku-11430_BL_M":"17","sku-11430_BL_L":"4","sku-11430_BL_XS":"16","sku-11430_O_S":"8","sku-11430_O_M":"6","sku-11430_O_L":"22","sku-11430_O_XS":"20","sku-11430_LBL_S":"27","sku-11430_LBL_M":"25","sku-11430_LBL_L":"22","sku-11430_LBL_XS":"10","sku-11430_Y_S":"24","sku-11430_Y_M":36,"sku-11430_Y_L":"20","sku-11430_Y_XS":"6","sku-11430_RR_S":"4","sku-11430_RR_M":"35","sku-11430_RR_L":"47","sku-11430_RR_XS":"6"}',
The sample above shows many more SKUs.
The number of SKUs in the JSON string can range from one to infinity.
Now, I need a regex pattern to extract this JSON string from each page. At that point, I can easily use json_decode().
Update:
Here I found another problem. Sorry that my question was not complete: there is another, similar JSON string that also starts with sku-. Please have a look at the source code of the link below and you will understand. The only difference is that the values in that one are alphanumeric, while the values in the one we need are numeric. Also, please note that our final goal is to extract the SKUs with their quantities, so maybe you have a more straightforward solution.
Source
@chris85
Second update:
Here is another strange issue which is a bit off topic.
When I open the URL content using the code below, there is no JSON string in the source!
$html = file_get_contents("http://www.dresslink.com/womens-candy-color-basic-coat-slim-suit-jacket-blazer-p-8131.html");
But when I open the URL in my browser, the JSON is there! I'm really confused about this :(
Trying to extract specific data from JSON directly with a regex is usually a bad idea because of the way JSON is encoded. The best way is to match the whole JSON string with a regex and then decode it using the PHP function json_decode.
The issue with the missing data is due to a missing required cookie. See my comments in the code below.
<?php
function getHtmlFromDresslinkUrl($url)
{
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
//You must send the currency cookie to the website for it to return the json you want to scrape
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Cookie: currencies_code=USD;',
));
$output=curl_exec($ch);
curl_close($ch);
return $output;
}
$html = getHtmlFromDresslinkUrl("http://www.dresslink.com/womens-candy-color-basic-coat-slim-suit-jacket-blazer-p-8131.html");
//Get the specific arguments for this js function call only
$items = preg_match("/DL\.items\_list\.initItemAttr\((.+)\)\;/", $html, $matches);
if (count($matches) > 0) {
$arguments = $matches[1];
//Split by argument separator.
//I know, this isn't great but it seems to work.
$args_array = explode(", ", $arguments);
//You need the 5th argument (index 4)
$fifth_arg = $args_array[4];
//Strip quotes
$fifth_arg = trim($fifth_arg, "'");
//json_decode
$qty_data = json_decode($fifth_arg, true);
//Then you can work with the php array
foreach ($qty_data as $name => $qtty) {
echo "Found " . $qtty . " of " . $name . "<br />";
}
}
?>
Special thanks to @chris85 for making me read the question again. Sorry, but I couldn't undo my downvote.
You will want to use preg_match_all() to perform the regex matching operation (documentation here).
The following should do it for you. It will match each substring beginning with "sku-" and ending with the digits that follow the colon.
preg_match_all("/sku\-.+?:[0-9]*/", $input, $matches);
Working example here.
Alternatively, if you want to extract the entire string, you can use:
preg_match_all("/{.sku\-.*}/", $input, $matches);
This will grab everything between the opening and closing brackets.
Working example here.
Please note that $input denotes the input string and $matches receives the results.
A simple /'(\{"[^\}]+\})'/ will match all these JSON strings. Demo: https://regex101.com/r/wD5bO4/2
The first capture group contains the JSON string to pass to json_decode:
preg_match_all("/'(\{\"[^\}]+\})'/", $html, $matches);
$html is the HTML to be parsed; the JSON strings will be in $matches[1][0], $matches[1][1], $matches[1][2], etc.
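Putting the pattern and json_decode together might look like the sketch below. It is an illustration only: the function name extractSkuQuantities and the sample string are made up, and the numeric check is one possible way to skip the similar sku- string whose values are alphanumeric, as described in the question's update.

```php
<?php
// Hypothetical helper: collect "sku-..." => quantity pairs from page source.
function extractSkuQuantities(string $html): array
{
    $result = [];
    // Capture every single-quoted {"..."} object found in the source.
    if (preg_match_all("/'(\\{\"[^}]+\\})'/", $html, $matches)) {
        foreach ($matches[1] as $json) {
            $decoded = json_decode($json, true);
            if (!is_array($decoded)) {
                continue; // not valid JSON, skip it
            }
            foreach ($decoded as $sku => $qty) {
                // Keep only sku- keys whose value is numeric (7 or "7").
                if (strpos((string) $sku, 'sku-') === 0 && is_numeric($qty)) {
                    $result[$sku] = (int) $qty;
                }
            }
        }
    }
    return $result;
}

// Made-up sample resembling the strings in the question.
$sample = "init(1, 'x', '{\"sku-SV023435_B_M\":7,\"sku-SV023435_BL_M\":\"10\"}', '{\"sku-SV023435_B_M\":\"abc\"}');";
print_r(extractSkuQuantities($sample));
```

Because [^}]+ stops at the first closing brace, this approach assumes the JSON objects are flat, which holds for the samples shown.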
I am trying to append a URL if necessary, and skip over it when not necessary. The thing is, I'm learning PHP right now and I would like to use regular expressions as much as possible. Would it be possible to make this code more concise using preg_match? Example:
<?php
$facebook_url = str_replace("facebook.org","facebook.com", trim($_REQUEST['facebook_url']));
$position = strpos($facebook_url, "facebook.com");
if ($position === false) {
$facebook_url = "http://www.facebook.com/" . $facebook_url;
}
?>
But using:
if (!preg_match("/^(http:///www.facebook.com | facebook.com)/i"), $facebook_url)) {
$facebook_url = "http://www.facebook.com/" . $facebook_url;
}
I feel like that should work the way I understand php syntax, but something isn't working right. Thank you in advance.
I don't know why you would want to use regex "as much as possible" as opposed to as much as is needed, which should be very little. In your case the original code is much faster, and you can still do it with less code:
if (stripos($facebook_url, "facebook.com") === false) {
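Your regex would require a space after the .com or before the facebook in the alternation; whitespace matters in regex. The unescaped slashes in http:/// also end the pattern early, because / is your delimiter, and the closing parenthesis right after the pattern string closes the preg_match call before $facebook_url is passed.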
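For comparison, a regex version that avoids those problems might look like the sketch below. It is only one possible pattern, not a recommendation over stripos: the function name normalizeFacebookUrl is made up, and ~ is used as the delimiter so the slashes in the URL do not need escaping.

```php
<?php
// Hypothetical helper: prefix the Facebook base URL unless it is already there.
function normalizeFacebookUrl(string $input): string
{
    $url = trim($input);
    // Optional scheme, optional "www.", then facebook.com; no stray spaces.
    if (!preg_match('~^(https?://)?(www\.)?facebook\.com~i', $url)) {
        $url = 'http://www.facebook.com/' . $url;
    }
    return $url;
}

echo normalizeFacebookUrl('someuser'), PHP_EOL;
echo normalizeFacebookUrl('facebook.com/someuser'), PHP_EOL;
```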
I've already seen a bunch of questions on this exact subject, but none seem to solve my problem. I want to create a function that will remove everything from a website address, except for the domain name.
For example if the user inputs: http://www.stackoverflow.com/blahblahblah I want to get stackoverflow, and the same way if the user inputs facebook.com/user/bacon I want to get facebook.
Does anyone know of a function or a way to remove certain parts of strings? Maybe it would search for http and, when found, remove everything up to and including the //. Then it would search for www and, if found, remove everything up to the dot. Then it would keep everything until the next dot and remove everything after it. Looking at it now, this might cause problems with sites such as http://www.en.wikipedia.org, because I'd be left with only en.
Any ideas (preferably in PHP, but JavaScript is also welcome)?
EDIT 1:
Thanks to great feedback I think I've been able to work out a function that does what I want:
function getdomain($url) {
$parts = parse_url($url);
// Add a scheme only when none is present, so https:// URLs are left alone
if (!isset($parts['scheme'])) {
$url = 'http://'.$url;
}
$parts2 = parse_url($url);
$host = $parts2['host'];
$remove = explode('.', $host);
$result = $remove[0];
if($result == 'www') {
$result = $remove[1];
}
return $result;
}
It's not perfect, at least considering subdomains, but I think it's possible to do something about it. Maybe add a second if statement at the end to check the length of the array. If it's bigger than two, then choose item 1 instead of item 0. This obviously gives me trouble with any domain using .co.uk (because that will be three items long, but I don't want to return co). I'll try to work on it a little bit and see what I come up with. I'd be glad if some of you PHP gurus out there could take a look as well. I'm not as skilled or as experienced as any of you... :P
Use parse_url to split the URL into the different parts. What you need is the hostname. Then you will want to split it by the dot and get the first part:
$url = 'http://facebook.com/blahblah';
$parts = parse_url($url);
$host = $parts['host']; // facebook.com
$foo = explode('.', $host);
$result = $foo[0]; // facebook
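The same idea can be wrapped in a function that also copes with a missing scheme and a leading www, as in the examples from the question. This is a minimal sketch under assumptions: the name getDomainLabel is made up, and it deliberately does not handle subdomains or suffixes such as .co.uk, so http://www.en.wikipedia.org still yields en.

```php
<?php
// Hypothetical helper: first host label after an optional "www.".
function getDomainLabel(string $url): ?string
{
    // parse_url only finds a host when a scheme is present, so add one if missing.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // malformed URL
    }
    $labels = explode('.', strtolower($host));
    if (count($labels) > 1 && $labels[0] === 'www') {
        array_shift($labels); // drop the leading "www"
    }
    return $labels[0];
}

echo getDomainLabel('http://www.stackoverflow.com/blahblahblah'), PHP_EOL;
echo getDomainLabel('facebook.com/user/bacon'), PHP_EOL;
```

Handling subdomains and multi-part suffixes reliably would likely need a list of known suffixes rather than more string splitting.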
You can use the parse_url function from PHP, which returns exactly what you want.
Use the parse_url function in PHP to get domain.com, and then replace .com with an empty string.
I am a little rusty on my regular expressions but this should work.
$url='http://www.en.wikipedia.org';
$domain = parse_url($url, PHP_URL_HOST); //Will return www.en.wikipedia.org
$domain = preg_replace('/\.(com|org)$/', '', $domain); //www.en.wikipedia
http://php.net/manual/en/function.parse-url.php
PHP REGEX: Get domain from URL
http://rubular.com/r/MvyPO9ijnQ //Check regular expressions
You're looking for info on regular expressions. The topic is a bit complicated, so be prepared to read up. In your case, you'll best utilize preg_match and preg_replace: preg_match checks whether your pattern matches, and preg_replace replaces the matches with your replacement.
preg_match
preg_replace
I'd start with a pattern like this: find .com, .net or .org and delete it and everything after it. Then find the last . and delete it and everything in front of it. Finally, if // exists, delete it and everything in front of it.
// preg_replace returns the new string, so assign the result back to $url
if (preg_match("/^http:\/\//i",$url))
    $url = preg_replace("/^http:\/\//i","",$url);
if (preg_match("/www\./i",$url))
    $url = preg_replace("/www\./i","",$url);
if (preg_match("/\.com/i",$url))
    $url = preg_replace("/\.com/i","",$url);
if (preg_match("/\/*$/",$url))
    $url = preg_replace("/\/*$/","",$url);
^ = at the start of the string
i = case insensitive
\ = escape char
$ = the end of the string
This will have to be played around with and tweaked, but it should get you pointed in the right direction.
Javascript:
document.domain.replace(".com","")
PHP:
$url = 'http://google.com/something/something';
$parse = parse_url($url);
echo str_replace(".com","", $parse['host']); //returns google
This is quite a quick method but should do what you want in PHP:
function getDomain( $URL ) {
return explode('.',$URL)[1];
}
I will update it when I get a chance, but basically it splits the URL into pieces at each full stop and then returns the second item, which should be the domain. A bit more logic would be required for longer domains such as www.abc.xyz.com, but for normal URLs it would suffice.
I'd like to make a script where the user can enter a sum e.g. 4^5+(56+2)/3 or any other basic maths sum (no functions etc.) how would I go about doing this? Presumably regex. Could somebody point me in the right direction - I'm guessing this isn't going to be too easy so I'd just like some advice on where to start and I'll take it from there.
Have a look at this: http://www.webcheatsheet.com/php/regular_expressions.php
It's a good intro to Regular Expressions and how to use them in PHP.
Yes, someone can (and probably will) just give you the regex you need to work this out but it helps a lot if you understand HOW your regex works. They look scary but aren't that bad really...
This is not my code, but there is a great PHP snippet that lets you use Google Calculator to do the calculations. So you can just enter your query (e.g. "7+3") using regular/FOIL notation or whatever, and it will return the result.
http://www.hawkee.com/snippet/5812/
<?php
// Google calculator
function do_calculator($query){
if (!empty($query)){
$url = "http://www.google.co.uk/search?q=".urlencode($query);
$f = array("Â", "<font size=-2> </font>", " × 10", "<sup>", "</sup>");$t = array("", "", "e", "^", "");
preg_match('/<h2 class=r style="font-size:138%"><b>(.*?)<\/b><\/h2>/', file_get_contents($url), $matches);
if (!$matches['1']){
return 'Your input could not be processed..';
} else {
return str_replace($f, $t, $matches['1']);
}
} else {
return 'You must supply a query.';
}
}
?>
The easy way to do it is with eval(). If you're accepting arbitrary input from a web form and executing it on a server, though, you MUST be careful to only accept valid expressions. Use a regex for that.
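A minimal sketch of that idea follows, assuming PHP 7 or later (where a parse error inside eval() can be caught) and assuming that digits, whitespace, parentheses, a decimal point and the operators + - * / ^ are the only characters worth accepting. The function name evaluateSum is made up. Because ^ means XOR in PHP, it is rewritten to the ** power operator before evaluation. Treat this as an illustration, not a hardened implementation.

```php
<?php
// Hypothetical helper: evaluate a basic arithmetic expression, or return null.
function evaluateSum(string $input): ?float
{
    // Whitelist: digits, whitespace, parentheses, decimal point, + - * / ^ only.
    if (!preg_match('/^[0-9+\-*\/^().\s]+$/', $input)) {
        return null;
    }
    // In PHP, ^ is bitwise XOR; ** is exponentiation.
    $expression = str_replace('^', '**', $input);
    try {
        $value = eval('return (' . $expression . ');');
    } catch (Throwable $e) {
        return null; // syntax error, division by zero on PHP 8, etc.
    }
    return (is_int($value) || is_float($value)) ? (float) $value : null;
}

var_dump(evaluateSum('4^5+(56+2)/3'));
var_dump(evaluateSum('system("ls")'));
```

Since the whitelist excludes letters, quotes and $, nothing that reaches eval() can name a function or variable. A proper expression parser would still be the safer long-term choice.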