Append a parameter to the end of a URL with PHP

I am struggling to do something which appears quite simple...
I use PHP cURL to scrape data and insert it into my website. cURL saves the data as a string in $data before it is output.
What I am trying to do is target all of the URLs contained within $data. The URLs sometimes contain a fixed-value parameter that I need to move to the end of the URL. The URLs look like this, where category=widgets can appear anywhere in the URL:
http://www.mysite.com/script.php?category=widgets&show=all&size=big
I need to move the parameter category=widgets to the end of all URLs, so they look like this:
http://www.mysite.com/script.php?show=all&size=big&category=widgets
I'm thinking that I can first remove all occurrences of category=widgets with str_replace; that's the easy bit.
The problem I have is appending category=widgets to the end of the URL. Because the URL is dynamic, perhaps preg_replace is more appropriate. I'm new to regular expressions, and it's giving me a headache.
Would appreciate your help. Thanks.

I'd recommend using parse_url(), as it is likely to be considerably more robust in the long term than string manipulation.
You can use parse_url() to extract the various parts of the URL (scheme, host, path, query string and so on) and then assemble a new URL from them as required.
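A minimal sketch of that approach follows. Besides parse_url(), it uses parse_str() to turn the query string into an array and http_build_query() to rebuild it. The function names moveParamToEnd and moveParamInAllUrls are hypothetical. It assumes absolute URLs and that $data contains raw & separators (not HTML-escaped &amp;). Note that parse_str() and http_build_query() may normalise unusual keys or re-encode values, so URLs with exotic characters may not round-trip byte for byte. The arrow function requires PHP 7.4 or later.

```php
<?php
// Sketch: move one query parameter (e.g. category=widgets) to the end of a URL.
// Assumes absolute URLs (scheme + host); anything else is returned unchanged.
function moveParamToEnd(string $url, string $key): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'], $parts['query'])) {
        return $url;
    }

    parse_str($parts['query'], $params);   // query string -> associative array
    if (!array_key_exists($key, $params)) {
        return $url;                        // parameter not present: nothing to do
    }

    $value = $params[$key];
    unset($params[$key]);
    $params[$key] = $value;                 // re-adding the key places it last

    $rebuilt = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '')
        . '?' . http_build_query($params);

    return isset($parts['fragment']) ? $rebuilt . '#' . $parts['fragment'] : $rebuilt;
}

// Apply it to every http(s) URL in the scraped $data string.
// Assumes a URL ends at whitespace, a quote or an angle bracket.
function moveParamInAllUrls(string $data, string $key): string
{
    return preg_replace_callback(
        '~https?://[^\s"\'<>]+~',
        fn(array $m): string => moveParamToEnd($m[0], $key),
        $data
    ) ?? $data;
}

echo moveParamToEnd('http://www.mysite.com/script.php?category=widgets&show=all&size=big', 'category');
// prints http://www.mysite.com/script.php?show=all&size=big&category=widgets
```

Working on the parsed array rather than on the raw string means it no longer matters where category=widgets appears in the original query string, which is exactly the part that makes a single str_replace/preg_replace awkward.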

Related

How to assign a complicated regex to a PHP variable

First question in a long while! I need to find any and all URLs in a string returned from a Facebook page request (I'm requesting the website of a page using the Graph API) and put them into an array that I subsequently display in a datatable JS table.
Anyhow, I'm having issues: when I build the JSON data for the datatable, it breaks in some cases:
http://socialinsightlab.com/datatable_fpages.json
The issue is that the website field contains erroneous characters, broken structure, white space, etc.
Anyhow, I found the perfect regex to find all websites in the field (there can be more than one website listed in the return).
The regex is:
(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
When I try to assign it to a PHP variable for use with preg_match_all, it doesn't work; I guess PHP won't accept the regex string because it contains quotes.
So my question is: how can I extract only the URLs found in the website field and assign them to a variable so I can add them to the datatable?
Here is an example of a call that fails:
http://socialinsightlab.com/datatable_fpages.json
I need to be able to just return websites and nothing more.
Any ideas?
Thanks
Jonathan
This regex is specifically made as a solution to this problem:
(?:https?:\/\/|www)[^"\s]+
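A minimal sketch of using this pattern with preg_match_all(). The function name extractWebsites and the sample field value are made up for illustration. Putting the pattern in single quotes means the double quote inside the character class needs no escaping, and using ~ as the delimiter means the slashes in https:// need none either. Passing the resulting array through json_encode() is one way to keep the datatable JSON valid, since it escapes quotes, backslashes and control characters.

```php
<?php
// Sketch: extract URL-like tokens from a free-text "website" field.
// '~' delimiter so "//" needs no escaping; single quotes so the '"'
// inside the character class needs no escaping.
function extractWebsites(string $field): array
{
    preg_match_all('~(?:https?://|www)[^"\s]+~', $field, $matches);
    return $matches[0]; // full matches only
}

// Hypothetical field value with mixed whitespace separators.
$field = "http://example.org\r\nwww.example.com  https://shop.example.net/path?x=1";
$urls = extractWebsites($field);

// json_encode() escapes special characters, so the output stays valid JSON.
echo json_encode($urls);
```

Because the pattern stops at whitespace and double quotes, trailing punctuation such as a comma directly after a URL would still be included; the longer regex from the question is more careful about that.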
If you don't want to deal with all this quote escaping, you can do the following:
Save the regex to a file, say regex.txt.
Read this file into a variable and trim it: $regex = trim(file_get_contents("regex.txt"));
Use it with preg_match(), preg_match_all(), etc. (if the file contains only the bare pattern, wrap it in delimiters such as ~...~ first).
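If you would rather keep the pattern inside the script, a nowdoc string (PHP 5.3+) is an alternative to the separate file: its content is taken literally, so neither the quotes nor the backslashes in the long regex from the question need escaping. This is a sketch; the function name findUrls is hypothetical, and the u modifier is added on the assumption that the input is UTF-8, so that the curly quotes and guillemets in the last character class are treated as single characters.

```php
<?php
// Sketch: keep the long URL regex from the question in a nowdoc.
// A nowdoc (<<<'ID') does no interpolation and no escape processing,
// so the ', ", ` and \ characters in the pattern are kept as-is.
$regex = <<<'REGEX'
(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
REGEX;

// Wrap in delimiters ('~' does not occur in the pattern); 'u' = UTF-8 mode.
$pattern = '~' . $regex . '~u';

function findUrls(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches only
}

print_r(findUrls($pattern, 'Visit http://example.com and www.foo.org/bar.'));
```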

I would like to generate, via JavaScript, a URL to the current page and add two parameters

I'm using the ShareThis plugin in a Smarty template. The Twitter share link looks like this:
<span class='st_twitter_large' st_via="mediajobscom" st_url="MY URL HERE" displayText='Tweet'>
I would like to generate, via JavaScript, a URL to the current page and add two parameters to it, like ?featuredid=id?featuredname=name:
http://domain.com/currentpageisurl?featuredid=id?featuredname=name
I don't know much JavaScript, so please try to make things clear to me, especially what I need to insert into st_url="THE CODE HERE".
Any help would be very much appreciated!
Well, you can find the value for the URL of the current page in:
document.URL
You would want to assign this to a variable:
var myURL = document.URL;
and then add the parameters. The comment from Mike Bryant is absolutely correct: a URL has a single ?, and the parameters after it are separated by &, like this:
http://myURL.com?parameter1=whatever&parameter2=somethingElse
As long as you are using very basic parameters (no spaces or unusual characters such as quotes, ampersands, etc.) and you know them in advance when you write the page, you can simply tack them onto the string with the + operator:
var myURL = document.URL + "?parameter1=whatever&parameter2=somethingElse"; // assumes the current URL has no query string yet
But this greatly depends on what those parameters might be, what values they are going to have, and how you are going to determine that.
It might be that these parameters get defined only when the user interacts with the page, and they might be different every time and contain really strange characters; so this simple solution might not work well for you.
Feel free to update the question if your situation is too complicated for this solution.
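Because the page is rendered from a Smarty template, one alternative (not what this answer proposes) is to build the share URL server-side in PHP, where http_build_query() takes care of the encoding concerns mentioned above. This is a sketch: the function name appendQueryParams is hypothetical, and how you obtain the current page URL (for example from $_SERVER) and pass the result into the template depends on your setup.

```php
<?php
// Sketch: append query parameters to a URL with proper encoding.
// Uses '?' if the URL has no query string yet and '&' otherwise,
// and keeps any #fragment at the very end where it belongs.
function appendQueryParams(string $url, array $params): string
{
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);   // e.g. "#top"
        $url = substr($url, 0, $hashPos);
    }

    $separator = (strpos($url, '?') === false) ? '?' : '&';
    // http_build_query() percent-encodes keys and values (spaces become '+').
    return $url . $separator . http_build_query($params) . $fragment;
}

$shareUrl = appendQueryParams(
    'http://domain.com/currentpageisurl',
    ['featuredid' => 'id', 'featuredname' => 'name']
);

// Escape for use inside an HTML attribute such as st_url="...".
echo htmlspecialchars($shareUrl, ENT_QUOTES);
```

Values containing spaces or ampersands, such as 'A & B', come out as A+%26+B, so they no longer break the query string.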

How to capture unset REQUEST values in PHP

I'm not sure this is even possible, but we have an issue: we control an interface that has XML posted to it via HTTP POST in the form www.url.com/script.php?xml=<xmlgoeshere>. The XML is URL-encoded before it is passed to us, and we decode and parse it.
However, one client refuses to URL-encode what they send. That works fine until the XML contains an ampersand, at which point the ampersand is treated as the end of the xml variable.
www.url.com/script.php?xml=<xmlstart...foo&bar.../>
The end result is that the XML is POSTed/GETed into the xml variable as normal, but I lose half of the incoming content because of the ampersand.
I know that's expected/proper behavior. My question is: is it possible to capture the &bar.../> segment, so that if we hit a known error I can crowbar it into working anyway? I know this is non-ideal, but I'm at my wit's end dealing with the outside party.
UPDATE
OK, so I was totally confused. After grabbing the server variables as mentioned below, it looks like I'm not getting the query string, and that's because the request they're submitting has:
[CONTENT_TYPE] => application/x-www-form-urlencoded
[QUERY_STRING] =>
That being the case, is the above behavior still to be expected? Is there a way to get the raw form input in this case? Thanks to the posters below for their help.
You'd be hard pressed to do it, if it's even possible, because the fragments of a query string take the format foo=bar, with the & character acting as the separator. This means you'd get an unpredictable $_GET variable whose key is everything between the & and the next = (assuming there even is one), and whose value runs from that = to the next & or the end of the string.
It might be possible to parse the $_GET array in some way to recover the lost meaning, but it would never be all that reliable. You might have more luck trying to parse $_SERVER['QUERY_STRING'], but that's not guaranteed to succeed either, and it would be a hell of a lot of effort for a problem that can be avoided just by the client using the API properly.
And for me, that's the main point. If your client refuses to use your API in the way you tell them to use it, then it's ultimately their problem if it doesn't work, not yours. Of course you should accommodate your clients to a reasonable standard, but that doesn't mean bending over backwards for them just because they refuse to accommodate your needs or technical standards that have been laid down for the good of everyone.
If the only parameter you use is xml=, and it's always at the front, and there are no other parameters, you can do something like this pseudocode:
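function is_well_formed_xml($s) { // true if $s parses as XML
    libxml_use_internal_errors(true); // suppress libxml warnings
    return $s !== '' && simplexml_load_string($s) !== false;
}
$xml = $_GET['xml'] ?? '';
if (count($_GET) > 1 || !is_well_formed_xml($xml)) {
    $xml = substr($_SERVER['QUERY_STRING'], 4); // skip the leading "xml="
    if (!is_well_formed_xml($xml)) {
        http_response_code(400); // still not valid XML: give up
        exit;
    }
}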
However, you should tell the client to fix their code, especially since it's so easy for them to comply with the standard! You might still get into trouble if the XML contains a ? or a #: PHP or the web server may get confused about where the query string starts (messing up your $_SERVER['QUERY_STRING']), and PHP, the client's code, or an intermediary proxy or web server may get confused by the #, because it usually marks the beginning of a fragment.
For example, something like this might be impossible to transmit reliably in a query parameter:
<root><link href="http://example.org/?querystring#fragment"/></root>
So tell them to fix their code. It's almost certainly incredibly easy for them to do so!
UPDATE
There's some confusion about whether this is a GET or POST. If they send a POST with x-www-form-urlencoded body, you can substitute file_get_contents('php://input') for $_SERVER['QUERY_STRING'] in the code above.
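A minimal sketch of picking the raw data source and stripping the xml= prefix. It assumes, as above, that xml= is the only parameter and comes first; the function names recoverRawXml and rawRequestData are hypothetical. The value is deliberately not passed through urldecode(), on the assumption that the sender did not encode it at all.

```php
<?php
// Sketch: recover the XML when the sender did not URL-encode it, so PHP's
// own parsing cut the value at the first '&'. Everything after the leading
// "xml=" is taken verbatim from the raw request data (no urldecode()).
function recoverRawXml(string $raw): ?string
{
    if (strncmp($raw, 'xml=', 4) !== 0) {
        return null; // not the expected "xml=..." layout
    }
    return substr($raw, 4);
}

// Raw request data: the query string for GET, the body for a urlencoded POST.
function rawRequestData(): string
{
    $query = $_SERVER['QUERY_STRING'] ?? '';
    return $query !== '' ? $query : (string) file_get_contents('php://input');
}

// Demonstration with a hard-coded string instead of a live request.
$raw = 'xml=<a>foo&amp;bar</a>';
parse_str($raw, $parsed);          // what PHP would normally give you
$recovered = recoverRawXml($raw);  // the full, untruncated XML
```

In a real handler you would call recoverRawXml(rawRequestData()) and then check the result with an XML parser before trusting it.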
Yes, it's possible, using $_SERVER["QUERY_STRING"].
For your URL www.url.com/script.php?xml=<xmlstart...foo&bar.../>, $_SERVER["QUERY_STRING"] should contain xml=<xmlstart...foo&bar.../>.
The following code should extract the xml data.
// Everything after the first "xml=" is taken verbatim, ampersands included.
$query = $_SERVER["QUERY_STRING"] ?? '';
$pos = strpos($query, 'xml=');
$xml = "";
if ($pos !== false) {
    $xml = substr($query, $pos + strlen("xml="));
}
The problem here is that the query string is parsed on & and = characters. If you know where your = character will be after the "bar" key, you may be able to capture the value of the rest of the string. However, if you hit more & characters, you will need to know the full content of the incoming message body; if you do, you should be able to get the rest of the content.

Get text from a foreign page in PHP

I need to pull a section of text from an HTML page that is not on my local site, and then have it parsed as a string. Specifically, the last column from this page. I assume I would have to copy the source of the page into a variable and then set up a regex search to navigate to that table row. Is that the most efficient way of doing it? What PHP functions would that entail?
Scrape the page HTML with file_get_contents() (which requires the allow_url_fopen ini setting to be enabled), with cURL, or with a system command such as wget.
Run a regular expression to match the desired part. You could just match any <td>s in this case, as these values are the first occurrences of table cells, e.g. preg_match("/<td.*?>(.*?)<\/td>/si",$html,$matches); (not tested; use preg_match_all() to get every cell rather than just the first).
If you can use URL fopen, then a simple file_get_contents('http://somesite.com/somepage') would suffice. There are various libraries out there to do web scraping, which is the name for what you're trying to do. They might be more flexible than a bunch of regular expressions (regexes are known for having a tough time parsing complicated HTML/XML).
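As a sketch of the non-regex route mentioned here, PHP's built-in DOM extension (DOMDocument plus DOMXPath) can pick out the last cell of each table row. The function name lastColumnValues and the sample HTML are illustrative; in practice $html would come from file_get_contents() or cURL, and the XPath expression would need narrowing if the page has more than one table.

```php
<?php
// Sketch: collect the text of the last <td> in each table row using
// PHP's DOM extension instead of a regular expression.
function lastColumnValues(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $values = [];
    // td[last()] selects the last <td> child of every <tr>; header rows
    // made only of <th> cells contribute nothing.
    foreach ($xpath->query('//tr/td[last()]') as $cell) {
        $values[] = trim($cell->textContent);
    }
    return $values;
}

$html = '<table><tr><th>Name</th><th>Value</th></tr>'
      . '<tr><td>A</td><td> 1 </td></tr><tr><td>B</td><td>2</td></tr></table>';
print_r(lastColumnValues($html));
```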

How to filter data after using get contents

I want to know how to find a number on a remote website and make it a variable.
For example, if I want to find the stock quote for "AMZN", I would use cURL or file_get_contents() on the page "http://stock-quotes.com/AMZN" and store the result in a string variable called $contents.
Now that I have $contents, how would I find that AMZN quote? I was thinking of using a regular expression to narrow down the line, like finding "AMZN=35 points", and then using another function to strip the "AMZN=" and " points" from the start and end of the string so that "35" is all that's left.
Is that how people do it?
1.) DOMElement (PHP's DOM extension)
2.) SimpleXML
3.) preg_match
4.) strpos
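To make the preg_match option concrete: a single regex with a capture group can pull out the number directly, which makes the separate "strip the prefix and suffix" step from the question unnecessary. The AMZN=35 points format is the hypothetical one from the question, and the function name extractQuote is made up; a real quote page will almost certainly look different, and the pattern would have to match its actual markup.

```php
<?php
// Sketch: find "<SYMBOL>=<number> points" and return just the number.
// The text format is the hypothetical one from the question.
function extractQuote(string $contents, string $symbol): ?string
{
    // preg_quote() escapes any regex metacharacters in the symbol;
    // the capture group accepts an integer or a decimal number.
    $pattern = '/' . preg_quote($symbol, '/') . '=(\d+(?:\.\d+)?) points/';
    if (preg_match($pattern, $contents, $m) === 1) {
        return $m[1]; // only the captured number
    }
    return null; // symbol not found in that format
}

$contents = 'Latest quotes: AMZN=35 points, GOOG=50 points';
$quote = extractQuote($contents, 'AMZN');
```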
What I've always done (say in spidering, etc.) is to use the simple_html_dom library in PHP, then inspect the markup for the site.
The downside, as mentioned before, is that if the markup changes, you'll need to modify your code, but usually it's fairly easy, and if you use a source that has informative markup (consistent class names on the elements you need, etc.), then it's even easier.
Library link: http://simplehtmldom.sourceforge.net/
