First question in a long while! I need to find all the URLs in a string returned from a Facebook page request (I'm requesting the website of a page using the Graph API) and put them into an array that I subsequently display in a DataTables JS table.
Anyhow, I'm having issues: when I build the JSON data for the datatable, it breaks in some cases:
http://socialinsightlab.com/datatable_fpages.json
The issue is that the website field contains erroneous characters, broken structure, stray white space and so on.
Anyhow I found the perfect regex to use to find all websites in the field (there can be more than one website listed in the return).
The regex is
(?i)\b((?:[a-z][\w-]+:(?:\/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
When I try to assign it to a PHP variable for use with preg_match_all, I can't: PHP won't accept the regex string, I guess because it has quotes in it.
So my question is: how can I extract only the URLs found in the website field and then assign them to a variable so I can add them to the datatable?
Here is an example of a call that fails:-
http://socialinsightlab.com/datatable_fpages.json
I need to be able to just return websites and nothing more.
Any ideas?
Thanks
Jonathan
This regex is specifically made as a solution to this problem:
(?:https?:\/\/|www)[^"\s]+
If you don't want to deal with all this quotes escaping, you can do the following:
Save regex to a file, say, regex.txt.
Read this file into variable and trim: $regex = trim(file_get_contents("regex.txt"));
Use it with preg_match() etc.
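A minimal sketch of the file-based approach above, using the shorter regex from this answer. The `~` delimiters and the `i` flag are my additions (PHP's preg functions need delimiters around the pattern), and the temp file and sample field value are made up for illustration:

```php
<?php
// Hypothetical setup: write the pattern to a temp file so the sketch is
// self-contained. In practice regex.txt would be created by hand.
$regexFile = tempnam(sys_get_temp_dir(), 'regex');
file_put_contents($regexFile, '~(?:https?:\/\/|www)[^"\s]+~i' . "\n");

// Read the pattern back and trim the trailing newline.
$regex = trim(file_get_contents($regexFile));
unlink($regexFile);

// Made-up "website" field with stray whitespace and two sites in it.
$website = "  http://example.com/shop \n www.example.org/about  ";

// $matches[0] holds every full match, i.e. every URL found in the field.
preg_match_all($regex, $website, $matches);
$urls = $matches[0];

print_r($urls);
```

Because the pattern never appears inside a PHP string literal in the real setup, there is nothing to escape. The resulting `$urls` array can then be passed to json_encode() for the datatable.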
I'm currently developing a table layout.
The tables are using a paginator and a filter function in PHP.
All values are transmitted as GET parameters.
For example, the paginator will use &limit=20&page=5.
The filter is built upon a table row in thead as input fields.
What I mean is that each column has its own input field.
Once the submit button is clicked, it will pass the data via GET to itself, so the next pageview will query/filter the data correctly.
For example, if I want to filter the postcode the url will be as following:
&limit=20&page=5&postcode=5
I'm allowing searches like %5% to show all postcodes that contain a 5 anywhere in the value, so the result is not limited to exactly 5.
However, if I want to filter the postcodes showing all results with 58, I will type in %58%. Unfortunately, %58 is the URL encoding of the letter X, so the URL won't be &postcode=%58% as expected. It will be &postcode=X%.
The question is whether it is somehow possible to get the correct values into the URL?
The problem lies at the browser level. If I change the URL from &postcode=X% to &postcode=%58% directly and hit enter, Chrome translates it straight away to X%.
Maybe it's possible somehow with meta tags, http headers, or Javascript, etc.
I'm doing it via GET instead of POST because it was - apparently - simpler to integrate with the paginator.
Sorry for my bad English. Any help would be much appreciated.
Thanks a lot.
You should escape the "%" sign itself (that would be "%25"). PHP should be smart enough to decode that automatically.
So &postcode=%58% should become &postcode=%2558%25, which PHP will decode so that $_GET['postcode'] is '%58%'.
You should urlencode your values before inserting them into the params.
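As a rough illustration of the encoding step, here is what urlencode() and http_build_query() produce for the filter value from the question. The parameter names are taken from the question's URL; the urldecode() call only simulates what PHP does on its own when it fills $_GET:

```php
<?php
$filter = '%58%';

// Encode a single value: each "%" becomes "%25".
$encoded = urlencode($filter);

// Or build the whole query string at once; values are encoded for you.
$query = http_build_query(
    ['limit' => 20, 'page' => 5, 'postcode' => $filter],
    '',
    '&'
);

// Simulates the decoding PHP applies before populating $_GET['postcode'].
$received = urldecode($encoded);

echo $encoded, "\n";   // %2558%25
echo $query, "\n";     // limit=20&page=5&postcode=%2558%25
echo $received, "\n";  // %58%
```

In a GET form the browser does this encoding itself when the user types %58% into the field; the manual step matters for links you assemble in PHP, such as the paginator URLs.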
Overall though, if you are using MySQL I agree with billrichards.
Since you mention %% searches I assume you are using MySQL or another SQL back end to query for the data. In that case I would suggest leaving the querystring always formatted as postcode=58&page=1, and add some other parameter to indicate if it should be a %wildcard% search or exact match, and if the wildcard parameter is there, add the %% on the back end when performing the query.
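One way this could look on the back end, as a sketch only. The function name, the `wildcard` parameter name and the `postcode` column are all assumptions for illustration; the returned SQL fragment and value are meant to be used with a prepared statement:

```php
<?php
// Hypothetical helper: turns the plain postcode value plus a wildcard flag
// into a WHERE fragment and a bound parameter.
function buildPostcodeFilter(string $postcode, bool $wildcard): array
{
    if (!$wildcard) {
        return ['sql' => 'postcode = ?', 'param' => $postcode];
    }

    // Escape LIKE metacharacters so user input is matched literally
    // (backslash is MySQL's default LIKE escape character).
    $escaped = str_replace(
        ['\\', '%', '_'],
        ['\\\\', '\\%', '\\_'],
        $postcode
    );

    // The surrounding %...% is added here, never in the URL.
    return ['sql' => 'postcode LIKE ?', 'param' => '%' . $escaped . '%'];
}

// Stand-in for $_GET after a request like ?postcode=58&wildcard=1&page=1
$get = ['postcode' => '58', 'wildcard' => '1', 'page' => '1'];

$filter = buildPostcodeFilter($get['postcode'], !empty($get['wildcard']));
$exact  = buildPostcodeFilter('58', false);

print_r($filter);
```

The URL then never contains a literal percent sign, so the encoding problem from the question does not come up at all.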
Apologies if this has already been asked.
I am working on a project where I am looking to display locations of a business. This can be either by state, or by city (in a state).
I am trying to work with these two routes:
GET /#state
GET /#city-#state
#state works well, but when I try to navigate to a #city-#state page, I get errors because it is trying to load the #state page, and cannot find the required data.
Looking at base.php, I found that the preg_match_all is matching with \w, so it should be ignoring the hyphen(-), but for some reason isn't.
I need the URLs to be in this structure.
Can someone help me notice what I am missing?
Thanks!
I don't think F3 allows you to use a dash to separate tokens in a URL, which is why it's always matching the first token (#state).
The regex used to grab tokens is '/#(\w+)/', and it wants a slash character to separate tokens.
I would suggest using /#state and /#city/#state.
I need to pull a section of text from an HTML page that is not on my local site, and then have it parsed as a string. Specifically, the last column from this page. I assume I would have to copy the source of the page to a variable and then set up a regex search to navigate to that table row. Is that the most efficient way of doing it? What PHP functions would that entail?
Scrape the page HTML with file_get_contents() (needs ini value allow_url_fopen to be true) or a system function like curl or wget
Run a Regular Expression to match the desired part. You could just match any <td>s in this case, as these values are the first occurrences of table cells, e.g. preg_match("/<td.*?>(.*?)<\/td>/si",$html,$matches); (not tested)
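The two steps above might look roughly like this. To keep the sketch runnable without network access, the HTML is a made-up inline sample standing in for the result of file_get_contents(), and preg_match_all() is swapped in for preg_match() so that every cell is captured rather than only the first:

```php
<?php
// Made-up sample standing in for: $html = file_get_contents($url);
$html = '<table>'
      . '<tr><td class="name">Alpha</td><td>10</td><td>first note</td></tr>'
      . '<tr><td class="name">Beta</td><td>20</td><td>second note</td></tr>'
      . '</table>';

// Same pattern as in the answer: lazy match of each <td ...>...</td> pair.
preg_match_all('/<td.*?>(.*?)<\/td>/si', $html, $matches);

// $matches[1] holds the captured cell contents in document order.
$cells = array_map('trim', $matches[1]);

print_r($cells);
```

This works for simple, flat tables; nested tags or nested tables inside cells will confuse it.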
If you can use URL fopen, then a simple file_get_contents('http://somesite.com/somepage') would suffice. There are various libraries out there to do web scraping, which is the name for what you're trying to do. They might be more flexible than a bunch of regular expressions (regexes are known for having a tough time parsing complicated HTML/XML).
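As an example of the non-regex route, PHP's bundled DOM extension can pick out the last cell of every row with an XPath query. This is only a sketch: it assumes the DOM extension is enabled, and the inline HTML is a made-up stand-in for the fetched page:

```php
<?php
// Made-up sample standing in for the fetched page source.
$html = '<table>'
      . '<tr><td>Alpha</td><td>10</td><td>first note</td></tr>'
      . '<tr><td>Beta</td><td>20</td><td>second note</td></tr>'
      . '</table>';

// Real-world HTML is rarely valid, so collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// td[last()] selects the final <td> child of each <tr>.
$xpath = new DOMXPath($doc);
$lastColumn = [];
foreach ($xpath->query('//tr/td[last()]') as $cell) {
    $lastColumn[] = trim($cell->textContent);
}

print_r($lastColumn);
```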
I am struggling to do something which appears quite simple...
I use PHP cURL to scrape data and insert it into my website. cURL saves the data as a string in $data before it is output.
What I am trying to do is target all of the URLs contained within $data. The URLs sometimes contain a fixed-value parameter that I need to move to the end of the URL. The URLs look like this, where category=widgets can appear anywhere in the URL:
http://www.mysite.com/script.php?category=widgets&show=all&size=big
I need to move the parameter category=widgets to the end of all URLs, so they look like this:
http://www.mysite.com/script.php?show=all&size=big&category=widgets
I'm thinking that I can first remove all occurrences of category=widgets with str_replace; that's the easy bit.
The problem I have is appending category=widgets to the end of the URL. Because the URL is dynamic, perhaps preg_replace is more appropriate. I'm new to regular expressions, and it's giving me a headache.
Would appreciate your help. Thanks.
I'd recommend using parse_url, as this is likely to be considerably more robust in the long term than string manipulation.
As such, you could use parse_url to extract the various chunks and then assemble a new URL based on these as required.
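A sketch of how that could go, under a few assumptions: moveParamToEnd() is a name made up here, the URL-finding pattern is deliberately simple, and user/password parts of a URL are not handled. Note also that http_build_query() re-encodes the values, so unusual encodings in the original query may come out normalised:

```php
<?php
// Hypothetical helper: moves one query parameter to the end of a URL.
function moveParamToEnd(string $url, string $name): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // nothing to reorder
    }

    parse_str($parts['query'], $params);
    if (!array_key_exists($name, $params)) {
        return $url;
    }

    // Remove the parameter and append it again so it ends up last.
    $value = $params[$name];
    unset($params[$name]);
    $params[$name] = $value;

    $rebuilt = isset($parts['scheme']) ? $parts['scheme'] . '://' : '';
    $rebuilt .= $parts['host'] ?? '';
    $rebuilt .= isset($parts['port']) ? ':' . $parts['port'] : '';
    $rebuilt .= $parts['path'] ?? '';
    $rebuilt .= '?' . http_build_query($params, '', '&');
    $rebuilt .= isset($parts['fragment']) ? '#' . $parts['fragment'] : '';

    return $rebuilt;
}

// Made-up stand-in for the scraped $data string.
$data = '<a href="http://www.mysite.com/script.php?category=widgets&show=all&size=big">x</a>';

// Rewrite every http(s) URL found in $data.
$data = preg_replace_callback(
    '~https?://[^\s"\'<>]+~',
    function (array $m): string {
        return moveParamToEnd($m[0], 'category');
    },
    $data
);

echo $data, "\n";
```

If the scraped HTML writes ampersands as `&amp;`, the URLs would need html_entity_decode() before parsing and htmlspecialchars() afterwards.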
I'm building a small search script for my website. I need to send data by the GET method, because with POST it will get really messy as I have to show many pages of search results.
So my question is: can I use the GET method directly? That is, do I need to encode the URL or do anything else?
I have checked it in modern browsers. It works just fine..
Thanks
Edit:
urlencode is used when putting variables in a URL.
I am submitting my search form with method='get'. Then I read the variable, perform the search query and build the new page links with the variable data.
- Length and size are not a problem.
So you're suggesting I should use the urlencode function only when building the new links?
You can and should use urlencode() on data that possibly contains spaces and other URL-unfriendly characters.
http://php.net/manual/en/function.urlencode.php
You need to URL Encode the parameters on the URL eg http://www.example.com/MyScript.php?MyVariable=%3FSome%20thing%3F.
Be aware that there's a limit to how much data can be sent via GET, and it is more restrictive on older browsers. Internet Explorer, for example, caps URLs at 2,083 characters, so if you think you're going to go over that, consider using POST or you may exclude some users.
You should use urlencode($variable) when building a link that contains the variable (the browser takes care of this for form submissions). On the receiving side PHP has already decoded the values in $_GET, so calling urldecode($variable) again is normally unnecessary and can mangle values that contain a literal % or +.
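For the page links in the question, a sketch along these lines may help. The script name search.php, the parameter names q and page, and the helper buildPageLink() are all made up for illustration; the parse_str() call only simulates how PHP fills $_GET, which arrives already decoded:

```php
<?php
// Hypothetical helper: builds the link for one page of search results.
function buildPageLink(string $script, string $term, int $page): string
{
    // http_build_query() URL-encodes each value.
    return $script . '?' . http_build_query(
        ['q' => $term, 'page' => $page],
        '',
        '&'
    );
}

// A search term containing URL-unfriendly characters.
$term = 'red & blue?';
$link = buildPageLink('search.php', $term, 2);

// When printing into HTML, escape the finished URL for the attribute.
$anchor = '<a href="' . htmlspecialchars($link) . '">Page 2</a>';

// Simulates what PHP does when it populates $_GET from the query string.
parse_str(parse_url($link, PHP_URL_QUERY), $received);

echo $link, "\n";          // search.php?q=red+%26+blue%3F&page=2
echo $received['q'], "\n"; // red & blue?
```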