I'm doing some work for a client that involves parsing the referrer information from Google et al to target various parts of a page to the user's search keywords.
I noticed that Perl's CPAN has a module called URI::ParseSearchString which seems to do exactly what I need. The problem is, I need to do it in PHP.
So, to avoid reinventing the wheel, does anyone know if there is a library out there for PHP that does the same / similar thing?
parse_str() is what you are looking for.
You may additionally want to use parse_url() first to extract the query string from the referring URL.
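A minimal sketch combining the two (the referrer URL is just an example; the keyword parameter is q for Google but varies per engine):

<?php
// Example referrer; in practice this comes from $_SERVER['HTTP_REFERER']
$referrer = 'http://www.google.com/search?q=cheap+flights&hl=en';

$query = parse_url($referrer, PHP_URL_QUERY); // "q=cheap+flights&hl=en"
parse_str($query, $params);                   // URL-decode into an array

echo $params['q']; // "cheap flights"
?>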
I'm the author of the module. As far as I know, there is nothing similar for PHP. If you do come across anything, please do let me know.
That being said, I cannot imagine this being very hard to port to PHP, and I can have an attempt at it if you don't find anything similar out there.
Spiros
Maybe this is too inefficient, or the HTTP_REFERER isn't showing the full URI ...
function parse_uri($uri) {
    // substr_count() takes the haystack first; the original call had the arguments reversed
    if (substr_count($uri, '?') > 0) {
        $queryString = explode('?', $uri, 2);
        // parse_str() returns nothing useful; it fills its second argument by reference
        parse_str($queryString[1], $params);
    } else {
        parse_str($uri, $params);
    }
    return $params;
}
if (isset($_SERVER['HTTP_REFERER'])) {
print_r(parse_uri($_SERVER['HTTP_REFERER']));
}
I've recently run into a bug in PHP 7.1 which seems to have come back after being fixed in PHP 5.4.7
The problem is simply that if you pass a URL to parse_url() and the URL doesn't have a scheme, it will return the whole URL as if it were just a path. For example:
var_dump(parse_url('google.co.uk/test'));
Result:
array(1) { ["path"]=> string(17) "google.co.uk/test" }
In reality it should be split into its domain and path here.
I run parse_url() tens of millions of times a day as part of URL decryption/encryption functionality. I'm looking for a fast way to fix this edge-case bug, or a reliable alternative to parse_url().
Edit:
Thanks for the helpful responses. Here's the solution I used in the end; I hope it helps someone. I won't submit it as an answer because I already marked someone else as correct (which they are), which allowed me to write this.
$parsedUrl = parse_url($uri);
// if the uri has no scheme, it won't think there's a host and will give bad results
if ($parsedUrl !== false && !isset($parsedUrl['host'])) {
// a prepended double slash makes it parse $uri as if it had a scheme; no scheme will be in the result
$parsedUrl = parse_url('//' . $uri);
}
if ($parsedUrl === false) {
throw new MalformedUrlException('Malformed URL: ' . $uri);
}
// use parsed url as needed
parse_url() needs some hint about where in a URL the given string begins.
That is why parse_url('//domain/path') works: it simply won't output any scheme.
Now, to describe the problem you want solved: PHP would need to know every domain in existence and then decide whether that is what the user meant, which is basically impossible.
Take the URL 'http://whois.domaintools.com/test.at' as an example: if you pass only 'test.at', is that a path or a domain?
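A quick illustration of the difference the scheme-relative form makes:

<?php
// No scheme: the whole string is treated as a path
var_dump(parse_url('google.co.uk/test'));
// array(1) { ["path"]=> string(17) "google.co.uk/test" }

// Leading double slash: parsed as a network reference, host and path split correctly
var_dump(parse_url('//google.co.uk/test'));
// array(2) { ["host"]=> string(12) "google.co.uk" ["path"]=> string(5) "/test" }
?>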
Hello, I'm a newbie in PHP. I'm trying to make a search function using PHP, but only within the website itself, without any database.
Basically, if I search for the string "Health", it should display lines such as:
The Joys of Health
Healthy Diets
This snippet is the only thing I could find that, if properly coded, would output the "lines" I want:
$myPage = array("directory.php", "pages.php");
$lines = file($myPage[0]); // read one of the pages into an array of lines
echo $lines[0];            // print one of those lines
I haven't tried it yet to see if it would work, but before I do I want to ask: is there a better way to do this?
If my files have too many lines, won't it stress the server?
The file() function will return an array. You should use file_get_contents() instead, as it returns a string.
Then, use regular expressions to find specific text within a link.
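A minimal sketch of that approach (the file name and the assumption that each result line is an <a> tag are mine; regexes on HTML are brittle in general):

<?php
$html = file_get_contents('pages.php');

// Grab the text of every link whose text contains "Health"
if (preg_match_all('/<a[^>]*>([^<]*Health[^<]*)<\/a>/i', $html, $matches)) {
    print_r($matches[1]); // e.g. ["The Joys of Health", "Healthy Diets"]
}
?>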
Your goal is fine, but the method you're thinking about is not. The file() function reads a file line by line and inserts it into an array. That assumes the HTML is well-structured in a human-readable fashion, which is not always the case. However, if you're the one providing the HTML and you make sure the structure is perfectly defined, OK. Here is the example you provided, but complete (take into account that it's the 'wrong' way of solving your problem, but if you want to follow that pattern, it's fine):
function pagesearch($pages, $string) {
    $tags = [];
    if (!empty($pages) && !empty($string)) {
        foreach ($pages as $page) {
            if ($lines = file($page)) {
                foreach ($lines as $line) {
                    // compare strictly against false: a match at position 0 would otherwise be treated as "no match"
                    if (!empty($line) && mb_strpos($line, $string) !== false) {
                        $tags[$page][] = $line;
                    }
                }
            }
        }
    }
    return $tags;
}
This will return an array with every occurrence of the word you're looking for in each of the pages you referenced, grouped by page. As I said, it's not the way you should solve this, but it is a way.
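For example, called with the files from the question (file names assumed), it could be used like this:

$results = pagesearch(array('directory.php', 'pages.php'), 'Health');
print_r($results);
// e.g. array('pages.php' => array("The Joys of Health\n", "Healthy Diets\n"))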
Hope that helps
Because you don't want to use any database, and because the term "database" is very broad and includes the file system, you are asking to search a database without having a database.
That makes no sense. In your case, at least one database is the file system. If you can accept that you want to search a database (here, your HTML files) but don't want to use a database to store anything related to the search (e.g. an index or cached results), then what you suggest is basically how it works: a real-time, text-based, line-by-line file search.
Sure, it is very rudimentary, but given your "no database" constraint you have already found the only possible way. And yes, it will stress your server when used, because real-time search is expensive.
Otherwise, Lucene/Solr is normally used for this job, but that is a database, and even a server.
I'm working on a website that uses a lot of XML-files as data (150 in total and probably growing). Each page is an XML-file.
What I'm looking for is a way to look for a string through the XML-files. I'm not sure what programming language to use for this XML search engine.
I'm familiar with PHP, JavaScript, JQuery. So I'd prefer using those languages.
Thanks a bunch!
UPDATE: I'm looking for a solution that works quickly.
Ideally, the function returns the tag name that contains the search string.
If, for instance, the XML is as follows:
<article-1>This is a great story.</article-1>
If one would search for 'story', it would return 'article-1'.
I'm not quite sure how to do this with a regular expression.
PHP can do this. Here's an example:
foreach (glob("{foldera/*.xml,folderb/*.xml}", GLOB_BRACE) as $filename) {
    $xml = simplexml_load_file($filename);
    // use regular expressions to find your string
}
You simply iterate through each file on your server using glob() with a foreach loop.
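For the updated requirement of returning the name of the tag that contains the search string, XPath may be simpler than a regular expression. A minimal sketch, assuming every file parses as well-formed XML (the function name and glob pattern are mine; note that interpolating the needle into the XPath string is unsafe for untrusted input):

<?php
// Return, per file, the names of elements whose text contains $needle
function findTagsContaining($pattern, $needle) {
    $hits = array();
    foreach (glob($pattern) as $filename) {
        $xml = simplexml_load_file($filename);
        if ($xml === false) {
            continue; // skip files that fail to parse
        }
        // XPath: any element whose text content contains the needle
        foreach ($xml->xpath('//*[contains(text(), "' . $needle . '")]') as $node) {
            $hits[$filename][] = $node->getName();
        }
    }
    return $hits;
}

print_r(findTagsContaining('foldera/*.xml', 'story'));
// e.g. array('foldera/page1.xml' => array('article-1'))
?>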
Sounds like a problem that could be solved with grep and regular expressions. Without knowing what string you're looking for it's not possible to say exactly what you should do, but reading some documentation on grep should get you started down the right path.
I'd like to make a script where the user can enter a sum, e.g. 4^5+(56+2)/3, or any other basic maths expression (no functions etc.). How would I go about doing this? Presumably regex. Could somebody point me in the right direction? I'm guessing this isn't going to be too easy, so I'd just like some advice on where to start and I'll take it from there.
Have a look at this: http://www.webcheatsheet.com/php/regular_expressions.php
It's a good intro to Regular Expressions and how to use them in PHP.
Yes, someone can (and probably will) just give you the regex you need to work this out, but it helps a lot if you understand HOW your regex works. They look scary, but aren't that bad really...
This is not my code, but there is a great PHP snippet that lets you use Google Calculator to do the calculations. You can just enter your query (e.g. "7+3") using regular/FOIL notation or whatever, and it will return the result.
http://www.hawkee.com/snippet/5812/
<?php
// Google calculator: scrape the result line from a Google search page
function do_calculator($query) {
    if (!empty($query)) {
        $url = "http://www.google.co.uk/search?q=" . urlencode($query);
        // characters/markup to strip or translate in Google's output
        $f = array("Â", "<font size=-2> </font>", " × 10", "<sup>", "</sup>");
        $t = array("", "", "e", "^", "");
        preg_match('/<h2 class=r style="font-size:138%"><b>(.*?)<\/b><\/h2>/', file_get_contents($url), $matches);
        if (empty($matches[1])) {
            return 'Your input could not be processed..';
        } else {
            return str_replace($f, $t, $matches[1]);
        }
    } else {
        return 'You must supply a query.';
    }
}
?>
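Usage would then just be the following (assuming Google's result-page markup still matched what the regex expects, which it no longer does):

echo do_calculator('4^5+(56+2)/3');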
The easy way to do it is with eval(). If you're accepting arbitrary input from a web form and executing it on a server, though, you MUST be careful to only accept valid expressions. Use a regex for that.
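A minimal sketch of that approach (the whitelist regex and the ^-to-** rewrite are my assumptions, not a vetted sanitizer; eval() on user input remains risky):

<?php
// Evaluate a basic arithmetic expression, rejecting anything but
// digits, whitespace, and the characters + - * / ^ ( ) .
function calculate($expr) {
    if (!preg_match('/^[\d\s+\-*\/^().]+$/', $expr)) {
        throw new InvalidArgumentException('Invalid expression: ' . $expr);
    }
    $expr = str_replace('^', '**', $expr); // PHP's exponentiation operator is **
    return eval('return ' . $expr . ';');
}

echo calculate('4^5+(56+2)/3'); // 1043.333...
?>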
I'm in the process of rewriting, in PHP, a Perl-based web crawler I wrote nearly 8 years ago. In Perl I used the quite handy URI::URL module to do things like:
$sourceUrl = '/blah.html';
$baseHost = 'http://www.example.com';
my $url = URI::URL->new($sourceUrl, $baseHost);
return $url->abs;
returns: 'http://www.example.com/blah.html'
The parse_url() function in PHP is quite handy, but is there something more robust? Specifically, something that will give me the above functionality?
Maybe Zend_Uri is what you are looking for?
print $baseHost . $sourceUrl;
Am I missing something? Your way seems needlessly overcomplicated.
I did a bit of searching on the PEAR archive, and my first-guess approximation of URI::URL is Net_URL2. Maybe you want to give that a shot?
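If Net_URL2 fits, resolution would look roughly like this (method name from memory of the PEAR docs, so treat it as a sketch):

<?php
require_once 'Net/URL2.php';

$base = new Net_URL2('http://www.example.com');
// RFC 3986 reference resolution, like URI::URL->new(...)->abs in Perl
echo $base->resolve('/blah.html'); // http://www.example.com/blah.html
?>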