I have a file that contains a bunch of links:
<a href="http://site1.com">site 1</a><a href="http://site2.com">site 2</a><a href="http://site3.com">site 3</a>
I want to get the URL to a link with specific text. For example, search for "site 2" and get back "http://site2.com"
I tried this:
preg_match("/.*?[Hh][Rr][Ee][Ff]=\"(.*?)\">site 2<\/[Aa]>.*/", $contents, $match)
(I know the HREF= will be the last part of the anchor)
But it returns
http://site1.com">site 1</a><a href="http://site2.com
Is there a way to do a search backwards, or something? I know I can do preg_match_all and loop over everything, but I'm trying to avoid that.
Try this:
preg_match("(<a[^>]*?href=[\"']([^\"']+)[\"'][^>]*>site 2</a>)i", $contents, $match);
$result = $match[1];
Hope this helps!
Or you can try using phpQuery.
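If you would rather not rely on a regex at all, PHP's bundled DOM extension can do the lookup. The following is only a sketch: find_link_href() is a made-up helper name, the sample $contents is reconstructed from the output shown in the question, and it does walk the parsed anchors in a loop (just not with preg_match_all).

```php
<?php
// Hypothetical helper (not part of any library): returns the href of the
// first <a> element whose visible text equals $text, or null if none matches.
function find_link_href(string $html, string $text): ?string
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (trim($a->textContent) === $text) {
            return $a->getAttribute('href');
        }
    }
    return null;
}

// Sample input assumed from the question.
$contents = '<a href="http://site1.com">site 1</a><a href="http://site2.com">site 2</a>';
echo find_link_href($contents, 'site 2'); // http://site2.com
```

Because the parser deals with attribute order and quoting, this does not depend on href being the last attribute of the anchor.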
I have seen on most online newspaper websites that when I click on a headline link, e.g. "two thieves caught red handed", it normally opens a URL like this: www.example.co.uk/news/two-thieves-caught-red-handed.
How do I deal with this URL in PHP code so that I pick out only the last part of it, e.g. two-thieves-caught-red-handed? After that I want to work with this string.
I know how to deal with GET parameters like "www.example.co.uk/news/headline=two thieves caught red handed".
But I do not want to do it that way. Could you show me another way?
You can use a combination of the explode and end functions for that,
for example:
<?php
$url = "www.example.co.uk/news/two-thieves-caught-red-handed";
$url = explode('/', $url);
$end = end($url);
echo "$end";
?>
The code will output:
two-thieves-caught-red-handed
You have several options in PHP to get the current URL. For a detailed overview look here.
One would be to use $_SERVER['REQUEST_URI'] and then use a string manipulation function to extract the parts you need.
Maybe this thread will help you too.
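To tie the two suggestions together, here is a rough sketch that takes a URL (or the value of $_SERVER['REQUEST_URI']) and returns the last path segment. last_path_segment() is a name invented for this example, and the trailing slash and query string in the sample URL are assumptions added to show why parse_url() is useful here.

```php
<?php
// Hypothetical helper: last non-empty path segment of a URL or request URI.
function last_path_segment(string $url): string
{
    // parse_url() drops the query string and fragment for us.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // Ignore a trailing slash, then take the final segment.
    return basename(rtrim($path, '/'));
}

echo last_path_segment('http://www.example.co.uk/news/two-thieves-caught-red-handed/?ref=home');
// two-thieves-caught-red-handed
```

In a real request handler you would pass $_SERVER['REQUEST_URI'] instead of a hard-coded string.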
I am trying to grab content from another one of my sites, which is working fine, apart from the fact that all the links are incorrect.
include_once('../simple_html_dom.php');
$page = file_get_html('http://www.website.com');
$ret = $page->find('div[id=header]');
echo $ret[0];
Is there any way, using preg_replace, to have every link show the full link instead?
$ret[0] = preg_replace('#(http://([\w-.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)#',
'http://fullwebsitellink.com$1', $ret[0]);
I guess it would be something like the above, but I don't understand it.
Thanks
Your question doesn't really explain what is "incorrect" about the links, but I'm guessing you have something like this:
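<div id="header"><a href="/">Home</a> | <a href="/sitemap">Sitemap</a></div>
and you want to embed it in another site, where those links need to be fully-qualified with a domain name, like this:
<div id="header"><a href="http://example.com/">Home</a> | <a href="http://example.com/sitemap">Sitemap</a></div>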
Assuming this is the case, the replacement you want is so simple you don't even need a regex: find all href attributes beginning "/", and add the domain part (I'll use "http://example.com") to their beginning to make them absolute:
$scraped_html = str_replace('href="/', 'href="http://example.com/', $scraped_html);
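If the scraped markup is less uniform (single-quoted attributes, src as well as href, or protocol-relative URLs starting with "//"), a slightly more defensive variant might look like the sketch below. absolutize_links() and the sample markup are hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: prefixes root-relative href/src values with $base.
function absolutize_links(string $html, string $base): string
{
    $base = rtrim($base, '/');
    // Match href= or src=, a quote, then exactly one leading slash.
    // The (?!/) lookahead leaves protocol-relative "//host/..." URLs alone.
    $result = preg_replace_callback(
        '#\b(href|src)=(["\'])/(?!/)#i',
        fn (array $m): string => $m[1] . '=' . $m[2] . $base . '/',
        $html
    );
    return $result ?? $html; // preg_replace_callback() returns null on error
}

// Illustrative input only.
$scraped = '<a href="/">Home</a> | <a href=\'/sitemap\'>Sitemap</a> <img src="//cdn.example.net/x.png">';
echo absolutize_links($scraped, 'http://example.com');
```

This still does not resolve links such as "page.html" or "../page.html"; for those you would need the path of the page that was scraped.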
It seems Google's URLs are structured differently these days. So it is harder to extract the referring keyword from them. Here is an example:
http://www.google.co.uk/search?q=jquery+post+output+46&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a#pq=jquery+post+output+46&hl=en&cp=30&gs_id=1v&xhr=t&q=jquery+post+output+php+not+running&pf=p&sclient=psy-ab&client=firefox-a&hs=8N5&rls=org.mozilla:en-US%3Aofficial&source=hp&pbx=1&oq=jquery+post+output+php+not+run&aq=0w&aqi=q-w1&aql=&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=bdeb326aa44b07c5&biw=1280&bih=875
The search I performed was actually "jquery post output php not running", so the first 'q=' does not contain the full search. The second one does. I'd like to write a script that always extracts the last 'q=', but I'm not sure if Google's URLs always have the full search last. Has anyone had any experience with this?
You can accomplish this using parse_url() and parse_str(), where $str is the referrer string. parse_str() already URL-decodes the values, so a separate urldecode() call is unnecessary:
$fragment = parse_url($str, PHP_URL_FRAGMENT);
parse_str($fragment, $arr);
$query = $arr['q']; // jquery post output php not running
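Not every referrer will have a fragment, so it may be worth falling back to the ordinary query string. A sketch under that assumption follows; google_search_terms() is an invented name, the sample URL is a shortened version of the one in the question, and whether Google always puts the latest search in the fragment is not something this code can guarantee.

```php
<?php
// Hypothetical helper: returns the search terms from a referrer URL, or null.
function google_search_terms(string $referrer): ?string
{
    // Prefer the fragment (after "#"), then fall back to the query string.
    foreach ([PHP_URL_FRAGMENT, PHP_URL_QUERY] as $component) {
        $part = parse_url($referrer, $component);
        if (!is_string($part)) {
            continue; // component missing or URL malformed
        }
        parse_str($part, $params); // values are URL-decoded here
        if (isset($params['q']) && is_string($params['q']) && $params['q'] !== '') {
            return $params['q'];
        }
    }
    return null;
}

// Shortened version of the URL from the question.
$ref = 'http://www.google.co.uk/search?q=jquery+post+output+46&ie=utf-8'
     . '#pq=jquery+post+output+46&hl=en&q=jquery+post+output+php+not+running&pf=p';
echo google_search_terms($ref); // jquery post output php not running
```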
I'm trying to find a way to extract a site title from a URL entered into a field in PHP. For example, if the user were to enter the URL http://www.nytimes.com/2009/11/05/sports/baseball/05series.html, I would want "New York Times" or "NY Times" or something along those lines.
I know it's fairly easy to extract the title of the WINDOW... for example, the URL I linked would have the title "Yankees 7, Phillies 3 - Back on Top....", but this is exactly what I don't want.
For clarification, this is for adding sources to a quote. I want to be able to add a source to quotes without a huge page URL and not just a link that says "Source".
Can anyone help me with this? Thanks in advance.
$source = parse_url('http://www.nytimes.com/....', PHP_URL_HOST); // www.nytimes.com
There is no such thing as a "site title". You can get:
the domain name (and then the owner name)
the page's title
I see the page has the meta tag "cre" with the value "The New York Times", but you won't find it everywhere.
One thing you can do: extract the domain name from the URL, and then get the home page's title.
"http://www.nytimes.com/" will give you "The New York Times - Breaking News, World News & Multimedia"
Build a list of URL prefixes to site names, and check for each prefix in turn from longest to shortest.
You'd surely need a lookup table mapping domains (nytimes.com) to your titles ("NY Times"), in which case it would be easy to do.
If you want a method that will work on any link from any domain, then it is a bit harder, as PHP by itself is not going to be able to work out a uniform title: it will vary from site to site.
You can explode the URL easily enough, but how then would you be able to dissect nytimes into "NY" and "TIMES"?
You may be able to find a web service that allows you to feed in a domain and get back a site title, but I do not know of one.
You are best off simply quoting the domain, trimmed like "NYTIMES.COM" as the source, or "NYTIMES".
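The lookup-table idea from the answers above, with prefixes checked from longest to shortest and the bare domain as a fallback, could be sketched roughly like this. The table entries, the site_name() helper and the example.com rows are all made up for illustration; you would maintain your own list.

```php
<?php
// Hypothetical lookup table: URL prefix (host plus optional path) => display name.
$siteNames = [
    'nytimes.com'      => 'NY Times',
    'example.com'      => 'Example',
    'example.com/blog' => 'Example Blog',
];

// Hypothetical helper: most specific matching name, else the bare domain.
function site_name(string $url, array $siteNames): string
{
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    $host = preg_replace('/^www\./', '', $host);
    $key  = $host . (string) parse_url($url, PHP_URL_PATH);

    // Longest prefixes first, so the most specific entry wins.
    uksort($siteNames, fn ($a, $b) => strlen((string) $b) <=> strlen((string) $a));
    foreach ($siteNames as $prefix => $name) {
        $prefix = (string) $prefix;
        if (strncmp($key, $prefix, strlen($prefix)) === 0) {
            return $name;
        }
    }
    return $host; // unknown site: fall back to the domain itself
}

echo site_name('http://www.nytimes.com/2009/11/05/sports/baseball/05series.html', $siteNames);
// NY Times
```

Unknown domains simply come back as the domain name, which matches the suggestion to quote the trimmed domain as the source.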
You would want to use file_get_contents() then run a match to check the text between any <title></title> tags - that then would be your title that you display.
Using parse_url wouldn't return the actual page title.
Something like:
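<?php
$x = file_get_contents("http://google.com");
// Case-insensitive; allows attributes on <title> and newlines inside it.
if ($x !== false && preg_match("/<title[^>]*>(.+?)<\/title>/is", $x, $match)) {
    echo trim($match[1]);
}
?>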
Use the Simple HTML DOM Parser. Here is an example:
require "simple_html_dom.php";
$url = "http://www.google.com";
$html = file_get_html( $url );
list( $title ) = $html->find( 'title' );
echo strip_tags( $title ); // Output: "Google"
I store data in a DB that contains URLs (for example: "Go through this link http://www.google.com").
When I display that data in the browser, I want it shown as "Go through this link http://www.google.com", but with the URL rendered as an anchor link.
If that isn't clear: open Google Chat and send someone a message such as http://google.com. Even though you send plain text, it is shown as a hyperlink to that URL.
I want this functionality in PHP. How can we implement it?
Thanks in advance.
So, you want to convert the URLs to links in PHP? See the first result, or the answers to the same question on Stack Overflow.
If I understood this correctly, you want to transform URLs in a text into links automatically. Without going further into details, a crude (very crude) regexp should do it for now:
$textWithLinks = preg_replace('#(http|ftp)s?://[^\s]+#i', '<a href="$0">$0</a>', $textWithUrls);
function add_href ($text) {
    return preg_replace('/((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:#=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])/', '<a href="$0">$0</a>', $text);
}
Expression taken from http://rickyrosario.com/blog/converting-a-url-into-a-link-in-csharp-using-regular-expressions/
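Since the text comes from a database and ends up in HTML, it is usually safer to escape it before adding the anchors. Here is a minimal sketch along the lines of the crude regexp above; linkify() is a hypothetical name, and the pattern is deliberately simple (it will, for instance, swallow a trailing full stop into the link).

```php
<?php
// Hypothetical helper: HTML-escapes $text, then wraps URLs in anchor tags.
function linkify(string $text): string
{
    // Escape first so stored markup cannot be injected into the page.
    $safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    // Crude URL match: a scheme followed by anything up to whitespace or "<".
    $linked = preg_replace(
        '#\b(?:https?|ftp)://[^\s<]+#i',
        '<a href="$0">$0</a>',
        $safe
    );
    return $linked ?? $safe; // preg_replace() returns null on error
}

echo linkify('Go through this link http://www.google.com');
// Go through this link <a href="http://www.google.com">http://www.google.com</a>
```

Escaping before matching means an ampersand in a URL ends up as &amp; inside the href, which is what valid HTML expects.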