How do I parse Visitors by Country info from Alexa? - php

If you search Alexa for any URL, you get detailed traffic information for that site.
What I would like to do is parse the Visitors by Country info from Alexa.
For example, for google.com the URL is http://www.alexa.com/siteinfo/google.com.
On the Audience tab you can see:
Visitors by Country for Google.com
United States 35.0%
India 8.8%
China 4.1%
Germany 3.4%
United Kingdom 3.2%
Brazil 3.2%
Iran 2.8%
Japan 2.1%
Russia 2.0%
Italy 1.9%
Indonesia 1.7% (etc.)
How can I extract only this information from alexa.com? I have tried the preg_match function, but it is very difficult in this case.

If you don't want to use DOM and getElementById, which is the most elegant solution in this case, you can try a regexp:
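$data = file_get_contents('http://www.alexa.com/siteinfo/google.com');
// The U modifier makes quantifiers ungreedy, so each (.*) stops at the first closing delimiter
preg_match_all(
    '/<a href="\/topsites\/countries\/(.*)">(.*)<\/a>/mU',
    $data,
    $result,
    PREG_SET_ORDER
);
// $result[$i][1] is the path segment after /topsites/countries/, $result[$i][2] the country name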
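To see what $result holds, here is the same preg_match_all() call run against a small hand-written snippet. The markup is hypothetical, modelled only on the link format the pattern expects (the live Alexa page may well have differed), so treat this as a sketch of the regex's behaviour rather than of the site.

```php
<?php
// Hypothetical markup in the link format the pattern expects; not real Alexa output.
$data = '<a href="/topsites/countries/US">United States</a> 35.0% '
      . '<a href="/topsites/countries/IN">India</a> 8.8%';

preg_match_all(
    '/<a href="\/topsites\/countries\/(.*)">(.*)<\/a>/mU',
    $data,
    $result,
    PREG_SET_ORDER
);

// With PREG_SET_ORDER each $result[$i] is one match:
// [0] => whole <a> tag, [1] => path segment (country code here), [2] => link text.
$names = array_column($result, 2, 1);
print_r($names);
```

Note that this pattern only captures the link, not the percentage next to it, which is one reason the DOM approach is easier to extend.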
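The DOM solution looks like this:
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile('http://www.alexa.com/siteinfo/google.com');
libxml_clear_errors();
$countries = array();
$data = $doc->getElementById('visitors-by-country');
if ($data !== null) {
    // Each country row is a <div> containing an <a> with the country name
    foreach ($data->getElementsByTagName('div') as $node) {
        foreach ($node->getElementsByTagName('a') as $href) {
            // Pull the percentage (e.g. "35.0%") out of the row's text
            if (preg_match('/([0-9\.\%]+)/', $node->nodeValue, $match)) {
                $countries[trim($href->nodeValue)] = $match[0];
            }
        }
    }
}
var_dump($countries);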
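The DOM approach can be tried offline by feeding DOMDocument::loadHTML() a string instead of a URL. The snippet below is an assumption: a container with id="visitors-by-country" holding one <div> per country (link plus percentage), which is the structure the DOM answer relies on. The actual Alexa page is no longer online, so this is a sketch of the technique, not of the site's real markup.

```php
<?php
// Hypothetical markup matching the structure the DOM answer assumes.
$html = <<<HTML
<div id="visitors-by-country">
  <div><a href="/topsites/countries/US">United States</a> 35.0%</div>
  <div><a href="/topsites/countries/IN">India</a> 8.8%</div>
</div>
HTML;

libxml_use_internal_errors(true); // suppress warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$countries = [];
$container = $doc->getElementById('visitors-by-country');
if ($container !== null) {
    // getElementsByTagName() returns descendants only, i.e. the row <div>s
    foreach ($container->getElementsByTagName('div') as $row) {
        $link = $row->getElementsByTagName('a')->item(0);
        // Take the first "number%" in the row text as the share
        if ($link !== null && preg_match('/\d+(?:\.\d+)?%/', $row->textContent, $m)) {
            $countries[trim($link->textContent)] = $m[0];
        }
    }
}
print_r($countries);
```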

Related

How do you uppercase only certain parts of a single string that's in format "Town comma Initials"?

I have a single location field where people can enter whatever they want, but generally they will enter something in the format of "Town, Initials". So for example, these entries...
New york, Ny
columbia, sc
charleston
washington, DC
BISMARCK, ND
would ideally become...
New York, NY
Columbia, SC
Charleston
Washington, DC
Bismarck, ND
Obviously I can use ucfirst() on the string to handle the first character, but these are things I'm not sure how to do (if they can be done at all)...
Capitalizing everything after a comma
Lowercasing everything before the comma (aside from the first character)
Is this easily doable or do I need to use some sort of regex function?
You could simply chop it up and fix it.
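<?php
$geo = 'New york, Ny
columbia, sc
charleston
washington, DC
BISMARCK, ND';
// Split on any line ending (\R matches \n, \r\n or \r), not only on PHP_EOL
$geo = preg_split('/\R/', $geo);
foreach ($geo as $str) {
    // chop: "Town, Initials" -> ['Town', ' Initials']
    $str = explode(',', $str);
    // fix: title-case the town, upper-case the initials
    echo
        (!empty($str[0]) ? ucwords(strtolower(trim($str[0]))) : null).
        (!empty($str[1]) ? ', '.strtoupper(trim($str[1])) : null).PHP_EOL;
}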
https://3v4l.org/ojl2M
That said, you should not trust users to enter the correct format. Instead, find a complete list of all states, offer them as autocomplete suggestions, and then validate the input against that list. Perhaps something like https://gist.github.com/maxrice/2776900 would work.
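Both ideas, fixing the case and validating the initials, can be wrapped in two small functions. This is a sketch: normalizeLocation(), hasValidState() and the STATE_CODES constant are hypothetical names, and the four-entry list is deliberately tiny; in practice you would load the complete list (for example from a source like the gist mentioned above).

```php
<?php
// Hypothetical, deliberately tiny list of state codes; use a complete list in practice.
const STATE_CODES = ['DC', 'ND', 'NY', 'SC'];

function normalizeLocation(string $input): string
{
    // Split on the first comma only: "Town, Initials"
    $parts = array_map('trim', explode(',', $input, 2));
    $town = ucwords(strtolower($parts[0]));
    if (!isset($parts[1]) || $parts[1] === '') {
        return $town; // no initials given, e.g. "charleston"
    }
    return $town . ', ' . strtoupper($parts[1]);
}

// True only when the input has ", XX" after the town and XX is in STATE_CODES
function hasValidState(string $input): bool
{
    $parts = array_map('trim', explode(',', $input, 2));
    return isset($parts[1]) && in_array(strtoupper($parts[1]), STATE_CODES, true);
}

foreach (['New york, Ny', 'columbia, sc', 'charleston', 'BISMARCK, ND', 'Boston, XX'] as $raw) {
    echo normalizeLocation($raw), hasValidState($raw) ? '' : '  (state not recognized)', PHP_EOL;
}
```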

Querying Content from Wikipedia

I am trying to fetch the first paragraph of a Wikipedia article using the following script. When I query with multiple words, it doesn't work.
<?php
$query = urlencode($_GET['query']);
// Wikipedia serves its API over HTTPS
$url = "https://en.wikipedia.org/w/api.php?action=parse&page=$query&format=json&prop=text&section=0";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "TestScript"); // required by wikipedia.org server; use YOUR user agent with YOUR contact information (otherwise your IP might get blocked)
$c = curl_exec($ch);
curl_close($ch);
$json = json_decode($c);
$cont = ''; // stays empty if the page or its first paragraph is missing
if (isset($json->parse->text->{'*'})) {
    $content = $json->parse->text->{'*'}; // main text content of the page (it's parsed HTML)
    // pattern for first match of a paragraph
    $pattern = '#<p>(.*)</p>#Us'; // http://www.phpbuilder.com/board/showthread.php?t=10352690
    if (preg_match($pattern, $content, $matches)) {
        // print $matches[0]; // content of the first paragraph (including wrapping <p> tag)
        $cont = strip_tags($matches[1]); // content of the first paragraph without the HTML tags
    }
}
// remove [...] and (...) groups, including nested ones
$pattern = '/\[([^\[\]]|(?R))*]|\(([^()]|(?R))*\)/';
echo $my = preg_replace($pattern, '', $cont);
?>
Demo 1: Bangalore
Demo 2: Los Angeles
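The extraction steps in the script (first <p>, strip tags, remove bracketed groups) can be checked offline on a fixed string. In this sketch firstParagraphText() is a hypothetical helper and the HTML sample is made up, not a real API response. It uses `.*?` without the U modifier, which behaves like the question's `(.*)` with U, and collapses the double spaces that the removals leave behind.

```php
<?php
// Offline sketch: $sample is made-up HTML, not a real Wikipedia response.
function firstParagraphText(string $html): string
{
    // First <p>...</p> only: non-greedy .*? plus the s modifier so . spans newlines
    if (!preg_match('#<p>(.*?)</p>#s', $html, $m)) {
        return '';
    }
    $text = strip_tags($m[1]);
    // (?R) re-enters the whole pattern, so nested [..] and (..) groups go too
    $text = preg_replace('/\[([^\[\]]|(?R))*]|\(([^()]|(?R))*\)/', '', $text);
    // Collapse the runs of spaces left where groups were removed
    return trim(preg_replace('/\s+/', ' ', $text));
}

$sample = '<p><b>Los Angeles</b> (<i>LA</i>) is a city<sup>[1]</sup> in California.</p><p>Second paragraph.</p>';
echo firstParagraphText($sample), PHP_EOL;
```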
Is there any way to query Wikipedia and select the first result by default?
You need to URL-encode your query string before passing it to cURL.
<?php $query = urlencode($_GET['query']); ?>
EDIT: I tried your code, and it worked when I replaced the spaces with the character '+'.
The URL encoding did not work for me because it replaced them with '%20' (that is what rawurlencode() produces; urlencode() itself encodes spaces as '+').
Try this:
$query = str_replace(' ', '+', $_GET['query']);
Here is the output I get with Los Angeles and New Delhi:
iMac-de-Valentin:so valentin$ php so.php
Los Angeles , officially the City of Los Angeles, often known by its initials L.A., is the most populous city in the U.S. state of California and the second-most populous in the United States, after New York City, with a population at the 2010 United States Census of 3,792,621. It has a land area of 469 square miles , and is located in Southern California.
iMac-de-Valentin:so valentin$ php so.php
New Delhi i/ˈnjuː dɛli/ is the capital of India and seat of the executive, legislative, and judiciary branches of the Government of India. It is also the centre of the Government of the National Capital Territory of Delhi. New Delhi is situated within the metropolis of Delhi and is one of the eleven districts of Delhi National Capital Territory.
iMac-de-Valentin:so valentin$
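On the follow-up question of selecting the first result, one option (not taken from the answers above) is to ask the MediaWiki API's list=search module for the best-matching title, then request that exact title with action=parse. The helper names below are hypothetical. Fetching is left to the cURL code from the question, so firstSearchTitle() is shown against hand-written response strings. http_build_query() encodes every parameter, so you no longer splice $query into the URL by hand.

```php
<?php
// Sketch: build the API URLs with http_build_query(); spaces become '+'.
function buildParseUrl(string $title): string
{
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'  => 'parse',
        'page'    => $title,
        'format'  => 'json',
        'prop'    => 'text',
        'section' => 0,
    ], '', '&');
}

// list=search returns matching pages; srlimit=1 asks for just the top hit
function buildSearchUrl(string $terms): string
{
    return 'https://en.wikipedia.org/w/api.php?' . http_build_query([
        'action'   => 'query',
        'list'     => 'search',
        'srsearch' => $terms,
        'srlimit'  => 1,
        'format'   => 'json',
    ], '', '&');
}

// Title of the first search hit in a list=search JSON response, or null if none
function firstSearchTitle(string $json): ?string
{
    $data = json_decode($json, true);
    return $data['query']['search'][0]['title'] ?? null;
}

echo buildParseUrl('Los Angeles'), PHP_EOL;
echo buildSearchUrl('new delhi'), PHP_EOL;
```

The title returned by the search step can then be passed to buildParseUrl() and fetched exactly as in the question.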

PHP line parsed into separate objects

I have a line of code in my WordPress widget that outputs from an RSS feed:
<?php echo $entry->title ?>
and when displayed it looks like:
$220,000 :: 504 Freemason St, Unit 2B, Norfolk VA, 23510
or
$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454
What is the easiest way to break this up into different objects?
For example, I'd like to have the price, street name, and city state zip in different objects. The problem is that some of the addresses have unit numbers and it's complicating things. Below is an example of how I would like it to work:
<?php echo $entry->price ?>
<?php echo $entry->street ?>
<?php echo $entry->citystatezip ?>
$220,000
504 Freemason St, Unit 2B
Norfolk VA, 23510
or
$274,900
1268 Bells Road
Virginia Beach VA, 23454
Here is a very crude regex that seems able to parse your string. I'm not the best with regexes, but it seems to work.
/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/
Use this with preg_match; the 1st group is the price, the 2nd is the address, and the 3rd is the city/state/zip.
Example:
<?php
$ptn = '/^(\$(?:\d{1,3},?)*) :: (\d* [\w\s,\d]*), ([\w\s]* \w{2}, \d{5})$/';
if (preg_match($ptn, $entry->title, $match) === 1) {
    $price = $match[1];
    $street = $match[2];
    $citystatezip = $match[3];
}
What you need is a regular expression; see http://php.net/manual/en/function.preg-match.php
Use, for example, explode(string $delimiter, string $string [, int $limit]), which gives you an array of strings if you use the correct delimiter.
The code below will fill your $entry object as required:
$string = '$274,900 :: 1268 Bells Road, Virginia Beach VA, 23454';
// "$274,900 " | " 1268 Bells Road, Virginia Beach VA, 23454"
$pricePart = explode('::', $string);
$addressPart = explode(',', $pricePart[1]);
$entry = new stdClass();
$entry->price = trim($pricePart[0]);
if (count($addressPart) == 3) {
    // no unit: street, city + state, zip
    $entry->street = trim($addressPart[0]);
    $entry->citystatezip = trim($addressPart[1]) . ', ' . trim($addressPart[2]);
} else {
    // with unit: street, unit, city + state, zip
    $entry->street = trim($addressPart[0]) . ', ' . trim($addressPart[1]);
    $entry->citystatezip = trim($addressPart[2]) . ', ' . trim($addressPart[3]);
}
Updated the answer to handle the unit part.
Update: changed the array names; I hate names like $array, even if it's just a mockup.
(Note: this code isn't the prettiest, but it's meant to give you a base to work on. It should be cleaned up and improved a bit.)

Parsing a string after a certain string

I have a string ($source) containing the following data:
{"Title":"War Horse","Year":"2011","Rated":"PG-13","Released":"25 Dec 2011","Runtime":"2 h 26 min","Genre":"Drama, War","Director":"Steven Spielberg","Writer":"Lee Hall, Richard Curtis","Actors":"Jeremy Irvine, Emily Watson, David Thewlis, Benedict Cumberbatch","Plot":"Young Albert enlists to serve in World War I after his beloved horse is sold to the cavalry. Albert's hopeful journey takes him out of England and across Europe as the war rages on.","Poster":"http://ia.media-imdb.com/images/M/MV5BMTU5MjgyNDY2NV5BMl5BanBnXkFtZTcwNjExNDc1Nw##._V1_SX640.jpg","imdbRating":"7.2","imdbVotes":"39,540","imdbID":"tt1568911","Response":"True"}
I'm extracting the title, the genre, the plot and so on by using this:
foreach(str_getcsv($source) as $item) {
list($k, $v) = explode(':', $item);
$$k = str_replace('"', '', $v);
}
So far, this works very well, I'm able to use $Title, $Genre and so on. The only thing that doesn't work is the URL to the poster since I'm exploding the ':' and the URL - of course - contains ':' (after the 'http').
How can I put the poster URL into a variable?
That looks like JSON data, so why not simply:
$txt = '{"Title etc.....}';
$data = json_decode($txt, true); // true => associative array instead of stdClass
$title = $data['Title'];
$genre = $data['Genre'];
etc...
Variable variables are ugly, and you risk compromising your code by overwriting some other variable with the contents of the JSON data.
If you REALLY insist on polluting your namespace with auto-vivified variables, you can always use extract() to pull the array apart.
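If you do go the extract() route, its EXTR_PREFIX_ALL flag limits the damage: every created variable gets a prefix, so a JSON key such as "data" becomes $movie_data instead of overwriting $data. The "movie" prefix and the shortened JSON string here are only for illustration.

```php
<?php
$json = '{"Title":"War Horse","Year":"2011"}';
$data = json_decode($json, true);

// EXTR_PREFIX_ALL prepends "movie_" to every variable name;
// extract() returns how many variables it created.
$count = extract($data, EXTR_PREFIX_ALL, 'movie');
echo $movie_Title, ' (', $movie_Year, ')', PHP_EOL;
```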
Use json_decode():
$str = '{"Title":"War Horse","Year":"2011","Rated":"PG-13","Released":"25 Dec 2011","Runtime":"2 h 26 min","Genre":"Drama, War","Director":"Steven Spielberg","Writer":"Lee Hall, Richard Curtis","Actors":"Jeremy Irvine, Emily Watson, David Thewlis, Benedict Cumberbatch","Plot":"Young Albert enlists to serve in World War I after his beloved horse is sold to the cavalry. Albert\'s hopeful journey takes him out of England and across Europe as the war rages on.","Poster":"http://ia.media-imdb.com/images/M/MV5BMTU5MjgyNDY2NV5BMl5BanBnXkFtZTcwNjExNDc1Nw##._V1_SX640.jpg","imdbRating":"7.2","imdbVotes":"39,540","imdbID":"tt1568911","Response":"True"}';
$decode_string = json_decode($str);
print_r($decode_string);
echo $decode_string->Title;
It's JSON, so you should use json_decode():
$str = '{"Title":"War Horse","Year":"2011","Rated":"PG-13","Released":"25 Dec 2011","Runtime":"2 h 26 min","Genre":"Drama, War","Director":"Steven Spielberg","Writer":"Lee Hall, Richard Curtis","Actors":"Jeremy Irvine, Emily Watson, David Thewlis, Benedict Cumberbatch","Plot":"Young Albert enlists to serve in World War I after his beloved horse is sold to the cavalry. Albert\'s hopeful journey takes him out of England and across Europe as the war rages on.","Poster":"http://ia.media-imdb.com/images/M/MV5BMTU5MjgyNDY2NV5BMl5BanBnXkFtZTcwNjExNDc1Nw##._V1_SX640.jpg","imdbRating":"7.2","imdbVotes":"39,540","imdbID":"tt1568911","Response":"True"}';
$arr = json_decode($str,true);
print_r($arr);
echo $arr['Title'];
echo $arr['Year'];
Note that I have properly escaped the string (the apostrophe in Albert\'s).
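Applied to the original problem, the poster URL needs no special handling once the string goes through a real JSON parser, because the colon in "http:" sits inside a quoted value. A minimal sketch using a shortened copy of the question's string, with a check for invalid input:

```php
<?php
// Shortened copy of the string from the question.
$source = '{"Title":"War Horse","Genre":"Drama, War","Poster":"http://ia.media-imdb.com/images/M/MV5BMTU5MjgyNDY2NV5BMl5BanBnXkFtZTcwNjExNDc1Nw##._V1_SX640.jpg","Response":"True"}';

// Passing true returns associative arrays instead of stdClass objects.
$movie = json_decode($source, true);
if (!is_array($movie)) {
    throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
}

// Colons and commas inside quoted values are no problem for json_decode().
$poster = $movie['Poster'] ?? null;
$genre = $movie['Genre'] ?? null;

echo $poster, PHP_EOL;
```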

How do I convert this content to an array?

I have a plain text file that contains a list of countries, as follows.
United Kingdom
United States of America
Abkhazia
Afghanistan
Albania
Algeria
American Samoa
Andorra
Angola
Anguilla
Antarctica
Antigua and Barbuda
Argentina
Armenia
Aruba
Ashmore and Cartier Islands
Australia
Austria
Azerbaijan
Bahamas
Bahrain
I want to insert all the values into a database, for which I need to convert this to an array. I am using the following code to read the file.
$fileContent = file_get_contents('countries.txt');
$fileContent = nl2br($fileContent);
Now I want to add a comma (,) at the end of each line so that I can use explode() and convert it into an array. How do I do it?
Thank you.
Not sure why you are using nl2br().
Just try:
$fileContent = file_get_contents('countries.txt');
$array = explode("\n", $fileContent);
Use the file() function instead; it reads a file straight into an array with one element per line.
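A sketch of the file() approach. A temporary file stands in for countries.txt so the example runs on its own, and it deliberately mixes a Windows line ending and a blank line. The flags drop the newline from each element and skip empty lines; a final trim handles any leftover "\r".

```php
<?php
// Temporary stand-in for countries.txt, with a "\r\n" ending and a blank line.
$path = tempnam(sys_get_temp_dir(), 'countries');
file_put_contents($path, "United Kingdom\r\nUnited States of America\n\nAbkhazia\n");

// One element per line, without trailing newlines, skipping empty lines.
$countries = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($path);

// Trim leftover whitespace (such as "\r") and drop entries left empty.
$countries = array_values(array_filter(array_map('trim', $countries), function ($name) {
    return $name !== '';
}));

print_r($countries);
```

Each element can then be bound to a prepared INSERT statement (for example with PDO) rather than concatenated into SQL.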
Assuming the list you posted is actually in the following format after your nl2br() call:
United Kingdom<br />
United States of America<br />
Abkhazia<br />
...
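You can do the following. Note that nl2br() inserts "<br />" before each newline instead of replacing it, so trim the pieces:
<?php
$array = array_map('trim', explode("<br />", $yourString));
// or explode("\n", $yourString); if you remove the nl2br() call
?>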
