This has been bugging me for quite a few hours. I've been searching a lot and have found a lot of information; the problem is, I'm not that good, I'm actually a beginner to the max. I'd like to achieve this with Python (if it's possible!), or maybe with JavaScript or PHP? Let me explain.
I just found this website http://listeningroom.net and it's great. You can create/join rooms and upload tracks and listen to them together with friends.
I'd like to extract/scrape/get some specific data from a .json file.
This file contains the artist, album title, track title and more. I'd like to extract just the artist, album and track title.
http://listeningroom.net/room/chillasfuck/spins.json is the .json file; it contains the tracks played in the past 24 hours.
After looking around, I managed to load the whole .json file with Python (a local copy of it) using the following, probably not-so-valid, code:
import json
from pprint import pprint

json_data = open('...\spins.json')
data = json.load(json_data)
pprint(data)
json_data.close()
This prints out the following:
[{u'endTime': u'1317752614105',
u'id': u'cf37894e8eaf886a0d000000',
u'length': 492330,
u'metadata': {u'album': u'Mezzanine',
u'artist': u'Massive Attack',
u'bitrate': 128000,
u'label': u'Virgin',
u'length': 17494.479054779807,
u'title': u'Group Four'},
That's just a part of the output. What I'd like to do:
1. Scrape it from a URL (the one provided at the top)
2. Just get 'album', 'artist' and 'title'
3. Make sure it prints them as simply as possible, like this:
Artist
Track title
Album
Artist
Track title
Album
4. If it's not too much trouble, save it to a .txt file
I hope I can get some help; I really want to create this for myself, so I can check out more music!
Marvin
Python (after you've loaded the JSON):
for elem in data:
    print('{artist}\n{title}\n{album}\n'.format(**elem['metadata']))
To save in a file:
with open('the_file_name.txt','w') as f:
    for elem in data:
        f.write('{artist}\n{title}\n{album}\n\n'.format(**elem['metadata']))
You're already really close.
data = json.load(json_data)
takes the JSON string and converts it into a Python object - in this case, a list of dictionaries (each of which contains a nested 'metadata' dictionary).
To get this into the format that you want, you just need to loop through the items.
for song in data:
    artist = song['metadata']['artist']  # look in the 'metadata' dictionary, then grab 'artist' inside it
    album = song['metadata']['album']
    songTitle = song['metadata']['title']
    print '%s\n%s\n%s\n' % (artist, album, songTitle)
Or, to print it to a file:
with open('the_file_name.txt','w') as f:
    for song in data:
        artist = song['metadata']['artist']
        album = song['metadata']['album']
        songTitle = song['metadata']['title']
        f.write('%s\n%s\n%s\n' % (artist, album, songTitle))
Okay, this is a bit short, but the thing about JSON is that it translates an array into a string.
e.g.
array['first'] = 'hello';
array['second'] = 'there';
will become
[{u'first': u'hello', u'second': u'there'}];
after a JSON encode.
Run that string through a JSON decode and you get your array back, so simply run your JSON file through a decoder and then you should be able to reach your data through:
array['metadata'].album
array['metadata'].artist
...
I have never used Python, but it should be much the same.
Have a look at http://www.php.net/manual/en/function.json-decode.php; it might clear up a thing or two.
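To make that concrete in PHP (the language that doc link is for), a rough round trip looks like this:
$json = '[{"first":"hello","second":"there"}]';
$data = json_decode($json);        // decode to objects
echo $data[0]->first;              // prints "hello"
$data = json_decode($json, true);  // or decode to associative arrays
echo $data[0]['second'];           // prints "there"
In the real spins.json you would then drill down into each item's metadata the same way.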
For PHP you need json_decode:
<?php
$url  = "http://listeningroom.net/room/chillasfuck/spins.json"; // the feed from the question
$json = file_get_contents($url);
$val  = json_decode($json);
$room = $val[0]->metadata; // metadata of the first spin only
echo "Album : ".$room->album."\n";
echo "Artist : ".$room->artist."\n";
echo "Title : ".$room->title."\n";
?>
Outputs
Album : Future Sandwich
Artist : Them, Roaringtwenties
Title : Fast Acting Nite-Nite Spray With Realistic Uncle Beard
Note: it's a truckload of JSON data there, so you'll have to iterate adequately.
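For instance, to walk every spin rather than just the first, something along these lines should work (reusing $val from above; $spin and $meta are just names I picked):
foreach ($val as $spin) {
    $meta = $spin->metadata;
    echo "Album : ".$meta->album."\n";
    echo "Artist : ".$meta->artist."\n";
    echo "Title : ".$meta->title."\n\n";
}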
Related
I really need help. I'm using an API to gather artist information based on the artist name ($artist_name = $_GET['artistname']).
I need to add the API results to an array and store it in a file (NOT a database). This file will be ever-growing as more and more artist entries are added to it.
Once I have an array in the file, I need to be able to read it and parse it, so I can display the information without repeatedly using the API.
I have figured out how to add an array with one entry to a file, but how can I add more keys to the same array?
This is what I'm using now...
//ARRAY
$artist_info_location_array = array($artist_name => $location_entry);
//FILE
$artist_location_file = get_template_directory()."/Database/Artists/info-location.json";
//GET ARRAY FILE
$get_location_array[] = json_decode(file_get_contents($artist_location_file), true);
if (is_array($get_location_array)) {
if (!array_key_exists($artist_name, $get_location_array)) {
file_put_contents($artist_location_file, json_encode($artist_info_location_array));
}
}
It prints this to the file:
{"Imagine Dragons":"Las Vegas, NV, US"}
That's cool, but I need to be able to add more artists to this SAME ARRAY. So the result should look like this with another artist added:
{"Imagine Dragons":"Las Vegas, NV, US", "Adele":"London, UK"}
That shows Imagine Dragons and Adele both added to the same array.
Can someone help me "append" or add extra keys and values to the same array as they are added to the file?
Thanks.
EDIT 1 (In response to Martin):
I have a panel on the side of the page in question. This panel will show relevant information about the artist that has been searched for. Let's say you search for the artist "Adele". $artist_name would = Adele.
Let's say I'd like to store all artist locations; I would use the example I posted to store each artist's location in the file called info-location.json ($artist_location_file).
So every time an artist page is loaded, the artist name and location would be added to the array in the file.
If my example doesn't make any sense, please show me an example of how to add multiple entries to ONE ARRAY. I am using an API and would like to cache this information to use instead of requesting the API on each load.
Hope this makes sense. :)
I might be misunderstanding your question, but if you just want to read in a JSON file, add an associative array key to it if it doesn't exist, and then put it back into the JSON file, why don't you do something like this:
// Decode without the [] so $get_location_array is the associative array itself:
$get_location_array = json_decode(file_get_contents($artist_location_file), true);
if (is_array($get_location_array)) {
    if (!array_key_exists($artist_name, $get_location_array)) {
        $get_location_array[$artist_name] = $location_entry;
        // Write the merged array back, not the original one-entry array:
        file_put_contents($artist_location_file, json_encode($get_location_array));
    }
}
file_put_contents will overwrite an existing file (pretty sure), but your best option is still to use a database. If you can't do that, then to prevent the file from being written to while you are working on it, I suggest you use fopen, flock and fwrite, and then fclose.
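Something like this, reusing the variable names from your question (just a sketch of the fopen/flock read-modify-write, not drop-in code):
$fp = fopen($artist_location_file, 'c+'); // open read/write, create if missing, don't truncate yet
if ($fp && flock($fp, LOCK_EX)) {         // hold an exclusive lock for the whole read-modify-write
    $locations = json_decode(stream_get_contents($fp), true);
    if (!is_array($locations)) {
        $locations = array();
    }
    if (!array_key_exists($artist_name, $locations)) {
        $locations[$artist_name] = $location_entry;
        ftruncate($fp, 0); // now it's safe to replace the old contents
        rewind($fp);
        fwrite($fp, json_encode($locations));
    }
    flock($fp, LOCK_UN);
    fclose($fp);
}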
I want to log in to a site, navigate to a page, and then download the .CSV file, which will always end with a dynamic string due to it being 'custom'.
I have tried to access the site by recording a macro. However, as the data is not in a table, the macro recorder is not able to pick up the actual address of the .csv file.
The display text is always:
Results [link]Click to Download[/link]
The html values are always:
<td class="smallText"><b>Results</b> <a href="vendor_report.php?report=custom [insert extremely long string here] ><u>Click to Download</u></a></td>
Without using a table, is there a way to get to this .csv & save it to my PC?
I am aware that the use of <td> denotes it is part of a table, but it is definitely not picking it up; I've gone through the site using the macro recorder and it's not picking up the inner contents of the page.
https://[domain].php?vf=vieworders
I had also thought to navigate to the site page, highlight the text, copy and paste it to a spare sheet in my workbook, and then use some code that was previously written here (below); however, I can't even get the copy and paste to work correctly.
For Each hlink In ThisWorkbook.Sheets("NameOfYourSheet").Hyperlinks
    Set wb = Workbooks.Open(hlink.Address)
    wb.SaveAs saveloc & hlink.Range.Offset(0, 1).Value & ".xlsx"
    wb.Close True
    Set wb = Nothing
Next
Please advise. Thank you in advance.
UPDATE
I have found which table this is hiding in: Table 2. It is, however, in the midst of a lot of other text.
When I have copied and pasted the table contents to my sheet, I have problems getting the link to show as its HTML value so that I can then use it with my second option (opening links from the spreadsheet).
This could be an issue with the original Get Data code I am using.
This is how it looks: the cells on either side are filled, as well as that huge chunk of (blanked-out) text in B20.
Could regex be of use here?
You could try using the XMLHTTP object with a stream:
Sub SO()
    Dim objStream As Object, strURL As String
    Set objStream = CreateObject("ADODB.Stream")
    strURL = "vendor_report.php?report=custom [insert extremely long string here]"
    With CreateObject("Microsoft.XMLHTTP")
        .Open "GET", strURL, False
        .Send
        If .Status = 200 Then
            objStream.Open
            objStream.Type = 1                                    ' 1 = adTypeBinary
            objStream.Write .ResponseBody
            objStream.SaveToFile "C:\users\bloggsj\output.csv", 2 ' 2 = adSaveCreateOverWrite
            objStream.Close
        End If
    End With
    Set objStream = Nothing
End Sub
Change the save path as required.
This code works, but I just hacked it together with my limited knowledge of PHP and I'm sure there's a more elegant and efficient way to go about it. If you'd be so kind as to point out how I can improve, that would be great!
So I have a CSV file, structured like so:
Code Class Value Status Date Created Date Redeemed
========================================================================
a51f3g45 gold 50 valid 2012-08-20
4f6a2984 silver 200 redeemed 2012-08-23 2012-08-27
gf3eb54b gold 150 valid 2012-08-30
etc...
The user fills out a form to change the Class, Value, and Status fields of a given line. I cobbled together the following code to replace the old values with the new ones:
$file = 'codes.csv';
$old_csv_string = file_get_contents($file);
preg_match('/('.$_POST['code'].',.*,.*,.*,.*,.*)\n/',$old_csv_string,$matches);
$old_row = $matches[1];
preg_match('/'.$_POST['code'].',(.*,.*,.*),.*,.*\n/',$old_csv_string,$matches_part);
$old_row_part = $matches_part[1];
$new_row_part = $_POST['class'].",".$_POST['value'].",".$_POST['status'];
$new_row = str_replace($old_row_part,$new_row_part,$old_row);
$new_csv_string = str_replace($old_row,$new_row,$old_csv_string);
file_put_contents($file,$new_csv_string);
So can I do better than 10 lines of code? Any advice would be greatly appreciated :)
Note: I tried using fgetcsv, but I couldn't figure out how to find the unique code within a 2D array, then replace its siblings.
Why are you doing this?
I think you should store the data in an SQL table.
Each time the user updates the data, do it in the table.
If you want the CSV to be downloadable at any moment, use a .htaccess rule to redirect your.csv to csv_generator.php only if your.csv does not exist.
csv_generator.php will regenerate the whole CSV if it does not exist, save it to disk for later use, and send it with the correct mime type in the header (so it's transparent for the user). The user doesn't see that they're requesting a PHP page.
Then you need to delete the CSV on disk each time someone updates the data (so it will be regenerated on the next request).
I think this is the way to have an always-ready-to-download CSV online.
Did you know Google Docs does this? Users can change data in a spreadsheet which is available to download as a CSV from a URL (you need to publish the spreadsheet as a CSV file).
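If it helps, a bare-bones csv_generator.php along those lines could look something like this (the table and column names are placeholders for whatever your schema actually is):
<?php
// csv_generator.php - regenerate the CSV from the database, save a copy on disk
// for later requests, and stream it to the visitor who triggered the regeneration.
$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass'); // placeholder credentials
$rows = $pdo->query('SELECT code, class, value, status, created, redeemed FROM codes')
            ->fetchAll(PDO::FETCH_NUM);

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="your.csv"');

$disk = fopen('your.csv', 'w');     // the cached copy served directly until it is deleted again
$out  = fopen('php://output', 'w'); // the copy sent to the current visitor
foreach ($rows as $row) {
    fputcsv($disk, $row);
    fputcsv($out, $row);
}
fclose($disk);
fclose($out);
?>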
Try using explode() (split() is deprecated) like this for each line:
list($code, $class, $value, $status, $created, $redeemed) = explode(",", $line, 6);
Thus you will have each field in a separate variable.
Of course, you need to take care of the first row in case you don't want to copy the header.
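Applied to the update in the question, it might look roughly like this (assuming the file really is comma-separated, as the regex suggests, the code is always the first field, and the first line is a header you want to keep):
$lines  = file('codes.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$header = array_shift($lines); // keep the header row untouched

foreach ($lines as $i => $line) {
    // pad so rows without a redeemed date still give us six fields
    list($code, $class, $value, $status, $created, $redeemed) = array_pad(explode(",", $line, 6), 6, '');
    if ($code === $_POST['code']) {
        $lines[$i] = implode(",", array($code, $_POST['class'], $_POST['value'], $_POST['status'], $created, $redeemed));
    }
}

array_unshift($lines, $header);
file_put_contents('codes.csv', implode("\n", $lines) . "\n");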
I am trying to find a way to get the Twitter ID for a list of users. I found the following link, which works pretty well; you just replace ABC with the username you want:
http://www.idfromuser.com/getID.php?username=ABC
What you get is the ID of that user. Using "View Page Source", there is only the ID, no formatting or other stuff.
What I want to do, and don't know how, is load a list of usernames and get/save the IDs, not one by one.
Thank you. I have some knowledge of PHP.
Update
I have a list of usernames saved in a .txt file. The output with the IDs can be printed on screen or saved to a .txt file. I know the solution involves something like file_get_contents, but I need some guidance/an example.
Why don't you use the Twitter API?
This returns the user ID along with other details:
GET https://api.twitter.com/1/users/lookup.json?screen_name=ABC&include_entities=true
Return up to 100 users worth of extended information, specified by
either ID, screen name, or combination of the two. The author's most
recent status (if the authenticating user has permission) will be
returned inline.
It's pretty powerful: with only two requests you can get as many as 200 user IDs.
https://dev.twitter.com/docs/api/1/get/users/lookup
It's better to concatenate (comma-separated) up to 100 users into the lookup URL, because it returns a max of 100 for each query, and unauthenticated requests are rate limited. So:
Example Code:
$lookupString = ""; //usernames seperated by new line character in text file
foreach ($notf as $key => $value) {
$lookupString .= $value.","; //concatenating, comma separated.
}
$lookupStringUrl = "http://api.twitter.com/1/users/lookup.json?user_id=".$lookupString;
$namejson = json_decode(file_get_contents($lookupStringUrl));
foreach ($namejson as $key => $value) {
echo $value->id."\n";
}
OK, here's my dilemma:
I've read all over about how many people want to display a set of images from Flickr using phpFlickr, but lament that the photoset API does not include individual photo descriptions. Some have tried setting up their PHP so it pulls the description for each photo as the script assembles the gallery on the page, but that method has shown how slow and inefficient it can be.
I caught an idea elsewhere of creating a string of separated values with the photo ID and the description. I'd store it in the MySQL database and then call upon it when my script assembles the gallery on the page. I'd use explode to create an array of the photo IDs and their descriptions, then call on that to fill in the gaps... thus fewer API calls and a faster page.
So in the back-end admin, I have a form where I set up the information for the gallery, and I hand it a Set ID. The script then goes through and makes this string of separated values ("|~|" as the separator). Here's what I came up with:
include("phpFlickr.php");
$f = new phpFlickr("< api >");
$descArray = "";
// This will create an Array of Photo ID from the Set ID.
// $setFeed is the set ID brought in from the form.
$photos = $f->photosets_getPhotos($setFeed);
foreach ($photos['photoset']['photo'] as $photo) {
    $returnDesc = array();
    $photoID = $photo['id'];
    $rsp = $f->photos_getInfo($photoID);
    foreach ($rsp as $pic) {
        $returnDesc[] = htmlspecialchars($pic['description'], ENT_QUOTES);
    }
    $descArray .= $photoID."|~|".$returnDesc[0]."|~|";
}
The string $descArray would then be placed in the MySQL string that puts it into the database with other information brought in from the form.
My first question: was I correct in using a second foreach loop to get those descriptions? I tried following other examples all over the net that didn't use it, but they never worked. When I brought in the second foreach, it worked. Should I have done something else?
I noticed the data returned would be two entries: one being the description, and the other just an "o"... hence the $returnDesc array, so I could grab just the one string I wanted and not the other.
My second question is whether I made this too complicated or not. I like to try to write cleaner/leaner code, and was looking for opinions.
Suggestions on improvement are welcome. Thank you in advance.
I'm not 100% sure, as I've just browsed the source for phpFlickr and looked at the Flickr API for the getInfo() call. But let me have a go anyway :)
First off, it looks like you shouldn't need that loop, like you mention. What does the output of print_r($rsp); look like? It could be that $rsp is an array with 1 element, in which case you could ditch the inner loop and replace it with something like $pic = $rsp[0]; $desc = $pic['description'];
Also, I'd create a new "description" column in your database table (the one that has the photo ID as the primary key) and store the description in there on its own. Parsing db fields like that is a bit of a nightmare. Lastly, you might want to force htmlspecialchars to work in UTF-8 mode, because I don't think it does by default. From memory, the third parameter is the content encoding.
edit: doesn't phpFlickr have its own caching system? Why not use that and make the cache size massive? Seems like you might be re-inventing the wheel here... maybe all you need to do is increase the cache size, and make a getDescription function:
function getDescription ($phpFlickr, $id) // pass your phpFlickr instance in so it's in scope
{
    $rsp = $phpFlickr->photos_getInfo($id);
    $pic = $rsp[0];
    return $pic['description'];
}