If we have a link to another OneNote page in the HTML content:
<a href="onenote:SectionB.one#Note1&section-id={<section-id>}&page-id={<page-id>}&end&base-path=https://<path>"
... before I write a parsing routine to extract that link, I thought I'd ask if I'd overlooked anything in the OneNote API to make this easier.
===========================================================================
[EDIT] Well, I've written my routine to extract the page-id of the linked note, but that page-id turns out to be quite different from the page-id that's returned as a property (id) of the linked note itself - and it doesn't work :(
Here's an example:
(1) page-id extracted from link: A8CECE6F-6AD8-4680-9773-6C01E96C91D0
(2) page-id as property of note:
0-5f49903893f048d0a3b1893ef004411f!1-240BD74C83900C17!124435
Vastly different, as you see. Accessing the page content via:
../pages/{page-id}/content
... for (1) returns nothing
... for (2) returns the full page content.
(The section-ids returned by both methods are also entirely different.)
So, how can I extract from the link a page-id that works?
Unfortunately, the OneNote API currently does not support identifying links to other OneNote pages in page content. Links in OneNote can be links to anything: websites, other OneNote pages/sections/notebooks, network shares...
The API does support getting links to pages by using
GET ~/pages
GET ~/sections/id/pages
The page metadata model contains a links object with the oneNoteClientUrl and the oneNoteWebUrl.
Edit, after your question update:
You're right - the id in the link does not correspond to the id used by the OneNote API. You can, however, compare the id in the link with the id in the oneNoteClientUrl exposed by the API. Here's an example of the response of a
GET ~/sections/id/pages
GET ~/pages
{
"title": "Created from WAC",
"createdByAppId": "",
"links": {
"oneNoteClientUrl": {
"href": "onenote:https://d.docs.live.net/29056cf89bb2d216/Documents/TestingNotification/Harrie%27s%20Section.one#Created%20from%20WAC§ion-id=49b630fa-26cd-43fa-9c45-5c62d547ee3d&page-id=a60de930-0b03-4527-bf54-09f3b61d8838&end"
},
"oneNoteWebUrl": {
"href": "https://onedrive.live.com/redir.aspx?cid=29056cf89bb2d216&page=edit&resid=29056CF89BB2D216!156&parId=29056CF89BB2D216!105&wd=target%28Harrie%27s%20Section.one%7C49b630fa-26cd-43fa-9c45-5c62d547ee3d%2FCreated%20from%20WAC%7Ca60de930-0b03-4527-bf54-09f3b61d8838%2F%29"
}
},
"contentUrl": "https://www.onenote.com/api/v1.0/me/notes/pages/0-a50842a9873945379f3d891a7420aa39!14-29056CF89BB2D216!162/content",
"thumbnailUrl": "https://www.onenote.com/api/v1.0/me/notes/pages/0-a50842a9873945379f3d891a7420aa39!14-29056CF89BB2D216!162/thumbnail",
"lastModifiedTime": "2016-03-28T21:36:22Z",
"id": "0-a50842a9873945379f3d891a7420aa39!14-29056CF89BB2D216!162",
"self": "https://www.onenote.com/api/v1.0/me/notes/pages/0-a50842a9873945379f3d891a7420aa39!14-29056CF89BB2D216!162",
"createdTime": "2016-03-24T20:38:16Z",
"parentSection#odata.context": "https://www.onenote.com/api/v1.0/$metadata#me/notes/pages('0-a50842a9873945379f3d891a7420aa39%2114-29056CF89BB2D216%21162')/parentSection(id,name,self)/$entity",
"parentSection": {
"id": "0-29056CF89BB2D216!162",
"name": "Harrie's Section",
"self": "https://www.onenote.com/api/v1.0/me/notes/sections/0-29056CF89BB2D216!162"
}
}
You can also filter server-side (if you want to save yourself from paging and regexes ;) ) for the ids in the links by using:
GET ~/pages?$filter=contains(links/oneNoteClientUrl/href,'a60de930-0b03-4527-bf54-09f3b61d8838')
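For illustration, here's a minimal PHP sketch of that approach (PHP because the other threads here use it): pull the GUID out of the link's page-id parameter, then ask the API for the page whose oneNoteClientUrl contains it. The $html input, the $token variable and the regex are my assumptions, not part of the OneNote API.
<?php
// Sketch: resolve a page-id GUID found in an onenote: link to the API's
// own page id. $html is the page HTML, $token a valid OAuth bearer token.
if (preg_match('/page-id=\{?([0-9A-Fa-f-]{36})\}?/', $html, $m)) {
    $linkPageId = $m[1];

    // Server-side filter, as shown above, instead of paging + regexes.
    $url = 'https://www.onenote.com/api/v1.0/me/notes/pages'
         . '?$filter=' . rawurlencode("contains(links/oneNoteClientUrl/href,'$linkPageId')");

    $ctx = stream_context_create(['http' => [
        'header' => "Authorization: Bearer $token\r\n",
    ]]);
    $response = json_decode(file_get_contents($url, false, $ctx), true);

    // This id is the one that works with ../pages/{page-id}/content.
    $apiPageId = $response['value'][0]['id'] ?? null;
}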
I want to use the existing fields from a server template on top of another document.
At first I tried attaching the document at the same level as the inline/server templates.
If I have the signer defined, it gives me a 400 error; if I leave it off (which I did by accident), it completely wipes out the fields and shows the attached document.
Second, I tried attaching the document to the inline template, but that results in the attached document not appearing; it just operates like normal.
Update
After adding additional debugging and doing further research, I now know that attaching it to the inline template was incorrect. After adding debug output to read the 400 response, I am getting this error:
"The DocumentId specified in the tab element does not refer to a document in this envelope. Tab refers to DocumentId 32475214 which is not present."
DocumentId is being set to 1, which is apparently wrong.
That led me to this question on SO, in which a comment mentions that the ID returned in the 400 response should be used.
After I hard-coded this ID, the replacement operation succeeds!
However, I now need a way to plug that value in programmatically.
Detail
I am using the DocuSign PHP SDK to help me build the data structure and access the API.
Use the listTemplateDocuments API to retrieve the documentId for the template.
The documentId retrieved in the step above should then be used in the compositeTemplates of the CreateEnvelope request; a sketch of fetching it programmatically follows the example below.
{
"emailSubject": "Tabs should remain from the Server Template",
"status": "sent",
"compositeTemplates": [
{
"document": {
"documentId": "<document Id>", //Use the documentId retrieved using the listTemplateDocuments api
"name": "Replaced Document",
"fileExtension": "txt",
"documentBase64": "RG9jIFRXTyBUV08gVFdP"
},
"serverTemplates": [
{
"sequence": "1",
"templateId": "<Server Template Id Here>"
}
]
}
]
}
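To get that documentId programmatically, something like the following should work (raw REST rather than an SDK wrapper; $baseUrl, $accountId, $templateId and $authHeader are placeholders from your login call, and the "templateDocuments" field name should be verified against your API version's response):
<?php
// Sketch: call the endpoint behind listTemplateDocuments and pull out
// the documentId to plug into the compositeTemplates payload above.
$ch = curl_init("$baseUrl/accounts/$accountId/templates/$templateId/documents");
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [$authHeader, 'Accept: application/json'],
]);
$result = json_decode(curl_exec($ch), true);
curl_close($ch);

// Assumed response shape; check what your account actually returns.
$documentId = $result['templateDocuments'][0]['documentId'];

Then set "documentId" to $documentId in the CreateEnvelope request body shown above.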
I am working on retrieving particular bio details of a person from that person's Wikipedia page through Wikipedia's web API.
I need to retrieve the bio information box of the person.
I found out how to retrieve the content box, the introduction paragraph, and so on. The URL below retrieves the first introductory paragraph of a wiki page:
https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro=&explaintext=&titles=Sachin_Tendulkar
But I am stuck on getting the bio information box above through the wiki web API, so that I can extract the specific details I want.
Is it possible to get a single item of information, like only the full name or only the date of birth, through a single query (instead of getting all the information and extracting the details from it)?
Simple: you must not extract biographical data from Wikipedia directly, but from its structured data counterpart, Wikidata. See https://www.wikidata.org/wiki/Wikidata:Data_access for how.
In your example: date of birth is P569; the query is https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q42&property=P569
{
"claims": {
"P569": [
{
"id": "q42$D8404CDA-25E4-4334-AF13-A3290BCD9C0F",
"mainsnak": {
"snaktype": "value",
"property": "P569",
"datatype": "time",
"datavalue": {
"value": {
"time": "+1952-03-11T00:00:00Z",
"timezone": 0,
"before": 0,
"after": 0,
"precision": 11,
"calendarmodel": "http://www.wikidata.org/entity/Q1985727"
},
"type": "time"
}
},
etc.
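Digging the timestamp out of that response is a one-liner once decoded; here is a minimal PHP sketch (entity Q42 and the array path mirror the sample above, error handling omitted):
<?php
// Sketch: fetch the date-of-birth claim (P569) and read its timestamp.
$url = 'https://www.wikidata.org/w/api.php'
     . '?action=wbgetclaims&entity=Q42&property=P569&format=json';

$data = json_decode(file_get_contents($url), true);
echo $data['claims']['P569'][0]['mainsnak']['datavalue']['value']['time'];
// "+1952-03-11T00:00:00Z"

To map a Wikipedia title such as Sachin_Tendulkar to its entity id first, action=wbgetentities&sites=enwiki&titles=Sachin_Tendulkar does the lookup.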
I'm currently trying to find all the pages where images/media from a particular category are being used on Wikimedia Commons.
Using the API, I can list all the images with no problem, but I'm struggling to make the query add in all the pages where the items are used.
Here is an example category with only two media images
https://commons.wikimedia.org/wiki/Category:Automobiles
Here is the API call I am using
https://commons.wikimedia.org/w/api.php?action=query&prop=images&format=json&generator=categorymembers&gcmtitle=Category%3AAutomobiles&gcmprop=title&gcmnamespace=6&gcmlimit=200&gcmsort=sortkey
The long-term aim is to find all the pages where images from our collections appear, and then get all the tags from those pages about the images. We can then use this to enhance our archive of information about those images, and hopefully use linked data to find relevant images we may not know about from DBpedia.
I might have to do two queries: first get the images, then request info about each page. But I was hoping to do it all in one call.
Assuming that you don't need to recurse into subcategories, you can just use a prop=globalusage query with generator=categorymembers, e.g. like this:
https://commons.wikimedia.org/w/api.php?action=query&prop=globalusage&generator=categorymembers&gcmtitle=Category:Images_from_the_German_Federal_Archive&gcmtype=file&gcmlimit=200&continue=
The output, in JSON format, will look something like this:
// ...snip...
"6197351": {
"pageid": 6197351,
"ns": 6,
"title": "File:-Bundesarchiv Bild 183-1987-1225-004, Schwerin, Thronsaal-demo.jpg",
"globalusage": [
{
"title": "Wikipedia:Fotowerkstatt/Archiv/2009/M\u00e4rz",
"wiki": "de.wikipedia.org",
"url": "https://de.wikipedia.org/wiki/Wikipedia:Fotowerkstatt/Archiv/2009/M%C3%A4rz"
}
]
},
"6428927": {
"pageid": 6428927,
"ns": 6,
"title": "File:-Fernsehstudio-Journalistengespraech-crop.jpg",
"globalusage": [
{
"title": "Kurt_von_Gleichen-Ru\u00dfwurm",
"wiki": "de.wikipedia.org",
"url": "https://de.wikipedia.org/wiki/Kurt_von_Gleichen-Ru%C3%9Fwurm"
},
{
"title": "Wikipedia:Fotowerkstatt/Archiv/2009/April",
"wiki": "de.wikipedia.org",
"url": "https://de.wikipedia.org/wiki/Wikipedia:Fotowerkstatt/Archiv/2009/April"
}
]
},
// ...snip...
Note that you will very likely have to deal with query continuations, since there may easily be more results than MediaWiki will return in a single request. See the linked page for more information on handling those (or just use an MW API client that handles them for you).
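If you do handle continuations by hand, the loop is simple: feed the continue block from each response back into the next request's parameters. A rough PHP sketch of that, using the same query as above (error handling omitted):
<?php
// Sketch: repeat the globalusage query, merging the "continue" values
// back into the parameters until the API stops returning them.
$params = [
    'action'    => 'query',
    'format'    => 'json',
    'prop'      => 'globalusage',
    'generator' => 'categorymembers',
    'gcmtitle'  => 'Category:Images_from_the_German_Federal_Archive',
    'gcmtype'   => 'file',
    'gcmlimit'  => '200',
    'continue'  => '',
];

$pages = [];
do {
    $url  = 'https://commons.wikimedia.org/w/api.php?' . http_build_query($params);
    $data = json_decode(file_get_contents($url), true);

    foreach ($data['query']['pages'] ?? [] as $id => $page) {
        if (!isset($pages[$id])) {
            $pages[$id] = $page;
        } else {
            // Later batches may add more usage entries for the same file.
            $pages[$id]['globalusage'] = array_merge(
                $pages[$id]['globalusage'] ?? [],
                $page['globalusage'] ?? []
            );
        }
    }

    // The API returns exactly the parameters needed for the next batch.
    $params = array_merge($params, $data['continue'] ?? []);
} while (isset($data['continue']));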
I don't understand your use case ("our collections"?), so I don't know why you want to use the API directly, but if you want to recurse into categories you're going to be reinventing a lot of wheels.
Most people use the tools made by Magnus Manske, creator of MediaWiki: in this case it's GLAMorous. Example with 3 levels of recursion (finds 186k images, 114k usages): https://tools.wmflabs.org/glamtools/glamorous.php?doit=1&category=Automobiles&use_globalusage=1&depth=3
Results can also be downloaded in XML format, so they're machine-readable.
I have a data management panel of IP addresses, which belong to organizations and have users responsible for them.
Right now I have the routes /api/ip and /api/ip/{id} to get all IPs or a specific one. The format of a single resource is:
{
"ip": "200.0.0.0",
"mask": 32,
"broadcast": "200.0.0.1"
}
Now, when I select an IP, I want to show the IP information, the organization it belongs to, and the users responsible for it, all on one page.
Is it a good idea to return the following data format when requesting /api/ip/{id}:
{
"ip": "200.0.0.0",
"mask": 32,
"broadcast": "200.0.0.1",
"organization": { /* organization data */ },
"users": { /* users information */ }
}
This way I get all the information I need in one request, but is it still a RESTful API?
Or should I make two more API routes, like /api/ip/{id}/organization and /api/ip/{id}/users,
and get all the data I need in three separate requests?
If not, what would be the appropriate way of doing this?
I would do the latter, using HATEOAS, which allows you to link between the resources. There is a really great bundle for that called the BazingaHateoasBundle. The result will then be something like:
/api/ip/127.0.0.1
{
"ip": "200.0.0.0",
"mask": 32,
"broadcast": "200.0.0.1",
"_links": {
"organization": "/api/ip/127.0.0.1/organization",
"users": "/api/ip/127.0.0.1/users"
}
}
It is perfectly okay to have nested resources. You can expand them the way you showed, or you can collapse them by adding links (with the proper link relation or RDF metadata). I suggest you use a standard, or at least documented, hypermedia type, e.g. JSON-LD + Hydra or HAL+JSON.
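For instance, a HAL+JSON representation lets you embed what the page always needs and link the rest (the field values and the id 42 here are illustrative, not from your data):
{
    "ip": "200.0.0.0",
    "mask": 32,
    "broadcast": "200.0.0.1",
    "_links": {
        "self":         { "href": "/api/ip/42" },
        "organization": { "href": "/api/ip/42/organization" },
        "users":        { "href": "/api/ip/42/users" }
    },
    "_embedded": {
        "organization": { "name": "Example Org" }
    }
}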
I've been having trouble scraping the content of the following website: http://www.qe.com.qa/wp/mw/MarketWatch.php
Using file_get_contents() never gets me the right tag. I would like to scrape the content of the following tag: td aria-describedby="grid_OfferPrice"
Is the website protected from scraping? When I try the same method with different websites, it works. If it is protected, what is a good workaround?
The way to see if scraping works is to output what file_get_contents returns. If you get nothing back, or an error, then maybe your IP has been restricted by their admin.
If it returns their source code, then it's working, but maybe the tag you're looking for has not been found.
Eliminate failures in your process by answering these questions first, one at a time.
I viewed their source code and the aria attribute you are searching for doesn't appear to exist.
It seems they load the data on that page from another source, which is this page (http://www.qe.com.qa/wp/mw/bg/ReadData.php?Types=SELECTED&iType=SO&dummy=1401401577192&_search=false&nd=1401401577279&rows=100&page=1&sidx=&sord=asc)
If you want the data from that page, then use file_get_contents on it directly.
Pasting the data from that page into an online JSON editor gives you a neat way of quickly seeing whether this is a useful solution for you; a short fetch-and-decode sketch follows the sample below.
A sample of that data is listed below:
{
"total": "140",
"page": "1",
"records": "140",
"rows": [
{
"Topic": "QNBK/NM",
"Symbol": "QNBK",
"CompanyEN": "QNB",
"CompanyAR": "QNB",
"Trend": "-",
"StateEN": "Tradeable",
"StateAR": "المتداوله",
"CatEN": "Listed Companies",
"CatAR": "الشركات المدرجة",
"SectorEN": "Banks & Financial Services",
"SectorAR": "البنوك والخدمات المالية",
"ShariahEN": "N/A",
"ShariahAR": "N/A",
"OfferVolume": "7503",
"OfferPrice": "184.00",
"BidPrice": "182.00",
"BidVolume": "15807",
"OpenPrice": "190.0",
"High": "191.7",
"Low": "181.0",
"IMP": "182.0",
"LastPrice": "182.0",
"PrevClosing": "187.0",
"Change": "-5.0",
"PercentChange": "-2.6737",
"Trades": "980",
"Volume": "2588830",
"W52High": "199.0",
"W52Low": "145.0",
"Value": "481813446.4"
},
{
"Topic": "QIBK/NM",
"Symbol": "QIBK",
"CompanyEN": "Qatar Islamic Bank",
"CompanyAR": "المصرف ",
"Trend": "+",
"StateEN": ...
Make sure you read this link about 'scraping' etiquette.
Link: http://simplehtmldom.sourceforge.net/
$dom = new DOMDocument();
// Suppress warnings triggered by real-world, non-standards HTML.
libxml_use_internal_errors(true);
$dom->loadHTML(file_get_contents("EXAMPLE.COM"));
$items = $dom->getElementsByTagName("YOUR TAG");
This class allows you to search HTML code for elements. I have used it a few times before and it is by far the best solution I have found for your issue.