

Get Real Data from the Semantic Web

01.24.2013

Semantic Web this, Semantic Web that, what actual use is the Semantic Web in the real world? I mean how can you actually use it?

If you haven't heard the term "Semantic Web" over the last couple of years then you must have been in... well somewhere without this interweb they're all talking about.

Basically, by using metadata (see RDF), disparate bits of data floating around the web can be joined up. In other words, they stop being disparate. Better than that, theoretically you can query the connections between the data and get lots of lovely information back. This last bit is done via SPARQL, and yes, the QL does stand for Query Language.
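In spirit, RDF data is just a big set of (subject, predicate, object) triples, and a SPARQL query is a pattern match over them. Here's a toy sketch of that idea in plain Python (purely illustrative; the data and the `match` helper are made up for this example and involve no real RDF library):

```python
# Toy illustration: RDF data as a set of (subject, predicate, object) triples.
triples = [
    ("dbpedia:London", "rdf:type", "dbpedia-owl:City"),
    ("dbpedia:London", "dbpedia-owl:country", "dbpedia:United_Kingdom"),
    ("dbpedia:Paris", "rdf:type", "dbpedia-owl:City"),
    ("dbpedia:Paris", "dbpedia-owl:country", "dbpedia:France"),
]

def match(pattern):
    """Yield variable bindings for one triple pattern; '?x' terms are variables."""
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield binding

# Roughly what SELECT ?city WHERE { ?city rdf:type dbpedia-owl:City } does:
cities = [b["?city"] for b in match(("?city", "rdf:type", "dbpedia-owl:City"))]
print(cities)  # ['dbpedia:London', 'dbpedia:Paris']
```

A real triple store does the same kind of matching, just over billions of triples and with joins between patterns.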

I say theoretically because in reality it's a bit of a pain. You may be an intelligent agent capable of finding linked bits of data through the web, but how exactly would you do that in Python?

It is possible to use rdflib to find information, but it's very long-winded. It's much easier to use SPARQLWrapper, and in fact in the simple example below I've used a wrapper around SPARQLWrapper to make asking for lots of similarly sourced data, in this case from DBpedia, even easier.

from SPARQLWrapper import SPARQLWrapper, JSON


class SparqlEndpoint(object):

    def __init__(self, endpoint, prefixes=None):
        self.sparql = SPARQLWrapper(endpoint)
        # Common namespace prefixes, so individual queries
        # don't have to declare them every time.
        self.prefixes = {
            "dbpedia-owl": "http://dbpedia.org/ontology/",
            "owl": "http://www.w3.org/2002/07/owl#",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "foaf": "http://xmlns.com/foaf/0.1/",
            "dc": "http://purl.org/dc/elements/1.1/",
            "dbpedia2": "http://dbpedia.org/property/",
            "dbpedia": "http://dbpedia.org/",
            "skos": "http://www.w3.org/2004/02/skos/core#",
        }
        self.prefixes.update(prefixes or {})
        self.sparql.setReturnFormat(JSON)

    def query(self, q):
        # Prepend a PREFIX line for each registered namespace.
        lines = ["PREFIX %s: <%s>" % (k, r) for k, r in self.prefixes.items()]
        lines.extend(q.split("\n"))
        query = "\n".join(lines)
        print(query)  # handy for debugging the generated query
        self.sparql.setQuery(query)
        results = self.sparql.query().convert()
        return results["results"]["bindings"]


class DBpediaEndpoint(SparqlEndpoint):

    def __init__(self, prefixes=None):
        endpoint = "http://dbpedia.org/sparql"
        super(DBpediaEndpoint, self).__init__(endpoint, prefixes)
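The `query` method returns the `bindings` list from the endpoint's response, which follows the W3C SPARQL JSON results format: each binding maps a variable name to a dict with `type` and `value` keys. A hand-written sample (not a live response; the abstract text here is invented) shows the shape you'll be digging into:

```python
# Shape of a SPARQL JSON result (W3C SPARQL 1.1 Query Results JSON Format).
# The data below is a hand-written sample, not a real DBpedia response.
response = {
    "head": {"vars": ["o"]},
    "results": {
        "bindings": [
            {"o": {"type": "literal", "xml:lang": "en",
                   "value": "Foobar is a placeholder name."}},
        ]
    },
}

# This is the structure SparqlEndpoint.query() returns:
bindings = response["results"]["bindings"]
print(bindings[0]["o"]["value"])  # Foobar is a placeholder name.
```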

To use this, try importing the DBpediaEndpoint and feeding it some SPARQL:

#!/usr/bin/env python

import sys
from sparql import DBpediaEndpoint


def main():
    s = DBpediaEndpoint()
    resource_uri = "http://dbpedia.org/resource/Foobar"

    results = s.query("""
        SELECT ?o
        WHERE { <%s> dbpedia-owl:abstract ?o .
        FILTER(langMatches(lang(?o), "EN")) }
    """ % resource_uri)
    abstract = results[0]["o"]["value"]
    print(abstract)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt:  # Ctrl-C
        raise

Your homework: how do you identify the resource_uri in the first place?

That's for another evening.

Published at DZone with permission of Col Wilson, author and DZone MVB. (source)
