
I'm in my mid forties and living in East Devon, UK. I enjoy explaining technology without all the buzz words and jargon. Col is a DZone MVB and is not an employee of DZone and has posted 13 posts at DZone.

Get Real Data from the Semantic Web - Finding Resources

01.25.2013

In my last article, I briefly explained how to get data from a resource using Python and SPARQL. This article explains how to find the resource in the first place.

Have you ever been taught how to knit? If you have, then you'll know that you are not usually taught how to cast on (or start off) in your first lesson. That's because it is much easier to learn how to knit than it is to cast on.

So it is with the Semantic Web. Once you have a resource URL, it's reasonably easy to extract information linked to that resource, but finding the starting resource is a bit trickier.
So let's just recap how we might get the abstract description for London from DBpedia.

If we know the URL then that's pretty straightforward:

#!/usr/bin/env python
 
import sys
from sparql import DBpediaEndpoint
 
def main ():
    s = DBpediaEndpoint( {
            "resource": "http://dbpedia.org/resource/",
            "yago": "http://dbpedia.org/class/yago/"
        } )
 
    query = """
        SELECT ?abstract WHERE {
            resource:London dbpedia-owl:abstract ?abstract .
            FILTER(langMatches(lang(?abstract), "EN"))
        }
    """
 
    results = s.query(query)
    # Take the first result row and read the value bound to ?abstract.
    abstract = results[0]["abstract"]["value"]
    print(abstract)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt as e:  # Ctrl-C
        raise e

(If you want to follow this tutorial, then you had better copy the sparql.py file from there.)
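
For readers without that file to hand, here is a rough idea of what a helper like DBpediaEndpoint might do behind the scenes. This is a guess, not the author's sparql.py: the function names build_query and run_query are made up, the endpoint address https://dbpedia.org/sparql and its format parameter are assumptions about the public DBpedia service, and the sketch is written for Python 3.

```python
import json
import urllib.parse
import urllib.request

# Assumed address of the public DBpedia SPARQL endpoint.
ENDPOINT = "https://dbpedia.org/sparql"


def build_query(prefixes, body):
    """Prepend one PREFIX line per entry in the prefixes dict."""
    lines = ["PREFIX %s: <%s>" % (name, uri)
             for name, uri in sorted(prefixes.items())]
    return "\n".join(lines) + "\n" + body.strip()


def run_query(prefixes, body, endpoint=ENDPOINT):
    """Send the query over HTTP and return the list of result rows."""
    params = urllib.parse.urlencode({
        "query": build_query(prefixes, body),
        "format": "application/sparql-results+json",  # assumed parameter
    })
    with urllib.request.urlopen(endpoint + "?" + params, timeout=30) as response:
        data = json.loads(response.read().decode("utf-8"))
    # Standard SPARQL JSON results keep the rows under results.bindings.
    return data["results"]["bindings"]
```

Each row returned this way is a dictionary keyed by variable name, which is why the scripts in this article read values with expressions such as results[0]["abstract"]["value"].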


[Figure: RDF types for the DBpedia entry for London]

If you don't know the URL, however, then you'll have to search for it. According to its DBpedia entry, London is many things, including an owl:Thing. There are a lot of Things out there, probably enough to make even the DBpedia endpoint time out, so let's choose something more restrictive, such as yago:Locations, but not as restrictive as, say, yago:BritishCapitals.
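
To see those types for yourself rather than reading them off the DBpedia page, a query along these lines should list them. It is only a sketch: types_query is a made-up helper name, and the string it returns is meant to be passed to s.query() in the same way as the queries in the scripts here, on the assumption that the resource: prefix is set up as it is there.

```python
def types_query(local_name):
    """Build a SPARQL query listing every rdf:type of one DBpedia resource."""
    return """
        SELECT ?type WHERE {
            resource:%s rdf:type ?type .
        }
    """ % local_name


print(types_query("London"))
```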

#!/usr/bin/env python
 
import sys
from sparql import DBpediaEndpoint
 
def main ():
    s = DBpediaEndpoint( {
            "resource": "http://dbpedia.org/resource/",
            "yago": "http://dbpedia.org/class/yago/"
        } )
 
   
    query = """
        SELECT ?url WHERE {
            ?subject rdf:type yago:Locations .
            ?subject foaf:page ?url .
            ?subject foaf:name ?name .
            FILTER regex(?name, "London") .
        } LIMIT 1
    """
    
    results = s.query(query)
    # Take the first result row and read the value bound to ?url.
    url = results[0]["url"]["value"]
    print(url)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt as e:  # Ctrl-C
        raise e
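
One thing to watch with the script above: regex(?name, "London") matches any name that merely contains "London", so with LIMIT 1 the first hit is not guaranteed to be the city itself. The snippet below uses Python's re module as a stand-in for the SPARQL regex function to show the difference an anchored pattern makes. The sample names are made up for illustration, and SPARQL's regex dialect is not identical to Python's, although simple patterns like these behave the same way in both.

```python
import re

# Made-up sample of foaf:name values that a broad type could return.
names = ["London", "New London", "Londonderry", "East London", "london"]

# Unanchored, as in the query above: any name containing "London" matches.
loose = [n for n in names if re.search("London", n)]

# Anchored at both ends: only the exact name matches.
exact = [n for n in names if re.search("^London$", n)]

print(loose)
print(exact)
```

In the SPARQL query, the equivalent change would be FILTER regex(?name, "^London$").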

Just to be a smart ass as I finish off, you can get both the URL and the abstract at the same time with the query below, but don't forget that this will stress the SPARQL endpoint more than is probably necessary. Be kind.

#!/usr/bin/env python
 
import sys
from sparql import DBpediaEndpoint
 
def main ():
    s = DBpediaEndpoint( {
            "resource": "http://dbpedia.org/resource/",
            "yago": "http://dbpedia.org/class/yago/"
        } )
 
    
    query = """
        SELECT * WHERE {
            ?subject rdf:type yago:Locations .
            ?subject dbpedia-owl:abstract ?abstract .
            ?subject foaf:page ?url .
            ?subject foaf:name ?name .
            FILTER regex(?name, "London") .
            FILTER(langMatches(lang(?abstract), "EN")) .
        } LIMIT 1
    """
 
    results = s.query(query)
    # The first result row carries both ?url and ?abstract.
    url = results[0]["url"]["value"]
    print(url)
    abstract = results[0]["abstract"]["value"]
    print(abstract)


if __name__ == '__main__':
    try:
        main()
        sys.exit(0)
    except KeyboardInterrupt as e:  # Ctrl-C
        raise e
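
One simple way to be kind is to avoid sending the same query twice. The sketch below wraps any query function in a small in-memory cache. Both cached and fake_query are hypothetical names, and fake_query merely stands in for a call such as s.query so that the example runs without touching the network.

```python
def cached(query_func):
    """Wrap query_func so each distinct query string is only sent once."""
    cache = {}

    def wrapper(query):
        if query not in cache:
            cache[query] = query_func(query)
        return cache[query]

    return wrapper


calls = []


def fake_query(query):
    """Stand-in for a real endpoint call; records each query it receives."""
    calls.append(query)
    return [{"url": {"value": "http://example.org/London"}}]


query = cached(fake_query)
first = query("SELECT ...")
second = query("SELECT ...")  # served from the cache, no second call
print(len(calls))
```

With the scripts in this article, the same wrapper could be applied as query = cached(s.query), assuming s.query takes the query string as its only argument, as it does in the calls shown here.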


Published at DZone with permission of Col Wilson, author and DZone MVB. (source)
