Monday
Nov 01, 2010

Semantic Content Available in RDF

Until now, we've offered semantic content in JSON, XML or HTML.  Today we're adding RDF to the menu.

While a new output format may not sound like a big deal, we're excited because RDF is the language of the semantic web, with a large and ever-increasing pool of users, tools, and data. Generating structured data in RDF makes it much simpler to link our output with existing data sets. Links between data sets are the real strength of the semantic web because they make searching across diverse data sets as easy as writing a SPARQL query, as these examples demonstrate.

For now, the RDF triples we generate contain exactly the same data as the corresponding JSON output, which means our RDF data isn't automatically linked to any third-party data sources.  In the coming weeks we plan to roll out automatic entity linking to public data sets like DBpedia.
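
To make the JSON-to-RDF correspondence concrete, here is a minimal sketch of how one entity record could be flattened into N-Triples lines. The namespace, the field names (`id`, `type`, `text`), and the function name are assumptions made up for illustration; the actual vocabulary is the one documented on the Extractiv wiki.

```python
import json

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/extractiv/"


def entity_to_ntriples(entity: dict) -> list:
    """Turn one JSON-style entity record into N-Triples lines."""
    subject = f"<{BASE}entity/{entity['id']}>"
    lines = []
    for key, value in entity.items():
        if key == "id":
            continue  # the id is used for the subject, not emitted as a triple
        predicate = f"<{BASE}schema#{key}>"
        # json.dumps wraps the value in double quotes and backslash-escapes it.
        literal = json.dumps(str(value), ensure_ascii=False)
        lines.append(f"{subject} {predicate} {literal} .")
    return lines


# Assumed record shape; the real JSON output may use different keys.
record = {"id": 1, "type": "PERSON", "text": "Jane Smith"}
for line in entity_to_ntriples(record):
    print(line)
```

Each key/value pair becomes one triple about the same subject, which is why the RDF carries exactly the information of the JSON and nothing more until entity linking is added.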

Details about the RDF triples we generate are available on the Extractiv wiki.

Friday
Oct 29, 2010

Generic Relations Provide Wide Variety of Factual Knowledge

Generic relations have been added to the set of tools Extractiv uses to mine semantics from text. They supplement typed relations, which provide a specific kind of semantic information, such as Person Age or Subsidiary Of. Generic relations, on the other hand, provide tuples that cover a broad range of semantics, similar to Subject-Verb-Object triples. Here you can find kinds of information not covered by our predefined set of types, which can answer a variety of questions.

Generic relations provide access to semantics connected to any action in natural language text. These actions are usually expressed as verbs, but also include action nouns. Consider this sentence:

In Topeka, Kansas, Jane Smith started the company with her friend during the 1990s.

Given an action in text, such as started, generic relations can identify:
  • Who/What caused or initiated the action (Jane Smith)
  • Who/What the action was applied to (the company)
  • Where the action took place ([in] Topeka, Kansas)
  • When the action took place (during the 1990s)
  • Why/How and other preposition-related phrases related to the action ([with] her friend)
With these relations, one can uncover a variety of factual information, such as:
  • Actions related to an entity (e.g., actions an organization has committed)
  • Entities involved in actions that occurred in a certain location (e.g., all stores that opened in Boston)
  • Entities connected to other entities through an action (e.g., people who work together)
Generic relations are currently supported on the On-Demand Platform and can be seen in the Viewer's relations pane or JSON output. They will soon also be available on the Crawling Platform. More documentation on Extractiv's Generic Relation service is available in the Extractiv wiki.
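
As a rough illustration of the roles and queries described above, the sketch below models a generic relation as a small record and filters a list of them. The class, its field names, and the second "store" relation are hypothetical; they are not the field names of the actual JSON output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenericRelation:
    action: str
    agent: Optional[str] = None     # who/what caused or initiated the action
    target: Optional[str] = None    # who/what the action was applied to
    location: Optional[str] = None  # where the action took place
    time: Optional[str] = None      # when the action took place
    other: tuple = ()               # remaining preposition-related phrases


relations = [
    # Roles taken from the example sentence in this post.
    GenericRelation("started", "Jane Smith", "the company",
                    "Topeka, Kansas", "during the 1990s", ("with her friend",)),
    # Made-up second relation, only so the location query has two candidates.
    GenericRelation("opened", "the store", None, "Boston"),
]


def agents_acting_in(rels, place):
    """Entities that initiated an action in the given location."""
    return [r.agent for r in rels if r.location and place in r.location]


def actions_of(rels, entity):
    """Actions initiated by the given entity."""
    return [r.action for r in rels if r.agent == entity]


print(agents_acting_in(relations, "Boston"))
print(actions_of(relations, "Jane Smith"))
```

Keeping each role in its own optional field makes the "who", "where", and "when" questions simple filters over the list.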

 

Monday
Oct 25, 2010

Embedding Extractiv On-Demand into Your Website or Blog

Enhance your site's content by embedding Extractiv functionality into its pages.  Whether you're the New York Times, or Jane D. Mommy Blogger, you can provide your audience with sophisticated content understanding on the data they care about. Extractiv will process the pages on your site or any URL of their choosing.   Taking advantage of this functionality is easy -- no Java programming required. Simply drop in a small HTML snippet and our servers will do the rest (pun intended).  The result is a new perspective on existing content, with named entities and relations highlighted and organized.

There are two ways you can leverage Extractiv on your site. The first is with the Get Semantics button, which will process the current page. You can embed this into the template for your blog, so it shows up with every blog post like a Tweet or Digg button. Try clicking on the one at the bottom of this blog post.

To add it, use this snippet:

<span class="extractiv-button"></span>
<script type="text/javascript"
src="http://github.com/extractiv/widgets/raw/master/extractiv-widgets.js">
</script>

The second way you can embed Extractiv is with the Process URL bar. This lets a user process any URL of their choosing.

To add to your website, just drop in this snippet:

<div class="extractiv-restform"></div>
<script type="text/javascript"
src="http://github.com/extractiv/widgets/raw/master/extractiv-widgets.js">
</script>

For more information on embedding Extractiv into your site, including how to have it process just part of your page (like a single blog post), check out our docs.

Friday
Oct 22, 2010

Now in Color: Visualizing Extracted Content with the HTML Viewer 

In minutes, Extractiv can convert millions of targeted web pages into structured content, adding value to your applications.  

Most of the time, the direct consumer of Extractiv output is a machine.  In this case, JSON and RDF are ideal output formats.  But sometimes, a human user might want to look at the semantic annotations being provided for a document.  For this case, we've added a new output format which is pleasing to the eye, called the HTML Viewer.  We're excited because we think this will also help Extractiv users understand what kind of value can be added to their apps.

The HTML Viewer output format is now provided as an option on the On-Demand platform. You can try it out using any of the API access methods with the output_format parameter set to html_viewer, or by processing a URL using the form below.
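
For API users, a request for the viewer might be assembled along these lines. Only the `output_format` parameter and its `html_viewer` value come from this post; the endpoint URL and the `url` parameter name are placeholders assumed for the sketch, so check the API documentation for the real ones.

```python
from urllib.parse import urlencode

# Placeholder endpoint (an assumption); substitute the real On-Demand URL.
ENDPOINT = "https://api.example.com/on-demand"


def viewer_request_url(page_url: str) -> str:
    """Build a GET URL asking for HTML Viewer output for one page."""
    params = {
        "url": page_url,                 # assumed parameter name
        "output_format": "html_viewer",  # parameter and value named in this post
    }
    # urlencode percent-encodes the page URL so it is safe inside a query string.
    return f"{ENDPOINT}?{urlencode(params)}"


print(viewer_request_url("http://example.com/article.html"))
```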

While the output is mostly self-explanatory, we've described its specific features in our documentation. Entities and relations are color coded in the main text pane. On the left, a tree structure shows the entities found, and below, the relations identified. Clicking on an entity or relation will jump you to that annotation in the text pane. For some entities, there are quite a few entries listed in the tree, and these represent all of the mentions of that entity in the document. Extractiv even resolves pronouns that "co-refer" to entities.

Not all of the structured data that Extractiv provides in the other formats is currently visualized in this interface (e.g. topic terms, sentiment relations), so look forward to updates to the interface. If you have ideas on what you'd like to see here, please let us know!

Friday
Oct 08, 2010

Entity Types Extracted Tops 150

Extractiv is committed to providing fine-grained semantics, beyond what has ever been available before from a content annotation service. As such, we recently rolled out 21 new entity types in several new domains: Entertainment, Computer, and Sports. This puts our list of entity types at over 150, and we're just getting started!  Here are frequent entities we encountered in a recent crawl, for some of these new types:


Entity Type          Common Occurrences
Actor                Tracey Morgan, Dianna Agron, Robert Lepage
Album                Chinese Democracy, 808s & Heartbreak
Band                 U2, The Beatles, Bad Brains, Nickelback
Entertainment Award  Emmy, Palme d'Or, Golden Globe
Movie                The Godfather, The Social Network, Silmido
Radio Station        K-Rock, CBS Radio, Radio 4
TV Network           CBS, NBC, ABC, MTV
Operating System     Android, Windows, Linux
Programming Lang.    JavaScript, Java, Ruby
Web Browser          Firefox, Internet Explorer, Opera

 

You might be thinking, how in the world will I figure out which of the 150 types I need? Good question! An upcoming feature called domain packs will associate all the entity types for a certain topic with a domain name. Then, to enable that set of entities, one only has to specify the domain. On Extractiv's job creation screen, this will look similar to the "core types" checkbox, which enables most general-purpose entities.
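
Since domain packs are not released yet, the following is only a guess at the idea: a mapping from domain name to entity types, expanded into the full set of types to enable. The grouping of the types listed above into "Entertainment" and "Computer" is our assumption for this sketch, and the names `DOMAIN_PACKS` and `enabled_types` are hypothetical.

```python
# Assumed grouping of the entity types listed in this post.
DOMAIN_PACKS = {
    "Entertainment": {
        "Actor", "Album", "Band", "Entertainment Award",
        "Movie", "Radio Station", "TV Network",
    },
    "Computer": {"Operating System", "Programming Lang.", "Web Browser"},
}


def enabled_types(domains):
    """Return the union of entity types for the requested domain packs."""
    types = set()
    for domain in domains:
        if domain not in DOMAIN_PACKS:
            raise KeyError(f"unknown domain pack: {domain}")
        types |= DOMAIN_PACKS[domain]
    return types


print(sorted(enabled_types(["Computer"])))
```

Specifying one domain name stands in for ticking every one of its entity types individually, much like the "core types" checkbox does for general-purpose entities.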

Our question for you is: What domains are you interested in seeing?
