Wednesday, Sep 29, 2010

Spotlight: Sports Entities

Today we're running the first of our spotlights on the kinds of entities that Extractiv can provide semantics about.  We've recently launched a set of sports entity types.  We were going to call it "Better Know an Entity Type", but Stephen Colbert has that one taken. 

We created a job with 8 sports entity types that crawled and processed 25,000 news articles. Then we used the code at http://github.com/extractiv/ExtractivPublicCode to find the most frequent entity of each type; the results are shown below (actual entity type names shortened for clarity).

| Entity Type | Most Common Entity | Occurrences |
|---|---|---|
| Athlete | Michael Vick | 30 |
| Event | Ryder Cup | 518 |
| Owner | Steinbrenner | 6 |
| Position | Quarterback | 121 |
| Sport | Football | 356 |
| Sports League | NFL | 454 |
| Stadium | Soldier Field | 13 |
| Team | Redskins | 127 |
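
For readers curious how a "most common entity per type" tally like this one can be produced, here is a minimal sketch in Python. It is not the code from the ExtractivPublicCode repository, and it does not use the Extractiv API; the function name `most_common_by_type` and the toy `mentions` list are made up for illustration, and it assumes the extracted entities are already available as (entity type, entity text) pairs.

```python
from collections import Counter, defaultdict


def most_common_by_type(mentions):
    """Return {entity_type: (entity_text, count)} for the top entity of each type.

    `mentions` is an iterable of (entity_type, entity_text) pairs. This input
    shape is an assumption for the sketch, not Extractiv's output format.
    """
    counts = defaultdict(Counter)
    for entity_type, entity_text in mentions:
        counts[entity_type][entity_text] += 1
    # most_common(1) returns a one-element list holding the (text, count) pair
    return {etype: counter.most_common(1)[0] for etype, counter in counts.items()}


# Hypothetical toy data, not taken from the actual crawl
mentions = [
    ("Team", "Redskins"), ("Team", "Redskins"), ("Team", "Bears"),
    ("Sport", "Football"), ("Sport", "Golf"), ("Sport", "Football"),
]

top = most_common_by_type(mentions)
for entity_type, (entity_text, count) in sorted(top.items()):
    print(entity_type, entity_text, count)
```

Keeping one counter per entity type means a single pass over the mentions is enough, however many types the job extracts.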

 

While frequencies varied, one thing about this web crawl was clear: football is king.  The most frequently mentioned Stadium, Position, League, Sport, Athlete, and Team were all from football.  Owners were not mentioned all that frequently, but the recently deceased George Steinbrenner is probably the most famous -- or at least the only one who appeared on Seinfeld.  Finally, the Ryder Cup is a golf tournament that starts this weekend, so it has been mentioned a lot recently.

I wonder which entities are most important in the pages which are important to you?  Stay tuned to find out what other domains Extractiv can understand.

