Today we're running the first of our spotlights on the kinds of entities that Extractiv can provide semantics for. We've recently launched a set of sports entity types. We were going to call this series "Better Know an Entity Type", but Stephen Colbert already has that one taken.
We created a job covering 8 sports entity types that crawled and processed 25,000 news articles. Then we used the code at http://github.com/extractiv/ExtractivPublicCode to find the most frequent entity of each type, shown below (entity type names shortened for clarity).
| Entity Type | Most Common Entity | Occurrences |
|---|---|---|
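The counting step described above amounts to a per-type frequency tally: count every mention, then keep the top entity for each type. Below is a minimal Python sketch, assuming the extraction output has already been flattened into hypothetical `(entity_type, entity_text)` pairs, one per mention. It does not use or reproduce the ExtractivPublicCode API, and the sample data is made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def most_common_by_type(
    mentions: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[str, int]]:
    """Return the most frequent entity (and its count) for each entity type.

    `mentions` is an assumed, hypothetical input format: a flat stream of
    (entity_type, entity_text) pairs, one per mention in the processed articles.
    """
    # One Counter of entity strings per entity type.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for entity_type, entity_text in mentions:
        counts[entity_type][entity_text] += 1
    # most_common(1) returns a one-element list holding the top (text, count) pair.
    return {etype: counter.most_common(1)[0] for etype, counter in counts.items()}


if __name__ == "__main__":
    # Placeholder mentions for illustration only; these are not the crawl's results.
    sample = [
        ("Team", "Team A"),
        ("Team", "Team B"),
        ("Team", "Team A"),
        ("Sport", "Sport X"),
    ]
    for etype, (entity, n) in sorted(most_common_by_type(sample).items()):
        print(f"{etype}\t{entity}\t{n}")
```

Keeping a separate `Counter` per type means the top entity for each type is found independently, so a heavily mentioned type (such as a popular sport) cannot crowd out the winners for rarer types like owners.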
While frequencies varied, one thing about this web crawl was clear: football is king. The most frequently mentioned Stadium, Position, League, Sport, Athlete, and Team were all from football. Owners were not mentioned very often, but the recently deceased George Steinbrenner is probably the most famous one, or at least the only one featured on Seinfeld. Finally, the Ryder Cup is a golf Tournament that starts this weekend, so it has been mentioned a lot recently.
We wonder which entities are most important in the pages that matter to you. Stay tuned to find out what other domains Extractiv can understand.