Text Mining Twitter

Personally, I don’t get Twitter. I have an account (mvgilliland) for anyone interested in not hearing any tweets from me. I follow a few people and have a few followers (including some that aren't porn bots) — but what is the point? Does anyone really care that I’m out hanging floss on the line to dry, or that I’m stuck in the waiting room of my urologist with a prostate the size of a grapefruit?

The fact is, if someone is that interested in what I’m doing right now, it makes me kind of nervous. Do I really want people “following” me? Aren’t anti-stalking laws enacted for good reason?

Call me old school, a luddite, a 21st century puritan, even a techno-prude. Or perhaps I’m just blind to the great new opportunities for data (not just dating) that social networking provides. I have actually been swayed a little bit in this direction by my colleague in Australia, Evan Stubbs. Leveraging some code from SAS software developer Zach Marshall, Evan put together a neat little demo for customers, illustrating how SAS Text Miner can be used to identify patterns that could be used in forecasting. From Evan:

Guest Blogger: Evan Stubbs, Solution Manager for SAS Analytics

Rightly or wrongly, we seem to love telling the world what we’re doing, often even if no one’s listening! Morgan Stanley recently published a report by a 15-year-old intern that, for many, seemed to state the obvious: “On the other hand, teenagers do not use twitter … they realise that no one is viewing their profile, so their ‘tweets’ are pointless”.

Setting aside the bizarre implication that ‘only oldies use Twitter’, this misses the point for me: it’s not about who’s listening right now, it’s about who might be listening. One of the points of talking publicly about a particular topic is the hope that other people who are also interested in that topic might just join in. For me, it’s about the chance of finding like-minded people with similar interests (whether they agree with me or not!). It’s about connecting with new people, people I may never have met otherwise. It doesn’t matter whether it’s about my passion for analytics, my fascination with my latest gadget, or my displeasure with my latest billing experience; with the growth of the Internet, there are bound to be people out there thinking and debating about similar things.

And that’s the clincher – the scale of these social networks can’t be overestimated. A back-of-the-napkin poll I recently did to see how big some of the sites I knew about were stunned me: out of approximately 20 sites that my colleagues and I belong to, only two had membership levels below 22 million. That may seem like an arbitrary number, but it has quite a bit of significance for me – it’s the population of Australia, the place I live. Sites like Facebook and MySpace have over ten times the population of Australia; these aren’t just social networks, they’re almost countries in their own right!

With that level of membership, it’s not surprising that there’s a wealth of information available within them, information that changes as rapidly as the discussion does. Google Trends and Twitter Trending Topics are great for seeing what people are talking about overall, but they’re not personal – they don’t always relate to what I’m interested in. And trying to trim down the torrent of information is almost an exercise in frustration – applications like TweetDeck help with targeted searching and monitoring, but they don’t solve the real problem of pattern extraction and trend analysis.

So, based on the excellent work done by Zach Marshall, one of the geniuses behind our Web Services development, I thought it’d be rather fun to use SAS to create a personalized Twitter search process that takes into account geographic information and language-based searches, and then to use Enterprise Guide’s Stored Process capabilities to package it up into an installable process usable by anyone. For me, the exciting thing was how much of SAS’s functionality I was easily able to use in what amounts to such a small effort:

• SAS 9.2’s Web Services capabilities, to connect to Twitter and create the query
• SAS’s regular expression parsing, to cleanse the XML documents and structure them correctly
• SAS’s XML parsing and data handling capabilities, to extract and structure data
• SAS’s Stored Process capabilities, to turn it into a reusable process that’ll deliver the results to anything (a SAS dataset, Excel, Internet Explorer …)
• SAS’s Text Mining capabilities, to extract trends and patterns of particular discussion
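To make the cleansing and parsing steps a little more concrete, here is a minimal sketch in Python rather than SAS. It is not the code Zach and Evan wrote; the Atom-style feed layout, the sample entries, and the names `cleanse` and `parse_entries` are all assumptions made up for illustration. The idea is simply that raw search results may contain characters that break an XML parser, so a couple of regular expressions tidy the text before the entries are pulled into rows.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical Atom-style search response. The layout and field names are
# assumptions for illustration, not the actual Twitter payload.
RAW_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Stuck in the lounge & waiting for my flight\x0b</title>
    <published>2009-08-01T08:15:00Z</published>
    <author><name>traveller_one</name></author>
  </entry>
  <entry>
    <title>Just redeemed my frequent flier points &amp; upgraded</title>
    <published>2009-08-01T09:30:00Z</published>
    <author><name>traveller_two</name></author>
  </entry>
</feed>"""

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Control characters that XML 1.0 forbids (everything below 0x20 except
# tab, line feed, and carriage return).
ILLEGAL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")
# An ampersand that does not begin an entity such as &amp; or &#38;.
BARE_AMPERSAND = re.compile(r"&(?!#?\w+;)")


def cleanse(xml_text: str) -> str:
    """Remove forbidden control characters and escape stray ampersands."""
    xml_text = ILLEGAL_CHARS.sub("", xml_text)
    return BARE_AMPERSAND.sub("&amp;", xml_text)


def parse_entries(xml_text: str) -> list[dict]:
    """Cleanse the feed, then return one dict per <entry> element."""
    root = ET.fromstring(cleanse(xml_text))
    rows = []
    for entry in root.findall("atom:entry", ATOM):
        rows.append({
            "text": entry.findtext("atom:title", default="", namespaces=ATOM),
            "published": entry.findtext("atom:published", default="", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=ATOM),
        })
    return rows


if __name__ == "__main__":
    for row in parse_entries(RAW_FEED):
        print(row)
```

Without the cleansing pass, the parser would reject the first sample entry because of the unescaped ampersand and the stray control character.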

I spend a lot of time on planes, so one of the first things I searched for was what people were saying about some of our major airlines over the last seven days, within approximately 100 kilometers of Sydney (where I live). The breakdown was fascinating – for one airline, the discussion was focused around:

• 4% of all discussions: TV related discussion, namely the Australian anti-censorship video being screened on the airline and various television awards programs
• 41% of all discussions: Discussion about the airline’s lounge, posts of people in-transit and waiting for the flights / going home
• 22% of all discussions: Frequent flier points, the airline’s club, a new joint loyalty reward program
• 23% of all discussions: Work-related discussion and industry issues (e.g. A380, working at the airline)
• 8% of all discussions: Cargo price fixing
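A breakdown like the one above boils down to assigning each tweet to a topic and reporting each topic’s share of the total. The sketch below shows that tallying step in Python. It is only an illustration: it swaps in a hand-written keyword lookup where a text mining tool would discover the groupings from the data, and the keyword lists, sample tweets, and function names are all invented for the example.

```python
import re
from collections import Counter

# Hypothetical topic keywords. A text mining tool would derive groupings
# from the data rather than from a hand-written list like this one.
TOPIC_KEYWORDS = {
    "lounge and transit": {"lounge", "boarding", "waiting", "transit"},
    "loyalty program": {"points", "frequent", "loyalty", "club"},
    "cargo pricing": {"cargo", "price"},
}

# Made-up sample tweets, for illustration only.
SAMPLE = [
    "Waiting in the lounge before my flight home",
    "In transit again, boarding soon",
    "Finally used my frequent flier points",
    "Cargo price fixing story is all over the news",
]


def assign_topic(text: str) -> str:
    """Return the topic sharing the most keywords with the text, else 'other'."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    best, best_hits = "other", 0
    for topic, keywords in TOPIC_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


def topic_shares(texts) -> dict:
    """Return each topic's share of the texts as a percentage."""
    counts = Counter(assign_topic(t) for t in texts)
    total = sum(counts.values())
    return {topic: round(100 * n / total, 1) for topic, n in counts.items()}


if __name__ == "__main__":
    for topic, share in topic_shares(SAMPLE).items():
        print(f"{share}% of all discussions: {topic}")
```

Keyword matching is crude next to proper text mining, but it shows where percentages of this kind come from: counts per topic divided by the total number of tweets.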

The level of interest around their newly launched joint loyalty program must be pretty gratifying for them; it’s pretty clearly a hot topic on Twitter!

For me, it’s a brilliant way to extend my network, monitor the pulse of discussion, and spend more time thinking and debating and less time clicking. For organisations that care about their customers, it might be a way to create a personal, two-way dialogue with all of their customers. Or it might be a way to help them solve their customers’ issues as they experience and Tweet about them. Or it might simply be a way of keeping track of what’s hot at the moment, quickly, easily, and dynamically.

In any case, I find it tremendously empowering. It’s not just that I’ve got another way of taming the deluge of information I’m increasingly hit with every day; it’s also that I know that if I say something, the odds of it being heard by someone who cares about similar things increase every day. And SAS is right there, helping make it easier.

A great thing about working at SAS is that I’m surrounded by smart and creative colleagues like Evan and Zach (and over 10,000 others from across the globe). If you aren’t familiar with SAS, here is a recent write-up in Investor’s Business Daily. May I never have to leave SAS, or ever again have to work at a public company.

Can the results from text mining tweets be of use in forecasting? Like the use of Google Trends data in forecasting (discussed in this blog on July 10), this is an area of active research. While it is exciting to have all these new data sources, it is still to be determined whether they can actually improve the accuracy of our forecasts. Are you doing research in this area? If so, I invite you to share your results in a guest blogger posting on The BFD.

For more information on using SAS to analyze Twitter data, and for a sample of Evan’s code, you can contact him directly at evan.stubbs@sas.com.

Read more at http://feedproxy.google.com/~r/TheBusinessForecastingDeal/~3/d0TXXitu5pI/index.php