This article was taken from the December 2011 issue of Wired magazine.
In September 2010 the Yemen ministry of industry announced a national strategy to combat food shortages. The UN Food Price Index had reached consecutive record highs in the previous few months. Yemen had also suffered flooding, which had killed around 100 people and disrupted farming. The strategy, which included a review of existing subsidies and the development of food-for-work programmes, proved ineffective. By December 2010, concerned that protests over rising food prices were starting to grow in Tunisia, Yemen's president, Ali Abdullah Saleh, halved income tax and ordered the government to control the prices of basic commodities.
By late January, though, thousands of protesters had taken to the streets to demand Saleh's resignation, brandishing flatbread with baked-in slogans and wearing the food as helmets. The clashes continued into February and grew more violent as the UN's Food Price Index reached an all-time high. On March 18, 45 protesters were killed when an unidentified gunman opened fire. Six days later, the government fought a battle with al-Qaeda gunmen in the provinces of Abyan and Marib, killing 15. The same day, 10,000 protesters gathered in the capital city Sana'a. Saleh said that he would accept the opposition's transition plan that day, but clung on for another month. He finally quit Yemen for Saudi Arabia after a bomb planted in the presidential compound exploded, killing seven people; Saleh suffered 40 per cent burns, shrapnel wounds and internal bleeding. In total, Human Rights Watch estimates that 233 protesters were killed on the streets. Three months later, Saleh unexpectedly flew back to Yemen; 100 more protesters and tribesmen were killed in the first five days of his return and the situation remains unresolved.
A year earlier, on January 12, 2010, a tech startup posted an article on its blog: "Yemen heading for disaster in 2010?" The author, "Ninja Shoes", wrote: "Based on the information we've gathered, Yemen will likely experience food shortages and torrential floods in 2010. This combination of natural disasters, propensity for famine and malnutrition, and challenges with Islamic radicals and terrorists, make it a hot spot for conflict in the future."
The 20 employees of Recorded Future aren't foreign-policy experts. They aren't traders either, but if you'd started using Recorded Future's predictions to buy US stocks on January 1, 2009, you would have made an annual return of 56.69 per cent. (The S&P 500 had an annualised return of 17.22 per cent over the same period.) Between May 13 and August 5 this year, as markets behaved with vertiginous abandon, their strategy returned 10.4 per cent; in contrast, the S&P 500 lost 9.9 per cent of its value. They're data experts: computer scientists, statisticians and experts in linguistics. And in the data, they think, lies the future.
All Recorded Future's predictions, whatever the field, are based on publicly available information -- news articles, government sites, financial reports, tweets -- fed into the company's own algorithms. The result, it claims, is a "new tool that allows you to visualise the future" -- one that is changing how government intelligence agencies gather information and how giant hedge funds place bets. On its website, Recorded Future states: "We don't grant interviews and we don't issue press releases." But behind closed doors, the company is developing the technology that has been described by one tech blog as an "information weapon".
The company, cofounded by Christopher Ahlberg, an entrepreneur who sold his first business for $195 million and served in the Swedish special forces, has $8.5 million in funding. Its first two investors were Google and the CIA. Recorded Future counts US government agencies, banks and hedge funds among the clients paying million-dollar contracts. But its true ambition is to organise all the data on the internet for similar predictive analysis -- to make the future calculable.
Recorded Future's main office is in Gothenburg, Sweden. On a drab morning in May, trams clang past a metal door that doesn't bear the company's name. Two flights of stairs lead to a wooden door, with a discreet sticker label-gunned above the letterbox in caps: "RECORDED FUTURE". The rooms date from the 17th century; they're airy and bright with high ceilings and intricate plaster mouldings. Eight employees work here on the technical aspects of the system. The company also has offices in Boston, New York and Arlington, Virginia -- ten minutes' drive from the Pentagon, 15 from Langley.
"Yemen took four or five months longer than we predicted," says Ahlberg, 43, sitting on a sofa in a small meeting room. Before Wired visited, he warned over the telephone: "You won't get a government agency out of my mouth. Dude, if I do that, they're coming to take my kids." In person, he's tall, with hair cropped short, and is quick to laugh. The telephone caveat still stands, but Ahlberg is willing to talk for the first time about what exactly it is his company does and why Google, intelligence agencies and hedge funds are all so interested.
Ahlberg was born in September 1968, in Kungälv, a town 40 minutes' drive north of Gothenburg. His father was a captain on merchant ships, his mother taught French and English in Sweden. In his first year at secondary school, he created a drawing program on his Sinclair Spectrum called Art CAD ("like an early version of Photoshop") and sold individual copies by advertising it in the local paper. After school, he wanted to study computer science but first had to complete military service in 1987. He chose the Lapplands Jägarregemente special forces, and began training for a hypothetical Russian invasion: "They would come in from Finland and go to Norway; we were supposed to cut them off in the middle. We were supposed to do what the Iraqis are doing now, guerrilla warfare. But we were master cross-country skiers."
Ahlberg then went on to take his degree at Chalmers University of Technology in Gothenburg. As a post doc, he travelled to the University of Maryland to work as a visiting researcher at the Human-Computer Interaction Lab for two summers. During the first visit, at 23, he co-authored a published paper with the director of the lab; the second summer, he co-wrote two, about the new field of data visualisation. Ahlberg returned to Sweden, finishing his PhD in four years instead of six, but it was his work at Maryland that formed the basis for his first company, Spotfire. Launched in 1996, the business created visualisation tools for business intelligence; in 2007, it was bought by Tibco for $195 million (£125m). "I didn't have to work anymore," says Ahlberg. "But I can't stop."
Spotfire had helped businesses visualise internal databases. After the sale, "we started hanging around in coffee shops in Boston and New York", says Ahlberg. "We thought: what's the most interesting data source out there? And it's nebulous, but the web is the most interesting dataset there is on the planet. Instead of just corporate databases, let's think about the web as my data source." Ahlberg began talking with Staffan Truvé, who had supervised his PhD and started Spotfire with him. "It was in the back of my head that, as humans, we had generally started to become better at predicting things," says 48-year-old Truvé. "Your car tells you that you need to change your oil in 200km, or there is a sign saying your bus is coming in five minutes. These tiny predictive signals are popping up everywhere." Ahlberg was excited: "So then the premise becomes that the web has predictive power. How can we harvest that?"
A few isolated, eye-catching examples have shown the prognostic possibilities in such data. In 2008, Google showed search queries could accurately predict the spread of flu in the US up to two weeks before the federal Centers for Disease Control. In his book, Super Crunchers, Ian Ayres claimed that credit-card companies can predict with 98 per cent accuracy whether you'll divorce, based on your purchases -- Google's Marissa Mayer even quoted the statistic at SXSW 2011 (however, in a recent statement, Visa denied that it monitored such data or made any such conclusions, saying the claim was "inaccurate and wrong"). One recent study has shown Twitter to be 88.67 per cent accurate in predicting the Dow Jones three days in advance. This July, financier Paul Hawtin founded a London hedge fund that is based entirely on social media. And in September, a researcher from the University of Illinois fed the Nautilus supercomputer with 100 million news items, much like Recorded Future does, and "anticipated" the Arab Spring and the killing of Osama Bin Laden, albeit retrospectively -- a prediction of the past. But for Ahlberg to develop a tool that could create predictions for any input, from finance to terrorism, would be much harder. Recorded Future would not only have to index the internet, but also understand and interpret it.
The first generation of search engines, such as Lycos and Alta Vista, used traditional text search to deliver web pages, deploying their own algorithms, but essentially looking at individual documents in isolation. Google changed this in 1998. Its PageRank algorithm analysed the links between web pages, promoting those that had more links pointing to them from other sites. Recorded Future is part of the third generation: instead of explicit link analysis, it examines implicit links -- what it calls "invisible links" between documents that refer to the same entities or events. It does this by separating the documents and their content from what they talk about, identifying canonical entities and events that exist outside of the article.
"What matters is that it's freaking complicated," says Ahlberg. In practice, Recorded Future harvests 25,000 data sources as RSS feeds, which could include Companies House and US Securities and Exchange Commission filings, a New York Times article, Twitter and Facebook posts, obscure blogs (there's one on Norwegian salmon fishing) or transcripts from earnings calls or political speeches -- "just a flood of stuff", says Ahlberg. It does the same for Chinese and Arabic sources. "Then we look for entities -- people, places, technologies; and events -- a murder, a bomb explosion, a person moving from A to B, product launches."
This linguistic analysis is "really tough", according to Truvé. Because Recorded Future takes sources from all over the internet, rather than a particular data set, "the data is not so nice". "We could have built a perfect data set around Pfizer, say, or Barack Obama," says Ahlberg. "It's harder then to think of the big picture. So we tried to make this ambitious." Recorded Future currently uses two separate algorithms, one proprietary, one licensed, to analyse language; the staffers in Gothenburg are tweaking them continually to see which works better. But the result is that Recorded Future knows who Nicolas Sarkozy is, say: that he's the president of France, he's the husband of Carla Bruni, he's 1.65m tall in his socks, he travelled to Deauville for the G8 summit in May. If you Google "president of France", you'll get two Wikipedia pages on "president of France" then "Nicolas Sarkozy". Useful, but Google doesn't know how the two, Sarkozy and the presidency, are actually related; it's just searching for pages linking to the terms.
Recorded Future ranks all these canonical entities and events, based on the number of references to them, the credibility of the document or document source and several other factors, such as the co-occurrence of different events and entities in the same or in related documents, to create a "momentum" score. Positive or negative sentiment is added to this score. For example, searching big pharma in general will tell you that over the next five years, nine of the world's 15 best-selling medicines will lose patent protection -- the event earns a high momentum score because it is backed by 13 news items from 12 sources -- or that, specifically, Inhibitex, a biopharma business, will need cash in November 2011 if it plans to fund the Phase 2b development of a new drug internally, based on five items from five sources.
Recorded Future isn't the only company attempting to bring hardcore linguistic analysis to a larger audience. Wolfram Alpha is a search engine that can understand a query such as "nuclear explosions in China" and deliver relevant information such as maps and kilotonnes per explosion, although it's culled from "tame" data curated by the company itself. And IBM didn't develop Watson just to school humans on Jeopardy; it's actually a huge research project dedicated to processing questions asked in natural language, based on four terabytes of structured and unstructured data sets, including the full text of Wikipedia. "There are any number of offerings coming on to the market now," says Colin Shearer, senior vice president at SPSS, a predictive-analytics company owned by IBM. One of those is Quid, a two-year-old, 45-strong business founded in 2008. "Human activity has never left an information trail like it does today," says Bob Goodson, its founder. "If only we could harness the intelligence that's locked in the information, we could build systems to understand the world better, and therefore make better decisions." Quid includes Microsoft among its customers. With $15 million in investment, it aims to be the next Bloomberg in business intelligence.
Where Recorded Future goes beyond mere analysis of open data, though, is by adding the "time and space" dimension of the documents -- "references to when and where an event has taken place, or when and where it will take place," says Truvé, "since many documents actually refer to events expected to take place in the future." Using RSS streams allows Recorded Future to have a publishing time as an anchor point for this temporal analysis, which means it can deal with difficult expressions such as "next week", "in three months' time" or "in two quarters". This may sound simple, but it's crucial: the time and space analysis is the first way Recorded Future can make predictions about the future -- by aggregating weighted opinions about the likely timing of future events using algorithmic crowdsourcing. On top of that, it uses statistical models to predict future happenings based on historical records of similar chains of events. "The secret sauce is not dependent upon one ingredient," says Truvé. "It's a combination."
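The anchoring idea can be illustrated with a minimal Python sketch. It is not Recorded Future's code: the function name `resolve_relative_date`, the small vocabulary of number words and the rule of clamping the day of the month to 28 are all assumptions made for illustration. It shows only how a publication date turns a phrase such as "in two quarters" into a calendar date.

```python
import re
from datetime import date, timedelta

# Hypothetical vocabulary; a real system would handle far more phrasings.
_NUMBER_WORDS = {"a": 1, "one": 1, "two": 2, "three": 3,
                 "four": 4, "five": 5, "six": 6}


def _add_months(anchor, months):
    """Shift a date by whole months, clamping the day to 28 (an assumption)."""
    index = anchor.year * 12 + (anchor.month - 1) + months
    return date(index // 12, index % 12 + 1, min(anchor.day, 28))


def resolve_relative_date(expression, published):
    """Resolve a relative time phrase against the document's publishing date.

    Returns a date, or None when the phrase is not recognised.
    """
    text = expression.lower().strip()
    if text == "next week":
        return published + timedelta(weeks=1)

    match = re.match(r"in (\w+) (week|month|quarter)s?\b", text)
    if match is None:
        return None

    word, unit = match.groups()
    count = _NUMBER_WORDS.get(word)
    if count is None and word.isdigit():
        count = int(word)
    if count is None:
        return None

    if unit == "week":
        return published + timedelta(weeks=count)
    # A quarter is treated as three calendar months.
    return _add_months(published, count * (3 if unit == "quarter" else 1))


if __name__ == "__main__":
    published = date(2011, 5, 10)  # stands in for an RSS item's publishing time
    for phrase in ("next week", "in three months' time", "in two quarters"):
        print(phrase, "->", resolve_relative_date(phrase, published))
```

Without the publishing date, none of these phrases can be placed on a calendar, which is why the RSS timestamp matters.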
On April 1, 2009, a few months after Ahlberg and Truvé began testing this combination, Ahlberg met Rich Miner, the co-creator of the Android mobile operating system (with Andy Rubin) and a partner at Google Ventures, at the Starbucks on Harvard Square, Massachusetts. Miner was impressed: "We believed there was predictive power in the information contained in the web," he says. "If you can organise that information temporally, then you can look at past and present, and infer things from the future. That's pretty unique so far from Recorded Future." The CIA thought so, too.
In the 1940s the Allies routinely bombed rail bridges to disrupt supply lines into Nazi-occupied France. After a raid, though, the Royal Air Force couldn't fly reconnaissance missions over the targets as they were considered too risky, so it didn't know if a bridge had been destroyed. The Special Operations Executive (SOE), however, came up with a novel strategy for finding out. By monitoring the daily prices of oranges on sale at various fruit stalls in Paris, SOE agents dropped behind enemy lines were able to tell which supply chains had been affected. (Germans embedded in London were doing the same thing; unfortunately for the Nazis, they were under the control of SOE and were fed false information.) This is the difference between information and intelligence: information is the price of oranges, intelligence is knowing which supply chain has been affected. This openly available, "free" information, when it's turned into intelligence, becomes extremely valuable.
"Open-source intelligence has always been crucial, but for most of the cold war it was neglected by western intelligence agencies," says Calder Walton, a research associate at Cambridge University and author of the book Empire of Secrets, to be published in 2013. "That was the archetypal intelligence war: intelligence necessarily involved information that couldn't be gained from any other source -- human agents or telephone tapping." That doesn't mean covert intelligence was more effective, though: Daniel Moynihan, a former US senator, compared CIA reports gathered from secret sources with Soviet documents recovered after the fall of the Berlin Wall and found they significantly overestimated Soviet capabilities. But he discovered that western think tanks using publicly available material, such as the RAND Corporation, were much more accurate. US diplomat George Kennan estimated in 1997 that "95 per cent of what we need to know about foreign countries could very well be obtained by the careful and competent study of perfectly legitimate sources of information open and available to us".
"All of this has changed since the collapse of the Soviet Union," says Walton. "Open-source intelligence has boomed in recent years -- especially since 9/11." At a conference in 2008, Michael Hayden, then director of the CIA, said: "Open-source intelligence contributes to national security in unique and valuable ways virtually every day." Stephen Mercado, an ana- lyst in the CIA directorate of science and technology, estimates that 80 per cent of all valuable intelligence now comes from open sources. In January 2011, Sir Gus O'Donnell, head of the UKcivil service, told the Chilcot inquiry into the invasion of Iraq: "I have strongly and always been of the view that we probably underestimated open source [intelligence]." Open source is the big growth area in intelligence and every western agency is looking for the tools to give it an edge.
Ahlberg refuses to discuss his company's work with the CIA, or even whether there is work with the CIA. In-Q-Tel (IQT) is the CIA's investment arm (mission statement: "Identifies, adapts and delivers innovative technological solutions to support the missions of the Central Intelligence Agency"). It invests only in startup companies that will "provide strong, near-term advantages (within 36 months) to the IC [intelligence community]." IQT doesn't invest without the US secret intelligence services in mind. It backed Recorded Future with slightly less than $2.5 million.
Stephen Davidson, an investor at IQT who sits on Recorded Future's board, refused to comment; a spokesperson for IQT said that "while we are pleased to have Recorded Future as part of the IQT portfolio, we will respectfully decline to provide additional information about our investment". Does Ahlberg know what intelligence purposes Recorded Future is put to? "We would not know about those things," he says, folding his arms. "At this stage, I don't even want to know what people are doing with some of these things." He points out that IQT is "an independent company; at least to my knowledge they can't force any [government agency] to use it." Truvé, though, says Recorded Future is working with 17 or 18 intelligence agencies. Another board member, Roger Ehrenberg, used to run a $6 billion hedge fund for Deutsche Bank before setting up his own firm, IA Ventures. According to Ehrenberg, In-Q-Tel is "actively involved" with Recorded Future. "Fundamentally, they look to invest in companies where they know they have a customer within the government," he says. "It's not just the CIA." Chris Holden, who works in Recorded Future's Arlington office, admitted to Wired (with some understatement) that "we have a little bit of work with the federal government". Holden says that Recorded Future is being used to identify technologies the US government may invest in, such as nanotechnology in body armour. "It's not all super secret stuff necessarily." So, does having IQT as an investor mean that Recorded Future is beholden to the US government, even if it is a private company? "We are an independent company," repeats Ahlberg. "Neither the US government, nor Google, nor hedge funds nor banks have ever tried to make us do anything. And frankly, you're sitting here with a bunch of Swedes. There's no way in hell you could get them to do anything bad."
Still, it's possible to identify examples of how one might use Recorded Future for open-source intelligence. Take the al-Qaeda leadership after Bin Laden's death: who would fill the vacuum? Recorded Future ran a search. Ayman al-Zawahiri, a founding member of Egypt's Islamic Jihad militant group, and long considered by the US government to be Bin Laden's right-hand man, showed some significant spikes in recorded and discussed activity in the last 12 months, especially when he called for military backing of Libyan rebels, suggesting al-Qaeda could fill a power vacuum in that country. But al-Zawahiri's sentiment score was extremely negative, to the degree that conspiracy theories were emerging that he was responsible for disclosing Bin Laden's location to the US. Saif al-Adel, a senior al-Qaeda commander, was attracting attention back in October 2010, written about as "the new face of al-Qaeda in 2011". Recorded Future concluded that it was "clear that Saif al-Adel has been routed in Pakistan for some time now and appears to be embedded in the political structure of al-Qaeda"; his momentum score was high. They also found that Libyan Abu Yahya al-Libi (described by a former CIA analyst as an "insurgent-theologian"), offered access to one of the most volatile regions on the globe right now, based on his current likely location, which al-Qaeda might consider a useful foothold. Finally, they looked at Anwar al-Awlaki, a Yemeni-American imam who posted pro-al-Qaeda/anti-western YouTube videos, and ran a blog and Facebook page. Clearly of interest given the attempted US drone strike to kill him days after Bin Laden's death, he started building momentum in late March and April with reports that he was urging on the Arab Spring protests. Al-Awlaki was killed in Yemen during a US drone attack on September 30 this year.
Recorded Future concluded that multiple players will rise to prominence regionally, that al-Qaeda could split around al-Zawahiri, and that al-Qaeda sees advantage to be taken in the Arab pro-democracy protests. A couple of months later, al-Zawahiri was confirmed, although experts were sceptical about whether he could unite the membership in Saudi Arabia and the Gulf States behind him.
Ahlberg is more willing to talk about how Recorded Future is being used in finance. "If you take our momentum score, and look across S&P 500 companies, can you predict the liquidity or the stock volume of those companies over time?" asks Ahlberg. "It turns out you can." Stock that is being talked about and is in investors' attention is, of course, more likely to be traded: "It's much easier to prove volume than direction, whether a stock is going up or down." So Recorded Future takes momentum and combines it with sentiment -- whether a company is mentioned in a positive light -- and derives a score. Taking these news bursts across the S&P 500, it can sort them into ten different groups, from high to low. "Then you say, every day, 'I am going to own what is in the top and short what is in the bottom.' You are making lots of small picks on a daily basis -- this strategy turns over the portfolio 63 per cent every day."
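The mechanics of the daily sort that Ahlberg describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how momentum and sentiment are combined, so the sketch takes one ready-made score per ticker as input, and the function names and the simple definition of turnover (the share of today's positions not held yesterday) are hypothetical.

```python
def decile_buckets(scores, n_buckets=10):
    """Rank tickers by score, highest first, and split them into n_buckets groups."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    size, extra = divmod(len(ranked), n_buckets)
    buckets, start = [], 0
    for i in range(n_buckets):
        # The first `extra` buckets take one more ticker when the split is uneven.
        end = start + size + (1 if i < extra else 0)
        buckets.append(ranked[start:end])
        start = end
    return buckets


def long_short_portfolio(scores):
    """Own the top group and short the bottom group, as in the quoted strategy."""
    buckets = decile_buckets(scores)
    return {"long": buckets[0], "short": buckets[-1]}


def turnover(previous, current):
    """Fraction of today's positions that were not held yesterday (assumed definition)."""
    held_before = set(previous["long"]) | set(previous["short"])
    held_now = set(current["long"]) | set(current["short"])
    if not held_now:
        return 0.0
    return len(held_now - held_before) / len(held_now)


if __name__ == "__main__":
    # Made-up scores for 20 made-up tickers, purely for demonstration.
    day_one = {f"T{i}": float(i) for i in range(20)}
    day_two = dict(day_one, T5=99.0)  # one ticker's score jumps overnight
    first = long_short_portfolio(day_one)
    second = long_short_portfolio(day_two)
    print(first, second, turnover(first, second))
```

Rebuilding both ends of the portfolio from fresh scores each day is what produces the heavy daily turnover mentioned in the quote.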
Running predictive tests on data from January 2009 to January 2011, Recorded Future showed that its top decile has a beta (a measure of risk in a portfolio) of 1.08 -- fairly low -- and a statistically significant annualised continuous alpha (a risk-adjusted measure of active return on investment) of +16 per cent. The bottom two deciles had a high beta (1.37 and 1.34, respectively) but with statistically significant negative alphas, at -42 per cent and -26 per cent annually. "Constructing hedged portfolios out of the securities in these deciles provides some compelling trading strategies," says Evan Sparks, an analyst at Recorded Future.
Beyond high-frequency trading strategies, the company says it can predict stock shifts on the basis of one-day events, separated into scheduled events and speculative events. "The theory that if something is written saying, 'on Friday so and so will release earnings', that should be priced into the market immediately," says Ahlberg. "In reality it is not." Recorded Future took 19,000 such events and asked what happens to the stock price. On average, as stocks come into those scheduled events, the prices rise; coming out of them they fall five basis points either way. "It's like finding a roulette wheel that is skewed."
Another way is to examine the next two weeks of a particular business's future and look for certain events. One is insiders selling stock. You may think this would be a good time to sell; in fact, insiders often sell just after stocks have already peaked. So Recorded Future looks for data that can be combined with this knowledge. If an insider sells stock after a management lay-off, stock falls on average 1.5 per cent. Expand this event to a whole market and "You have 2,000 events within 2011," says Ahlberg. "By turning it into a big data screen, I have created my own skewed roulette wheel I can consistently bet on." Chris Malloy is an associate professor at Harvard Business School who specialises in behavioural finance. He's played with Recorded Future's data: "I haven't seen anything with that ability. It's pretty neat -- no one's doing that. The predictability is certainly good."
What Recorded Future can't forecast are "black swan events", which are by definition unpredictable and undirected. "You can look at what happens afterwards, though," says Ahlberg. He takes the example of a natural disaster. "Start looking at how other countries behave. After a natural disaster, the US will travel there every time, the UK does it 50 per cent of the time, Iran will do it every single time, China never really does." China did, though, after the 2010 Chilean earthquake. Two months later, it announced a new trade agreement. China didn't travel to Haiti: no trade agreement followed. But it did after Pakistan was hit by flood, soon announcing a $10 billion deal. "We're looking for those historical patterns and using them to predict what might happen," says Ahlberg.
Recorded Future's hedge-fund clients are only slightly less secretive than the CIA. Ehrenberg says a handful of Wall Street hedge funds and banks are using the technology: "Recorded Future is a high-value signal, relative to conventional quantitative-analysis trading signals. People are making money." Josh Holden, CEO of Fina Technologies, which creates algorithms for high-frequency quant trading by hedge funds, says that Recorded Future's client base "is closely guarded. But there are more than a few firms using it."
Sandfire AG is a Swiss consultancy in the public and private security sectors, and a client of Recorded Future. "It helps us keep track of travel routes of high-level decision-makers," says Felix Juhl, a senior partner. "A state visit by a high-ranking politician may be followed by specific corporate activities. Keeping track of travel routes can serve as an early warning."
Ahlberg says Recorded Future now earns revenues in the millions of dollars from a client base of fewer than 100, which includes governments, hedge funds, big banks, watchdogs and consultancies. This select clientele place a high value on the distilled insight the company provides. Ahlberg sees a big opportunity: "Even within what we have started around finance and intelligence, there is no reason why we couldn't build another $100 million-revenue company within a small set of years." But Recorded Future plans on being more than just a profitable business tool. Ahlberg is expanding its indexes: he eventually wants every piece of data on the planet streaming live through his company's algorithms. The ultimate goal? "We want to organise the world -- and the internet -- for analysis." What Ahlberg doesn't say, perhaps deliberately, is that ever more data will likely lead to ever more accurate predictions. "It's dangerous to start talking about predicting the future," he says. "We're trying to play that down."
Tom Cheshire is assistant editor at Wired. He wrote about the Ariane 5 rocket in 10.11.