This article was taken from the December 2011 issue of Wired
magazine.
In September 2010 the Yemen ministry of industry announced a
national strategy to combat food shortages. The UN Food Price Index
had reached consecutive record highs in the previous few months.
Yemen had also suffered flooding, which had killed around 100
people and disrupted farming. The strategy, which included a review
of existing subsidies and the development of food-for-work
programmes, proved ineffective. By December 2010, concerned that
protests over rising food prices were starting to grow in Tunisia,
Yemen's president, Ali Abdullah Saleh, halved income tax and
ordered the government to control the prices of basic
commodities.
By late January, though,
thousands of protesters had taken to the streets to demand
Saleh's resignation, brandishing flatbread with baked-in slogans
and wearing the food as helmets. The clashes continued into
February and grew more violent as the UN's Food Price Index reached
an all-time high. On March 18, 45 protesters were killed when an
unidentified gunman opened fire. Six days later, the government
fought a battle with al-Qaeda gunmen in the provinces of Abyan and
Marib, killing 15. The same day, 10,000 protesters gathered in the
capital city Sana'a. Saleh said that he would accept the
opposition's transition plan that day, but clung on for another
month. He finally quit Yemen for Saudi Arabia after a bomb planted
in the presidential compound exploded, killing seven people; Saleh
suffered 40 per cent burns, shrapnel wounds and internal bleeding.
In total, Human Rights Watch estimates that 233 protesters were
killed on the streets. Three months later, Saleh unexpectedly flew
back to Yemen; 100 more protesters and tribesmen were killed in the
first five days of his return and the situation remains
unresolved.
A year earlier, on January 12, 2010, a tech startup posted an
article on its blog: "Yemen heading for disaster in 2010?" The
author, "Ninja Shoes", wrote: "Based on the information we've
gathered, Yemen will likely experience food shortages and
torrential floods in 2010. This combination of natural disasters,
propensity for famine and malnutrition, and challenges with Islamic
radicals and terrorists, make it a hot spot for conflict in the
future."
The 20 employees of Recorded Future aren't foreign-policy
experts. They aren't traders either, but if you'd started using
Recorded Future's predictions to buy US stocks on January 1, 2009,
you would have made an annual return of 56.69 per cent. (The
S&P 500 had an annualised return of 17.22 per cent over the
same period.) Between May 13 and August 5 this year, as markets
behaved with vertiginous abandon, their strategy returned 10.4 per
cent; in contrast, the S&P 500 lost 9.9 per cent of its value.
They're data experts: computer scientists, statisticians and
experts in linguistics. And in the data, they think, lies the
future.
All Recorded Future's predictions, whatever the field, are based
on publicly available information -- news articles, government
sites, financial reports, tweets -- fed into the company's own
algorithms. The result, it claims, is a "new tool that allows you
to visualise the future" -- one that is changing how government
intelligence agencies gather information and how giant hedge funds
place bets. On its website, Recorded Future states: "We don't grant
interviews and we don't issue press releases." But behind closed
doors, the company is developing the technology that has been
described by one tech blog as an "information weapon".
The company, cofounded by Christopher Ahlberg, an entrepreneur
who sold his first business for $195 million and served in the
Swedish special forces, has $8.5 million in funding. Its first two
investors were Google and the
CIA. Recorded Future counts US government agencies, banks and
hedge funds among the clients paying million-dollar contracts. But
its true ambition is to organise all the data on the internet for
similar predictive analysis -- to make the future calculable.
Recorded Future's main office is in Gothenburg,
Sweden. On a drab morning in May, trams clang past a metal door
that doesn't bear the company's name. Two flights of stairs lead to
a wooden door, with a discreet sticker label-gunned above the
letterbox in caps: "RECORDED FUTURE". The rooms date from the 17th
century; they're airy and bright with high ceilings and intricate
plaster mouldings. Eight employees work here on the technical
aspects of the system. The company also has offices in Boston, New
York and Arlington, Virginia -- ten minutes' drive from the
Pentagon, 15 from Langley.
"Yemen took four or five months longer than we predicted," says
Ahlberg, 43, sitting on a sofa in a small meeting room. Before
Wired visited, he warned over the telephone: "You won't get a
government agency out of my mouth. Dude, if I do that, they're
coming to take my kids." In person, he's tall, with hair cropped
short, and is quick to laugh. The telephone caveat still stands,
but Ahlberg is willing to talk for the first time about what
exactly it is his company does and why Google, intelligence
agencies and hedge funds are all so interested.
Ahlberg was born in September 1968, in Kungälv, a town 40
minutes' drive north of Gothenburg. His father was a captain on
merchant ships, his mother taught French and English in Sweden. In
his first year at secondary school, he created a drawing program on
his Sinclair Spectrum called Art CAD ("like an early version of
Photoshop") and sold individual copies by advertising it in the
local paper. After school, he wanted to study computer science but
first had to complete military service in 1987. He chose the
Lapplands Jägarregemente special forces, and began training for a
hypothetical Russian invasion: "They would come in from Finland and
go to Norway; we were supposed to cut them off in the middle. We
were supposed to do what the Iraqis are doing now, guerrilla
warfare. But we were master cross-country skiers."
Ahlberg then went on to take his degree at Chalmers University of
Technology in Gothenburg. As a post doc, he travelled to the
University of Maryland to work as a visiting researcher at the
Human-Computer Interaction Lab for two summers. During the first
visit, at 23, he co-authored a published paper with the director of
the lab; the second summer, he co-wrote two, about the new field of
data visualisation. Ahlberg returned to Sweden, finishing his PhD
in four years instead of six, but it was his work at Maryland that
formed the basis for his first company, Spotfire. Launched in 1996,
the business created visualisation tools for business intelligence;
in 2007, it was bought by Tibco for $195 million (£125m). "I didn't
have to work anymore," says Ahlberg. "But I can't stop."
Spotfire had helped businesses visualise internal databases.
After the sale, "we started hanging around in coffee shops in
Boston and New York", says Ahlberg. "We thought: what's the most
interesting data source out there? And it's nebulous, but the web
is the most interesting dataset there is on the planet. Instead of
just corporate databases, let's think about the web as my data
source." Ahlberg began talking with Staffan Truvé, who had
supervised his PhD and started Spotfire with him. "It was in the
back of my head that, as humans, we had generally started to become
better at predicting things," says 48-year-old Truvé. "Your car
tells you that you need to change your oil in 200km, or there is a
sign saying your bus is coming in five minutes. These tiny
predictive signals are popping up everywhere." Ahlberg was excited:
"So then the premise becomes that the web has predictive power. How
can we harvest that?"
A few isolated, eye-catching examples have shown the prognostic
possibilities in such data. In 2008, Google showed search queries
could accurately predict the spread of flu in the US up to two
weeks before the federal Centers for Disease Control. In his book,
Super Crunchers, Ian Ayres claimed that credit-card
companies can predict with 98 per cent accuracy whether you'll
divorce, based on your purchases -- Google's Marissa Mayer even
quoted the statistic at
SXSW 2011 (however, in a recent statement, Visa denied that it
monitored such data or made any such conclusions, saying the claim
was "inaccurate and wrong"). One recent study has shown Twitter to
be 88.67 per cent accurate in predicting the Dow Jones three days
in advance. This July, financier Paul Hawtin founded a London hedge
fund that is based entirely on social media. And in September, a
researcher from the University of Illinois fed the Nautilus
supercomputer with 100 million news items, much like Recorded
Future does, and "anticipated" the Arab Spring and the killing of
Osama Bin Laden, albeit retrospectively -- a prediction of the
past. But for Ahlberg to develop a tool that could create
predictions for any input, from finance to terrorism, would be much
harder. Recorded Future would not only have to index the internet,
but also understand and interpret it.
The first generation of search engines, such as Lycos and
AltaVista, used traditional text search to deliver web pages, deploying
their own algorithms, but essentially looking at individual
documents in isolation. Google changed this in 1998. Its PageRank
algorithm analysed the links between web pages, promoting those
that had more links pointing to them from other sites. Recorded
Future is part of the third generation: instead of explicit link
analysis, it examines implicit links -- what it calls "invisible
links" between documents that refer to the same entities or events.
It does this by separating the documents and their content from
what they talk about, identifying canonical entities and events
that exist outside of the article.
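The idea of "invisible links" can be illustrated with a minimal Python sketch. It is not Recorded Future's code: the document IDs, the entity sets and the `implicit_links` helper are all hypothetical, and it assumes entity extraction has already happened upstream.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: each document ID mapped to the canonical
# entities and events it mentions (extraction is assumed done).
docs = {
    "news-1": {"Nicolas Sarkozy", "G8 summit"},
    "blog-7": {"G8 summit", "Deauville"},
    "tweet-3": {"Pfizer"},
}

def implicit_links(docs):
    """Link every pair of documents that mention a common entity."""
    # Inverted index: entity -> documents that refer to it.
    by_entity = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            by_entity[entity].add(doc_id)

    # Two documents are implicitly linked by each entity they share.
    links = defaultdict(set)
    for entity, doc_ids in by_entity.items():
        for a, b in combinations(sorted(doc_ids), 2):
            links[(a, b)].add(entity)
    return dict(links)

links = implicit_links(docs)
```

Here the news article and the blog post never link to each other, yet they end up connected because both refer to the G8 summit.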
"What matters is that it's freaking complicated," says Ahlberg.
In practice, Recorded Future harvests 25,000 data sources as RSS
feeds, which could include Companies House and US Securities and
Exchange Commission filings, a New York Times article, Twitter and Facebook posts, obscure
blogs (there's one on Norwegian salmon fishing) or transcripts from
earnings calls or political speeches -- "just a flood of stuff",
says Ahlberg. It does the same for Chinese and Arabic sources.
"Then we look for entities -- people, places, technologies; and
events -- a murder, a bomb explosion, a person moving from A to B,
product launches."
This linguistic analysis is "really tough", according to Truvé.
Because Recorded Future takes sources from all over the internet,
rather than a particular data set, "the data is not so nice". "We
could have built a perfect data set around Pfizer, say, or Barack
Obama," says Ahlberg. "It's harder then to think of the big
picture. So we tried to make this ambitious." Recorded Future
currently uses two separate algorithms, one proprietary, one
licensed, to analyse language; the staffers in Gothenburg are
tweaking them continually to see which works better. But the result
is that Recorded Future knows who Nicolas Sarkozy is, say: that
he's the president of France, he's the husband of Carla Bruni, he's
1.65m tall in his socks, he travelled to Deauville for the G8
summit in May. If you Google "president of France", you'll get two
Wikipedia pages on "president of France" then "Nicolas Sarkozy".
Useful, but Google doesn't know how the two,
Sarkozy and the presidency, are actually related; it's just
searching for pages linking to the terms.
Recorded Future ranks all these canonical entities and events,
based on the number of references to them, the credibility of the
document or document source and several other factors, such as the
co-occurrence of different events and entities in the same or in
related documents, to create a "momentum" score. Positive or
negative sentiment is added to this score. For example, searching
big pharma in general will tell you that over the next five years,
nine of the world's 15 best-selling medicines will lose patent
protection -- the event earns a high momentum score because it is
backed by 13 news items from 12 sources -- or that, specifically,
Inhibitex, a biopharma business, will need cash in November 2011 if
it plans to fund the Phase 2b development of a new drug internally,
based on five items from five sources.
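The company has not published its formula, so the following Python sketch is only a toy illustration of how a score might combine the reference count, source credibility and spread of sources. The `Mention` class, the weights and the arithmetic are all assumptions; sentiment is tracked separately here for simplicity.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    source: str         # where the item was published
    credibility: float  # assumed weight between 0 and 1
    sentiment: float    # assumed value between -1 and 1

def momentum(mentions):
    """Toy score: credibility-weighted count, boosted by source diversity."""
    if not mentions:
        return 0.0
    weighted = sum(m.credibility for m in mentions)
    distinct = len({m.source for m in mentions})
    return weighted * (1 + distinct / len(mentions))

def mean_sentiment(mentions):
    """Average sentiment across all mentions of an entity or event."""
    if not mentions:
        return 0.0
    return sum(m.sentiment for m in mentions) / len(mentions)

# Mirrors the two examples above: 13 items from 12 sources
# versus five items from five sources (credibility is invented).
patents = [Mention(f"source-{i % 12}", 0.8, -0.2) for i in range(13)]
inhibitex = [Mention(f"source-{i}", 0.8, 0.0) for i in range(5)]
```

Under these made-up weights the patent-expiry event outscores the Inhibitex one simply because more items from more sources back it.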
Recorded Future isn't the only company attempting to bring
hardcore linguistic analysis to a larger audience. Wolfram Alpha is
a search engine that can understand a query such as "nuclear
explosions in China" and deliver relevant information such as maps
and kilotonnes per explosion, although it's culled from "tame" data
curated by the company itself. And IBM didn't develop Watson just
to school humans on Jeopardy; it's actually a huge research project
dedicated to processing questions asked in natural language, based
on four terabytes of structured and unstructured data sets,
including the full text of Wikipedia. "There are any number of
offerings coming on to the market now," says Colin Shearer, senior
vice president at SPSS, a predictive-analytics company owned by
IBM. One of those is Quid, a two-year-old, 45-strong business
founded in 2008. "Human activity has never left an information
trail like it does today," says Bob Goodson, its founder. "If only
we could harness the intelligence that's locked in the information,
we could build systems to understand the world better, and
therefore make better decisions." Quid includes Microsoft among its
customers. With $15 million in investment, it aims to be the next
Bloomberg in business intelligence.
Where Recorded Future goes beyond mere analysis of open data,
though, is by adding the "time and space" dimension of the
documents -- "references to when and where an event has taken
place, or when and where it will take place," says Truvé, "since
many documents actually refer to events expected to take place in
the future." Using RSS streams allows Recorded Future to have a
publishing time as an anchor point for this temporal analysis,
which means it can deal with difficult expressions such as "next
week", "in three months' time" or "in two quarters". This may sound
simple, but it's crucial: the time and space analysis is the first
way Recorded Future can make predictions about the future -- by
aggregating weighted opinions about the likely timing of future
events using algorithmic crowdsourcing. On top of that, it uses
statistical models to predict future happenings based on historical
records of similar chains of events. "The secret sauce is not
dependent upon one ingredient," says Truvé. "It's a
combination."
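To see why the publishing time matters as an anchor, consider this rough Python sketch. It resolves only the three expressions quoted above, and it assumes a quarter means three calendar months; the `resolve` and `add_months` helpers are hypothetical and far simpler than anything a production system would need.

```python
import calendar
import re
from datetime import date, timedelta

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def resolve(expression, published):
    """Turn a relative time expression into a date, anchored at publication."""
    text = expression.lower().strip()
    if text == "next week":
        return published + timedelta(weeks=1)
    match = re.match(r"in (\w+) (week|month|quarter)s?", text)
    if not match:
        return None  # expression not covered by this sketch
    word, unit = match.groups()
    count = NUMBER_WORDS.get(word) or (int(word) if word.isdigit() else None)
    if count is None:
        return None
    if unit == "week":
        return published + timedelta(weeks=count)
    # Assumption: one quarter is treated as three calendar months.
    return add_months(published, count * (3 if unit == "quarter" else 1))

published = date(2011, 5, 13)
next_week = resolve("next week", published)
three_months = resolve("in three months' time", published)
two_quarters = resolve("in two quarters", published)
```

Without the publication date, "next week" is meaningless; with it, each phrase maps to a point on a timeline that can then be compared across many documents.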
On April 1, 2009, a few months after Ahlberg and Truvé began
testing this combination, Ahlberg met Rich Miner, the co-creator of
the Android mobile operating system (with Andy Rubin) and a partner
at Google Ventures, at the Starbucks on Harvard Square,
Massachusetts. Miner was impressed: "We believed there was
predictive power in the information contained in the web," he says.
"If you can organise that information temporally, then you can look
at past and present, and infer things from the future. That's
pretty unique so far from Recorded Future." The CIA thought so,
too.
In the 1940s the Allies routinely bombed rail bridges to disrupt
supply lines into Nazi-occupied France. After a raid, though, the
Royal Air Force couldn't fly reconnaissance missions over the
targets as they were considered too risky, so it didn't know if a
bridge had been destroyed. The Special Operations Executive (SOE),
however, came up with a novel strategy for finding out. By
monitoring the daily prices of oranges on sale at various fruit
stalls in Paris, SOE agents dropped behind enemy lines were able to
tell which supply chains had been affected. (Germans embedded in
London were doing the same thing; unfortunately for the Nazis, they
were under the control of SOE and were fed false information.) This
is the difference between information and intelligence:
information is the price of oranges, intelligence is knowing
which supply chain has been affected. This openly available, "free"
information, when it's turned into intelligence, becomes
extremely valuable.
"Open-source intelligence has always been crucial, but for most
of the
cold war it was neglected by western intelligence agencies,"
says Calder Walton, a research associate at Cambridge University
and author of the book Empire of Secrets, to be published in 2013.
"That was the archetypal intelligence war: intelligence necessarily
involved information that couldn't be gained from any other source
-- human agents or telephone tapping." That doesn't mean covert
intelligence was more effective, though: Daniel Moynihan, a former
US senator, compared CIA reports gathered from secret sources with
Soviet documents recovered after the fall of the Berlin Wall and
found they significantly overestimated Soviet capabilities. But he
discovered that western think tanks using publicly available
material, such as the RAND Corporation, were much more accurate. US
diplomat George Kennan estimated in 1997 that "95 per cent of what
we need to know about foreign countries could very well be obtained
by the careful and competent study of perfectly legitimate sources
of information open and available to us".
"All of this has changed since the collapse of the Soviet
Union," says Walton. "Open-source intelligence has boomed in recent
years -- especially since 9/11." At a conference in 2008, Michael
Hayden, then director of the CIA, said: "Open-source intelligence
contributes to national security in unique and valuable ways
virtually every day." Stephen Mercado, an analyst in the CIA
directorate of science and technology, estimates that 80 per cent
of all valuable intelligence now comes from open sources. In
January 2011, Sir Gus O'Donnell, head of the UK civil service, told
the Chilcot inquiry into the invasion of Iraq: "I have strongly and
always been of the view that we probably underestimated open source
[intelligence]." Open source is the big growth area in intelligence
and every western agency is looking for the tools to give it an
edge.
Ahlberg refuses to discuss his company's work with the CIA, or
even whether there is work with the CIA. In-Q-Tel (IQT) is the
CIA's investment arm (mission statement: "Identifies, adapts and
delivers innovative technological solutions to support the missions
of the Central Intelligence Agency"). It invests only in startup
companies that will "provide strong, near-term advantages (within
36 months) to the IC [intelligence community]." IQT doesn't invest
without the US secret intelligence services in mind. It backed
Recorded Future with slightly less than $2.5 million.
Stephen Davidson, an investor at IQT who sits on Recorded
Future's board, refused to comment; a spokesperson for IQT said
that "while we are pleased to have Recorded Future as part of the
IQT portfolio, we will respectfully decline to provide additional
information about our investment". Does Ahlberg know what
intelligence purposes Recorded Future is put to? "We would not know
about those things," he says, folding his arms. "At this stage, I
don't even want to know what people are doing with some of these
things." He points out that IQT is "an independent company; at
least to my knowledge they can't force any [government agency] to
use it." Truvé, though, says Recorded Future is working with 17 or
18 intelligence agencies. Another board member, Roger Ehrenberg,
used to run a $6 billion hedge fund for Deutsche Bank before
setting up his own firm, IA Ventures. According to Ehrenberg,
In-Q-Tel is "actively involved" with Recorded Future.
"Fundamentally, they look to invest in companies where they know
they have a customer within the government," he says. "It's not
just the CIA." Chris Holden, who works in Recorded Future's
Arlington office, admitted to Wired (with some understatement) that
"we have a little bit of work with the federal government". Holden
says that Recorded Future is being used to identify technologies
the US government may invest in, such as nanotechnology in body
armour. "It's not all super secret stuff necessarily." So, does
having IQT as an investor mean that Recorded Future is beholden to
the US government, even if it is a private company? "We are an
independent company," repeats Ahlberg. "Neither the US government,
nor Google, nor hedge funds nor banks have ever tried to make us do
anything. And frankly, you're sitting here with a bunch of Swedes.
There's no way in hell you could get them to do anything bad."
Still, it's possible to identify examples of how one might use
Recorded Future for open-source intelligence. Take the al-Qaeda
leadership after Bin Laden's death: who would fill the vacuum?
Recorded Future ran a search. Ayman al-Zawahiri, a founding member
of Egypt's Islamic Jihad militant group, and long considered by the
US government to be Bin Laden's right-hand man, showed some
significant spikes in recorded and discussed activity in the last
12 months, especially when he called for military backing of Libyan
rebels, suggesting al-Qaeda could fill a power vacuum in that
country. But al-Zawahiri's sentiment score was extremely negative,
to the degree that conspiracy theories were emerging that he was
responsible for disclosing Bin Laden's location to the US. Saif
al-Adel, a senior al-Qaeda commander, was attracting attention back
in October 2010, written about as "the new face of al-Qaeda in
2011". Recorded Future concluded that it was "clear that Saif
al-Adel has been routed in Pakistan for some time now and appears
to be embedded in the political structure of al-Qaeda"; his
momentum score was high. They also found that the Libyan Abu Yahya
al-Libi (described by a former CIA analyst as an
"insurgent-theologian") offered access to one of the most volatile
regions on the globe right now, based on his current likely
location, which al-Qaeda might consider a useful foothold. Finally,
they looked at Anwar al-Awlaki, a Yemeni-American imam who posted
pro-al-Qaeda/anti-western YouTube videos, and ran a blog and
Facebook page. Clearly of interest given the attempted US drone
strike to kill him days after Bin Laden's death, he started
building momentum in late March and April with reports that he was
urging on the Arab Spring protests. Al-Awlaki was killed in Yemen
during a US drone attack on September 30 this year.
Recorded Future concluded that multiple players will rise to
prominence regionally, that al-Qaeda could split around
al-Zawahiri, and that al-Qaeda sees advantage to be taken in the
Arab pro-democracy protests. A couple of months later, al-Zawahiri
was confirmed, although experts were sceptical about whether he
could unite the membership in Saudi Arabia and the Gulf States
behind him.
Ahlberg is more willing to talk about how Recorded Future is
being used in finance. "If you take our momentum score, and look
across S&P 500 companies, can you predict the liquidity or the
stock volume of those companies over time?" asks Ahlberg. "It turns
out you can." Stock that is being talked about and is in investors'
attention is, of course, more likely to be traded: "It's much
easier to prove volume than direction, whether a stock is going up
or down." So Recorded Future takes momentum and combines it with
sentiment -- whether a company is mentioned in a positive light --
and derives a score. Taking these news bursts across the S&P
500, it can sort them into ten different groups, from high to low.
"Then you say, every day, 'I am going to own what is in the top and
short what is in the bottom.' You are making lots of small picks on
a daily basis -- this strategy turns over the portfolio 63 per cent
every day."
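The daily routine Ahlberg describes can be sketched in a few lines of Python. This is an illustration only: the tickers and scores are invented, the `decile_portfolio` helper is hypothetical, and it ignores position sizing, transaction costs and everything else a real strategy would need.

```python
def decile_portfolio(scores):
    """Rank tickers by score; go long the top tenth, short the bottom tenth."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # With fewer than two tickers the two groups would overlap,
    # so this sketch assumes a reasonably large universe.
    size = max(1, len(ranked) // 10)
    return {"long": ranked[:size], "short": ranked[-size:]}

# Invented scores for 20 hypothetical tickers, T00 (lowest) to T19 (highest).
scores = {f"T{i:02d}": i / 20 for i in range(20)}
book = decile_portfolio(scores)
```

Re-running this every day on fresh scores is what produces the high portfolio turnover he mentions.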
Running predictive tests on data from January 2009 to January
2011, Recorded Future showed that its top decile has a beta (a
measure of risk in a portfolio) of 1.08 -- fairly low -- and a
statistically significant annualised continuous alpha (a
risk-adjusted measure of active return on investment) of +16 per
cent. The bottom two deciles had a high beta (1.37 and 1.34,
respectively) but with statistically significant negative alphas,
at -42 per cent and -26 per cent annually. "Constructing hedged
portfolios out of the securities in these deciles provides some
compelling trading strategies," says Evan Sparks, an analyst at
Recorded Future.
Beyond high-frequency trading strategies, the company says it
can predict stock shifts on the basis of one-day events, separated
into scheduled events and speculative events. "The theory that if
something is written saying, 'on Friday so and so will release
earnings', that should be priced into the market immediately," says
Ahlberg. "In reality it is not." Recorded Future took 19,000 such
events and asked what happens to the stock price. On average, as
stocks come into those scheduled events, the prices rise; coming
out of them they fall five basis points either way. "It's like
finding a roulette wheel that is skewed."
Another way is to examine the next two weeks of a particular
business's future and look for certain events. One is insiders
selling stock. You may think this would be a good time to sell; in
fact, insiders often sell just after stocks have already peaked. So
Recorded Future looks for data that can be combined with this
knowledge. If an insider sells stock after a management lay-off,
stock falls on average 1.5 per cent. Expand this event to a whole
market and "You have 2,000 events within 2011," says Ahlberg. "By
turning it into a big data screen, I have created my own skewed
roulette wheel I can consistently bet on." Chris Malloy is an
associate professor at Harvard Business School who specialises in
behavioural finance. He's played with Recorded Future's data: "I
haven't seen anything with that ability. It's pretty neat -- no
one's doing that. The predictability is certainly good."
What Recorded Future can't forecast are "black swan events",
which are by definition unpredictable and undirected. "You can look
at what happens afterwards, though," says Ahlberg. He takes the
example of a natural disaster. "Start looking at how other
countries behave. After a natural disaster, the US will travel
there every time, the UK does it 50 per cent of the time, Iran will
do it every single time, China never really does." China did,
though, after the 2010 Chilean earthquake. Two months later, it
announced a new trade agreement. China didn't travel to Haiti: no
trade agreement followed. But it did after Pakistan was hit by
flood, soon announcing a $10 billion deal. "We're looking for those
historical patterns and using them to predict what might happen,"
says Ahlberg.
Recorded Future's hedge-fund clients are only slightly less
secretive than the CIA. Ehrenberg says a handful of Wall Street
hedge funds and banks are using the technology: "Recorded Future is
a high-value signal, relative to conventional quantitative-analysis
trading signals. People are making money." Josh Holden, CEO of Fina
Technologies, which creates algorithms for high-frequency quant
trading by hedge funds, says that Recorded Future's client base "is
closely guarded. But there are more than a few firms using it."
Sandfire AG is a Swiss consultancy in the public and private
security sectors, and a client of Recorded Future. "It helps us
keep track of travel routes of high-level decision-makers," says
Felix Juhl, a senior partner. "A state visit by a high-ranking
politician may be followed by specific corporate activities.
Keeping track of travel routes can serve as an early warning."
Ahlberg says Recorded Future now earns revenues in the millions
of dollars from a client base of fewer than 100, but which includes
governments, hedge funds, big banks, watchdogs and consultancies.
This select clientele place a high value on the distilled insight
the company provides. Ahlberg sees a big opportunity: "Even within
what we have started around finance and intelligence, there is no
reason why we couldn't build another $100 million-revenue company
within a small set of years." But Recorded Future plans on being
more than just a profitable business tool. Ahlberg is expanding its
indexes: he eventually wants every piece of data on the planet
streaming live through his company's algorithms. The ultimate goal?
"We want to organise the world -- and the internet -- for
analysis." What Ahlberg doesn't say, perhaps deliberately, is that
ever more data will likely lead to ever more accurate predictions.
"It's dangerous to start talking about predicting the future," he
says. "We're trying to play that down."
Tom Cheshire is assistant editor at Wired. He wrote about
the Ariane 5 rocket in 10.11.