Freesteel Blog » DL
Thursday, September 29th, 2011 at 9:35 am - DL
HP has a history of buying worthless corporate software outfits that have gone well past their sell-by date. A few years ago they bought one of the UK government’s worst IT providers, EDS, and with it gained a £709 million liability in the form of a compensation payment to BSkyB, because Rupert Murdoch, unlike the UK government, actually insists on getting his money back from a company that has “fraudulently misrepresented itself in a salespitch”.
This year HP bought another extremely crappy software company called Autonomy for billions of dollars, long after the world had learnt at great expense that they had no technology and were just a bunch of hot-heads.
I’ll just quote you the press release on the matter from Oracle:
Autonomy CEO Mike Lynch continues to insist that Autonomy was never ‘shopped’ to Oracle. But now at least he remembers and admits to meeting with Oracle President Mark Hurd and Doug Kehring, Oracle’s head of M&A, this past April.
But CEO Lynch insists that it was a purely technical meeting, limited to a ‘lively discussion of database technologies.’
Interesting, but not true.
The slides Lynch showed Oracle’s Mark Hurd and Doug Kehring were all about Autonomy’s financial results, Autonomy’s stock price history, Autonomy’s Price/Earnings history and Autonomy’s stock market valuation. Ably assisting Mike Lynch’s attempt to sell Autonomy to Oracle was Silicon Valley’s most famous shopper/seller of companies, the legendary investment banker Frank Quattrone. After the sales pitch was over, Oracle refused to make an offer because Autonomy’s current market value of $6 billion was way too high.
We have put Mike Lynch’s PowerPoint slide sales-pitch up on the Oracle website with the hope Mike Lynch will recognize his slides, his memory will be restored, and he will recall what he and Frank Quattrone discussed during their visit to Oracle last April. Yesterday, the Autonomy CEO did not remember having any meeting with Oracle. Today, he remembers the April meeting and inaccurately describes how it came about and what was discussed…
One thing that makes a company valuable is that it is able to get away with delivering consistently bad value to the customers.
Here is the slide where they detail the money they make by locking the customer into buying all their over-priced extras. Once this has happened to one part of an unsuspecting business run by people who don’t know enough about IT, the rot then spreads with the “decision to standardize” the rest of the business on Autonomy’s incompatible products.
I am on the case because of the BBC’s contract with Autonomy to make Democracy Live, awarded when Autonomy’s CEO Mike Lynch was on the board of trustees.
What bugs me is not so much the waste as the way this business model actively suppresses and discourages the good software that could have taken its place. A loudmouth with nothing to offer except money has to make it his business that nothing else out there can prove itself equivalent or superior.
Tuesday, September 6th, 2011 at 12:04 am - DL
Well, someone has got to do it, and I doubt any journalists or other BBC staff have actually gone to the trouble of investigating the quality (or lack thereof) of their bought-in Autonomy/Blinkx speech-to-text video search engine, which they have wired up to the Parliamentary feed and then mis-sold (for the purpose of avoiding any form of accountability) as journalism.
As you know, I have a case about this piece of work with the Information Tribunal, because I’ve got to find out to what lengths they went in order to deliberately avoid using the free, reliable, substantive, authoritative, useful and content-driven structured XML feed, parsed from Hansard, that sits in the back end of theyworkforyou. Instead they chose to commission this shoddy piece of fundamentally flawed technology that serves no purpose and viciously wasted a very real opportunity for parliamentary publication excellence.
What I did was search for the word “Liverpool” on the site, and then exhaustively check every hit they had in the video against what was recorded in Hansard. (The Hansard transcript, as you should know, is edited to improve the grammar and clarity of the spoken word.)
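The tally behind that kind of comparison can be sketched in a few lines of Python. This is only an illustration, assuming every Hansard mention and every video hit has already been attributed by hand to a speech identifier; the function name `compare_hits`, the identifiers and the sample data are all hypothetical and are not the real results.

```python
from collections import Counter


def compare_hits(hansard_mentions, video_hits):
    """Compare word occurrences in Hansard with hits from the video search.

    Both arguments are lists of speech identifiers (hypothetical labels),
    with one entry per occurrence of the search word in that speech.
    """
    expected = Counter(hansard_mentions)
    found = Counter(video_hits)

    # Multiset intersection: occurrences present in both sources.
    matched = sum((expected & found).values())
    # In Hansard, but never returned by the video search.
    missed = sum((expected - found).values())
    # Returned by the video search, but with no counterpart in Hansard.
    spurious = sum((found - expected).values())

    total_expected = sum(expected.values())
    total_found = sum(found.values())
    return {
        "matched": matched,
        "missed": missed,
        "spurious": spurious,
        "recall": matched / total_expected if total_expected else 0.0,
        "precision": matched / total_found if total_found else 0.0,
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    hansard = ["s1", "s1", "s2", "s3", "s4"]
    video = ["s1", "s3", "s5"]
    print(compare_hits(hansard, video))
```

Because Hansard is edited, a word that appears in the transcript may not have been spoken exactly as printed, so the "missed" count is best read as an upper bound on what the speech-to-text engine failed to pick up.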
Here’s the page you get back of the main debate in which “Liverpool” appears:
Following this blogpost, here is my reply to the response. The preliminary hearing is set for 13 August, I believe. What a lot of work.
IN THE MATTER OF AN APPEAL TO THE FIRST-TIER TRIBUNAL
UNDER SECTION 57 OF THE FREEDOM OF INFORMATION ACT 2000
BETWEEN: JULIAN TODD (Appellant)
THE INFORMATION COMMISSIONER (Respondent)
REPLY TO THE RESPONSE OF THE INFORMATION COMMISSIONER
1. This Reply is served in accordance with rule 24 of the Tribunal Procedure (First-tier Tribunal)(General Regulatory Chamber) Rules 2009.
2. My request was for all progress reports produced by the Democracy Live team, including any that were filed by the contractors Autonomy and Blinkx in the 18 months leading up to the launch of the Democracy Live service.
3. In a conversation with the ICO I clarified my request to focus on the technical progress reports produced by the contractors, which I believed came under the terms of the Freedom of Information Act. This was a reasonable move because, according to the guidelines, if some of the requested information is exempt, the non-exempt information should still be disclosed.
4. The Democracy Live service was launched on 2 November 2009. In particular, I am referring to its repository of archived video content from at least 8 debating chambers which has been filtered through Blinkx and Autonomy speech-to-text technology in order to enable a text-based search feature.
5. The two questions I put before the Tribunal are:
(a) Is the Democracy Live service (video archive, speech-to-text and search), considered in its own right, journalism?
(b) Is this video and searchable content held to any significant extent for the purpose of journalism within the terms of the High Court Judgement?
Is the Democracy Live service journalism?
6. According to Decision Notice FS50284450, the BBC argued that the technical progress reports which relate to the development of the site are held for the purpose of journalism, and that web content developers require private journalistic space in which to gather, analyse, weigh and editorialise information in order to determine the most effective way to provide this coverage.
7. It is not the case that all web content provided by the BBC is journalism, art or literature. For example, notices of transmission times and Google-generated pages of search results of their content are not the product of journalism, art or literature.
8. According to the Democracy Live FAQ:
“We cover the main chambers of the House of Commons, House of Lords, Scottish Parliament, Northern Ireland Assembly, Welsh Assembly and full sittings of the European Parliament. We also cover Westminster Hall and Select Committees at Westminster. When there is no business in the main chambers in Edinburgh, Belfast and Cardiff, we cover committee meetings…
Our Search is one of the most innovative aspects of Democracy Live. It works by using a “speech-to-text” system. After a video is made available to watch again, our system adds words spoken in the video for you to search on. When it finds a word you’ve asked for, it gives you a link straight to the point in the video where the word is spoken. You can also search for representatives by name, place and postcode…
Our systems have to process the video after it’s gone out live and this takes approximately the same time as the length of the business. Therefore, if the item lasted 30 minutes, it will take about that time to process the video, followed by a few more minutes for it to be published on the site. After it’s been published, a second process produces the speech-to-text functions and this too takes about the same time as the duration of the video…
At launch, our archive is limited but it increases in size every day…”
In other words, in common with TheyWorkForYou.com, there is no selectivity applied to the source material. Any failure to detect the word “pipsqueak” spoken by Tom Watson MP in the House of Commons on 7 July 2010 can be assumed to be purely a failure of the technical implementation, and not the consequence of any journalistic editorial selection by the BBC.
9. I have submitted my appeal to this Tribunal in the hope of clarifying the issue that there are no journalistic processes occurring between the raw source material (video feeds from the debating chambers), and the Democracy Live service which is accessible to the public.
Is the Democracy Live service held for the purpose of journalism?
10. The response by the ICO to my appeal downplayed the points provided to them by the BBC indicating that Democracy Live was equivalent to editorial content. Instead, the ICO argued that the meaning of “journalism” was wider than simply editorial selection, because it involved collecting or gathering, writing and verifying of materials for publication, and that the involvement of editorial staff on the steering group which received the technical progress reports from the contractors was relevant.
11. The Democracy Live service could undoubtedly be useful as a means of gathering materials for the purpose of journalism. However, because the service appears to be fully exposed to the public on the website, at least to the same extent that BBC journalists have access to it, the journalism derogation cannot be stretched to protect it from FOI.
12. The Democracy Live service, to the public and to BBC journalists, could easily be sold off and provided as-is by a third party without any change of procedure. Disclosing information about it would not disclose anything about journalists’ private gathering, writing and verifying of materials for publication, because these materials are fully published in the form of the service. That is where the jurisdiction of the journalist ends.
13. It is true that BBC journalists and editorial staff may be heavy users of the Democracy Live service. As such it would be reasonable for the team to consult them at any stage of the project in order to ensure that it was a success. But this is equivalent to an authority consulting and following the advice of journalists as to where and when to hold a major press conference so that more of them turned up.
Disclosure of information under FOI
14. I have not been able to address any of the exemptions that are likely to be applied if the requested information comes under the Freedom of Information Act. If disclosure is considered exempt under Section 43 (Commercial interests) then it is subject to a public interest test.
15. The Democracy Live service, as advertised, clearly has a displacement effect on the activities of TheyWorkForYou.com, in that it provides a new competing website where viewers can search the speeches made by MPs in the House of Commons.
16. There are issues with the reliability of the Democracy Live technology, in that it is less capable than TheyWorkForYou.com at delivering accurate results for searches of speeches in the recent past, or anything preceding September 2009.
17. The only published information about the reliability of the Democracy Live service is included below:
“Our search is powered by a speech-to-text system built by two companies called Blinkx and Autonomy which create transcriptions of the words spoken in the video.
Generally speaking, the industry standard for accuracy in speech-to-text systems is reckoned to be about 80%. In Democracy Live tests, we’ve seen slightly higher than that. We’ve taken account of different accents across the UK but the system might still be a bit confused by some words. Have a look at the explanation of how the site works for more about search and other questions you may have.
One aspect we’re particularly proud of is that we’ve managed to deliver good results for speech-to-text in Welsh, which, we’re told, is unique.”
18. Beyond the above statement, there is no other information about the suitability or reliability of the service provided, or any recommendation to viewers who have experienced inadequate results to try TheyWorkForYou.com if they are searching UK Parliamentary debates.
19. Democracy Live does cover the European Parliament and Select Committees, which TheyWorkForYou.com does not. Given a fraction of the resources that have been lavished on Democracy Live, TheyWorkForYou.com (which is an open source project run by a charity), could have easily extended its coverage (based on webscraping and parsing technology) to these other bodies. However, this activity has been delayed and is unlikely to happen, owing to the displacement effect of the government resourced Democracy Live project.
20. Accordingly, I believe it is in the public interest for the BBC to be compelled to disclose requested information about the technology and development of this non-journalistic web-service so that viewers can be informed about its cost, nature and reliability.
DATED 14 July 2010
My appeal over a request to the BBC for information about their £1 million to £1.5 million Democracy Live project, powered by a speech-to-text system built by two companies called Blinkx and Autonomy (in both of which BBC non-executive board member Mike Lynch has financial interests), got turned down by the ICO a couple of weeks ago.
This publicly funded project makes a lot of people’s blood boil, because there was already a well-established Parliamentary accessibility project called TheyWorkForYou.com, volunteer-run because no moneyed-up democratic institution in its tiny mind has so far seen fit to give it any backing. It is based on the actual official textual transcripts, which can be used for a whole lot more purposes than a bunch of vegetative time-consuming video streams.
Why couldn’t the BBC divert a few small crumbs from their expensive project budget our way so that we could align each other’s datasets (video and visible text) to produce a common page where the information is merged so it can be used in many more places?
The speech-to-text technology, although incredible, is not fit for consumption, except as raw material for a search engine. It detects words rather than properly edited sentences.
My ICO decision notice FS50284450 was promulgated in the negative. I had asked for the progress reports submitted by these two companies during the 18-month build phase of the project, and the BBC claimed they contained journalism, and that disclosing this information would have a “chilling effect” upon its editorial freedom.
The only thing it would have a “chilling effect” on is the ability of its disconnected board of directors to dream up private pet projects that unnecessarily undermine rather than support unpaid citizen action outside.
How much cheaper would it have been to have turned this idea into something everyone would have loved and been proud of?
In the interests of retro hand-written form filling, here’s a page from my paper submission:
Wish me luck.