Hungary got subsidy

Friday, May 8th, 2009 at 1:15 am

I’ve just got back from a three-day junket in Brussels courtesy of the Hewlett Foundation [Corks! do they have that much money? I better claim for that sandwich I had on the train] organized by Jack Thurston.

As well as the usual long-haired suspects from Britain, there were some computer-assisted journalists, mainly from Denmark, who use this very expensive (they wouldn’t tell me how much) java-based UI-driven web-scraping program called kapow that can’t possibly be versatile enough to do anything much, but supposedly is. (I have to get to grips with the free version at some point to see how.)

The main deal behind this meeting was to support with data that had just been published and was linkable from 27 different websites in 27 different (often very bad) forms.

The work got shared out. Some of us got to grips with interesting webscraping through broken interfaces. And for me, one of the Danes offered the Hungarian data for 2007-10-16 to 2008-10-15 which, for no apparent reason, had been released as a single 13,756-page, 46 MB PDF file.

Using my technology and skills from hacking the PDF files of the Security Council and General Assembly for [note: no generous grants whatsoever here, you miserable unimaginative funds], I had a look at it and thought it was doable. Not that my Hungarian is any good. On-line translation webpages were able to deal with the handful of words that I needed.

I got stuck in. Unfortunately I didn’t manage to get it done in time for the press conference this afternoon, called “who wants to be a CAP Euro-millionaire?”, but I finished it off on the train home to Liverpool.

My very clean data is here in comma separated form. Please contact me if you need any amendments to the format. They’re easy to do right now. The details are as follows:

Each farm is written on several lines representing multiple payments from both national and EU funds, as well as an overall value (which is the sum of both).

The entries in each row are:

  • page-block – For use as a database-id which can be referred back to the original document by page number, and block number counting from 0 from the start of the page. The first (zeroth) block on a page is missing if it was a continuation of the last block on the previous page.
  • address – Left blank for repeat references to the same farm (where the page-block is the same). The first entry is always the overall sum (for verification).
  • ogcim – Meaning not known. Sometimes contains the word “support” or “national”. It is blank for the overall sum case.
  • fund – This is always “EMGA (NVT)”, “EMGA”, “EMVA” or “national”. I don’t know the meanings.
  • source – This is either “overall” (the sum), “EU” or “national”.
  • amount – in (Ft). Sometimes this is negative.
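
As a rough illustration of the column layout above, here is a minimal Python sketch that reads one row into a record and splits the page-block id back into a page number and a block number. It is not the program used to produce the data; the names `Row`, `split_page_block` and `parse_row` are made up for this example.

```python
import csv
from dataclasses import dataclass
from io import StringIO


@dataclass
class Row:
    page: int      # PDF page number, taken from the page-block id
    block: int     # block index on that page, counting from 0
    address: str   # blank on repeat rows for the same farm
    ogcim: str     # meaning not known; blank on the overall row
    fund: str      # "EMGA (NVT)", "EMGA", "EMVA" or "national"
    source: str    # "overall", "EU" or "national"
    amount: int    # forints; may be negative


def split_page_block(page_block):
    """Split an id such as '04515-00' into (page, block) integers."""
    page, block = page_block.split("-")
    return int(page), int(block)


def parse_row(fields):
    """Turn one six-column CSV row into a Row (hypothetical helper)."""
    page_block, address, ogcim, fund, source, amount = fields
    page, block = split_page_block(page_block)
    return Row(page, block, address, ogcim, fund, source, int(amount))


sample = "04515-00,,Területalapú támogatás,EMGA,EU,3972963\n"
rows = [parse_row(fields) for fields in csv.reader(StringIO(sample))]
print(rows[0].page, rows[0].block, rows[0].amount)
```

Keeping the page and block as integers makes it easy to jump back to the right place in the original PDF when a row looks odd.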

The data I have produced is very clean (correcting difficulties on page 118 and page 8963), and the individual payments add up to the “overall” figures everywhere.

Here is the example for one farm (on four lines). The first line is the overall sum. It’s the first record on page 4515 of the PDF:

04515-00,Hortobágyi Természetvédelmi Közalap. 4063 Nagymacs Kastélykert utca 41/B,,,overall,5981187
04515-00,,Területalapú támogatás,EMGA,EU,3972963
04515-00,,EMVA – Kedvezőtlen adottságú területek,EMVA,EU,1606579
04515-00,,EMVA – Kedvezőtlen adottságú területek,EMVA,national,401645
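
The "overall" row makes the data self-checking. The sketch below, assuming the six-column layout described above, groups rows by page-block and reports any farm whose EU and national payments do not add up to its overall figure. The function name `check_overall_sums` is hypothetical, and this is only an illustration of the check, not the original code.

```python
import csv
from collections import defaultdict
from io import StringIO

# The four example rows for farm 04515-00, copied from above.
EXAMPLE = """\
04515-00,Hortobágyi Természetvédelmi Közalap. 4063 Nagymacs Kastélykert utca 41/B,,,overall,5981187
04515-00,,Területalapú támogatás,EMGA,EU,3972963
04515-00,,EMVA – Kedvezőtlen adottságú területek,EMVA,EU,1606579
04515-00,,EMVA – Kedvezőtlen adottságú területek,EMVA,national,401645
"""


def check_overall_sums(lines):
    """Return the page-block ids whose payments do not match the overall row."""
    overall = {}
    parts = defaultdict(int)
    for page_block, _address, _ogcim, _fund, source, amount in csv.reader(lines):
        if source == "overall":
            overall[page_block] = int(amount)
        else:
            # "EU" and "national" rows are the individual payments
            parts[page_block] += int(amount)
    return [pb for pb, total in overall.items() if parts[pb] != total]


mismatches = check_overall_sums(StringIO(EXAMPLE))
print(mismatches)
```

For the example farm this prints an empty list, since 3972963 + 1606579 + 401645 = 5981187.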

The addresses are not clean (but there are no repeats) owing to my having no idea what a Hungarian address is supposed to look like. I’ll leave it to some Hungarian coder who knows Hungarian to work out how to match it against some phone-book location database and take this data further.

Total EU payments are 204,437,241,549 Ft. (There are 295 forints to the euro.) Total national payments are 104,134,982,548 Ft. In the list are 203,975 farms, of which 29 have a zero subsidy and 236 have a negative subsidy.
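
To put those totals into euros, here is a small worked conversion using the rate of 295 forints to the euro quoted above. The constant and function names are made up for this sketch, and the rate is only the rough figure from this post, not an official one.

```python
FORINTS_PER_EURO = 295  # approximate rate quoted in the post


def forints_to_euros(forints):
    """Convert an amount in forints to euros at the quoted rate."""
    return forints / FORINTS_PER_EURO


eu_total_ft = 204_437_241_549
national_total_ft = 104_134_982_548

for label, total in [("EU", eu_total_ft), ("national", national_total_ft)]:
    millions = forints_to_euros(total) / 1e6
    print(f"{label}: about {millions:,.0f} million euros")
```

At that rate the EU payments come to roughly 693 million euros and the national payments to roughly 353 million euros.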

The top farms (and their funding) are 07518-02 (2,318,593,508), 01389-02 (1,392,871,559), 02734-01 (977,350,267), 04515-01 (906,917,938), so that’s a few more CAP millionaires there.

The most negatively funded farm is 09237-04 (-5,820,172).
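
Rankings like these fall out of the "overall" rows directly. The following sketch uses only the five figures quoted above as sample data; in practice the dictionary would be built from every "overall" row in the file. The helper names are hypothetical, and the euro-millionaire threshold assumes the 295 forints to the euro rate quoted earlier.

```python
import heapq

FORINTS_PER_EURO = 295  # approximate rate quoted in the post

# Overall amounts in forints, keyed by page-block id (figures quoted above).
overall_ft = {
    "07518-02": 2_318_593_508,
    "01389-02": 1_392_871_559,
    "02734-01": 977_350_267,
    "04515-01": 906_917_938,
    "09237-04": -5_820_172,
}


def top_farms(totals, n):
    """Return the n largest (page_block, amount) pairs, biggest first."""
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


def most_negative(totals):
    """Return the (page_block, amount) pair with the lowest amount."""
    return min(totals.items(), key=lambda item: item[1])


def euro_millionaires(totals):
    """Return ids whose overall amount is at least one million euros."""
    threshold = 1_000_000 * FORINTS_PER_EURO
    return [pb for pb, ft in totals.items() if ft >= threshold]


print(top_farms(overall_ft, 2))
print(most_negative(overall_ft))
print(euro_millionaires(overall_ft))
```

All four of the top farms listed clear the one million euro mark at that rate, since the threshold works out to 295,000,000 Ft.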

todo: upload the 380-line Python program here.

During the press conference, some farm lobbyist made a good show of trying to excuse Germany’s lack of publication (it’s complicated, there are legal issues with the constitution) but the representative of the Commission corrected him (“it’s very simple: they agreed to publish it by last week — and they didn’t”).

After the press conference, I spoke to a Hungarian reporter who said they had cut, pasted and reformatted the first few pages of the document by hand, and calculated they could do it all in 11 hours (plus major RSI to the wrist and the brain).

One of the Danes also kept wanting to know how I made any money. The answer is, I don’t. But that’s not an excuse. Not when across the road from the Commission a huge system of marquees had been set up in honour of the first European SME week whose contents made absolutely no sense whatsoever.

What can you do in a world where there’s loads of money available to create rubbish, but no one wants to pay to clean it up?


