Code beats Bureaucracy: Tax Form Automation With Ruby and FDF

The City of Kettering decided to tell me they wanted my Schedule Es from 2007 to 2012 and an income tax return filled out for each of those years. We have a rental house there, and had no idea we needed to file a local tax return. I hate manual data entry and wanted to fill out my forms using Ruby and pdftk. Yes, this is Rube Goldberg at its finest, but I work a lot with PDFs and wanted to learn how to do this quickly. I’ve decided that programmatic PDF management is one of those modern skills, like typing, that I need to master, and I’ve already made an investment in Ruby. (Just learning to use the Python script PDFconcat is a great lesson in how a little learning can save a lot of time.)

I started with (random) data in this form, which represents a yearly loss on my rental house. I was able to pull up my Schedule Es since we have been paperless since 2002. I use Yep to tag all my files, so I could pull them up quickly. The data below is made up, but in the same format as the real data.

2007|10
2008|12
2009|22
2010|20
2011|107
2012|388

And I need to populate that in [this form](http://dev.ci.kettering.oh.us/wp-content/uploads/2013/06/TAX_2013-Kettering-Individual-Return-No-Dates.pdf):

wget http://dev.ci.kettering.oh.us/wp-content/uploads/2013/06/TAX_2013-Kettering-Individual-Return-No-Dates.pdf

Here is a log of my attempt (in order to keep me focused on this and do it as fast as possible).

Start: 14:44 on Sunday PM

Several Google queries: identified that I wanted to use pdftk and nguyen, a very lightweight library that fills PDF forms using XFDF/FDF with pdftk.

I had to install an older version of ruby (1.9.3-p448) and then clone the repo:

rvm install ruby-1.9.3-p448
git clone git@github.com:joneslee85/nguyen.git

14:54

Wow, the form is done pretty crappily:

irb(main):002:0> require '../../lib/nguyen'
=> true
irb(main):003:0> p = Nguyen::PdftkWrapper.new 'pdftk'
=> #<Nguyen::PdftkWrapper:0x007fa72d88def8 @pdftk="pdftk", @options={}>
irb(main):005:0> d = Nguyen::Pdf.new('tax.pdf', p)
=> #<Nguyen::Pdf:0x007fa72b126928 @path="tax.pdf", @pdftk=#<Nguyen::PdftkWrapper:0x007fa72d88def8 @pdftk="pdftk", @options={}>>
irb(main):006:0> d.fields
=> ["Occupation", "Occupation_2", "undefined", "undefined_2", "undefined_3", "undefined_4", "undefined_6", "undefined_7", "undefined_8", "undefined_9", "undefined_10", "undefined_11", "undefined_12", "undefined_14", "undefined_15", "undefined_16", "undefined_17", "undefined_18", "undefined_19", "Date", "Date_2", "Date_3", "undefined_21", "undefined_22", "undefined_23", "NAME_2", "ADDRESS", "ADDRESS_2", "undefined_24", "AMOUNTA", "AMOUNTB", "undefined_25", "undefined_26", "undefined_27", "undefined_28", "undefined_29", "undefined_30", "undefined_31", "undefined_32", "undefined_33", "Address", "l100", "l101", "l102", "l103", "l105", "l106", "undefined_5", "t101", "t102", "t103", "t104", "NAME", "t105", "t106", "t107", "t108", "t109", "t110", "t111", "t112", "l200", "l201", "l202", "l203", "t113", "t114", "cb1", "cb2", "cb3", "cb4", "t1", "undefined_13", "l1", "l104", "b1", "b2"]

15:02

Boom! You can figure out the form field names in Acrobat through Forms -> Edit. Looking at this, I now feel good about writing a script because there is so much duplication. Here is a list of the fields I need to fill (dummy data below):

  • “TAX YEAR” -> current_year
  • cb2 -> true
  • t1 -> “Not aware”
  • cb3 -> true
  • Address -> “123 Main Street, Alexandria, VA 22304”
  • l100 -> “123-45-1111”
  • Occupation -> “USAF”
  • “City of Income” -> “Alexandria, VA”
  • l101 -> “245-28-2822”
  • Occupation_2 -> “Physical Therapist”
  • City of Income_2 -> “Alexandria, VA”
  • “Phone Number” -> “571-281-2822”
  • “Email Address” -> “foo@bar.com”
  • “Old Address” -> old_address
  • “undefined_4” -> amount_of_loss
  • undefined_5 -> amount_of_loss
  • l102 -> 0
  • undefined_10 -> 0
  • undefined_11 -> 0
  • l103 -> 0
  • l106 -> 0
  • Date -> Date.today
  • Date_2 -> Date.today
  • NAME -> “Kettering Rental House”
  • t105 -> old_address
  • t106 -> “Kettering, OH 45202”
  • l200 -> amount_of_loss
  • undefined_24 -> amount_of_loss

15:16 starting to write test code

15:20 this code works, starting on real code

15:48 20 minute break for lunch and play with kids

16:20 frustrated — can’t get ruby syntax to work with here doc

This was just silly. I should have known how to load an array of text . . .
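For the record, loading the data from a here doc can be quite small. The sketch below is not the original script: the names RENTAL_LOSSES and parse_losses are made up for illustration, and the figures are the dummy ones from above. It uses the plain `<<` here doc form so it should also run on an older Ruby such as 1.9.3.

```ruby
# Made-up yearly losses in the same year|amount format shown earlier.
RENTAL_LOSSES = <<LOSSES
2007|10
2008|12
2009|22
2010|20
2011|107
2012|388
LOSSES

# Turn the pipe-delimited text into a { year => loss } hash of integers.
def parse_losses(text)
  text.each_line.each_with_object({}) do |line, losses|
    line = line.strip
    next if line.empty?          # tolerate blank lines in the here doc
    year, loss = line.split('|')
    losses[Integer(year)] = Integer(loss)
  end
end

parse_losses(RENTAL_LOSSES).each do |year, loss|
  puts "#{year}: loss of #{loss}"
end
```

Keeping the data as a hash keyed by year makes it easy to loop over the years and fill one form per year.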

16:30 all working — printing forms with this code

Pretty cool.
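The idea behind the form-filling step can be sketched roughly as follows. This is an illustration under assumptions, not the script I actually ran: it writes a minimal FDF file by hand instead of going through nguyen, so it has no dependencies. The helper names (fdf_escape, fdf_for, fields_for_year) are hypothetical, and only a subset of the field names listed above is used.

```ruby
# Escape the characters that are special inside a PDF literal string.
def fdf_escape(value)
  value.to_s.gsub(/[\\()]/) { |ch| "\\#{ch}" }
end

# Build a minimal FDF document from a { field_name => value } hash.
def fdf_for(fields)
  entries = fields.map do |name, value|
    "<< /T (#{fdf_escape(name)}) /V (#{fdf_escape(value)}) >>"
  end
  <<FDF
%FDF-1.2
1 0 obj
<< /FDF << /Fields [
#{entries.join("\n")}
] >> >>
endobj
trailer
<< /Root 1 0 R >>
%%EOF
FDF
end

# A subset of the fields from the list above, filled with dummy data.
def fields_for_year(loss)
  {
    'NAME'         => 'Kettering Rental House',
    't106'         => 'Kettering, OH 45202',
    'undefined_4'  => loss,
    'undefined_5'  => loss,
    'l200'         => loss,
    'undefined_24' => loss,
    'l102'         => 0
  }
end

# Print the FDF for one year; in practice you would write it to a file.
puts fdf_for(fields_for_year(10))
```

With the output saved as, say, 2007.fdf, pdftk can merge it into the form with `pdftk tax.pdf fill_form 2007.fdf output 2007-filled.pdf` (the file names here are placeholders).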

Analysis of The Boohers’ Energy Usage

Using my Vera Lite and Z-wave home energy monitor I was able to record a week’s worth of electricity consumption at 30-second intervals. I was surprised how hard it was to use the data to do anything, but amazed at how my behavior changed when I knew our household consumption was being tracked. Again, if you want to improve something, I feel it needs a tracking system. Electricity usage is a nice case study for this, because it has a clean metric (wattage) over a time series.


Home Energy Usage

I decided to look at my home energy usage. Below are notes on how I’m approaching it. I wish I could get insight from the MIOS graphing plugin, but it is pretty basic and can’t throw out outliers and has terrible zoom capability. I also never figured out how to set the y-axis.

In order to build a plot, I considered the following options:

  • Use a custom javascript library that has zoom capability (i.e. re-purpose a stock chart)
  • Use MATLAB

I had to use MATLAB (or similar) to condition the data in any case. First, I had to scp the data over from my Vera. For example:

scp remote@micasa:/dataMine/database/10/raw/2302.txt tmp/10_2.txt

Which I was able to pull together using some MATLAB:

Since I monitor both legs of my electrical setup, I would have two plots. Here is what my MATLAB code currently produces (click on the plot below to do some analysis):

Booher Home Energy Use From 9 to 14 Feb

I’m not happy with this plot. It is hard to get at the events that are happening. I can’t easily put vertical bands on because the units are all strange with the time series plot. I definitely can’t browse the data and get insight for what is happening. I also want to add both datasets together so I can see total energy consumption.
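Adding the two legs together is mostly a matter of lining up timestamps. The sketch below is in Ruby rather than MATLAB and uses invented readings; combine_legs is a hypothetical helper. It assumes each leg is a list of [unix_timestamp, watts] pairs and carries the last known reading of a leg forward when it has no sample at a given time.

```ruby
# Invented sample readings: [unix_timestamp, watts] for each leg.
LEG_1 = [[100, 400], [130, 420], [190, 900]]
LEG_2 = [[100, 300], [160, 310], [190, 305]]

# Sum two series on the union of their timestamps, carrying the last
# known reading of each leg forward when it has no sample at that time.
def combine_legs(leg_a, leg_b)
  a = Hash[leg_a]
  b = Hash[leg_b]
  last_a = last_b = nil
  (a.keys | b.keys).sort.each_with_object([]) do |t, total|
    last_a = a.fetch(t, last_a)
    last_b = b.fetch(t, last_b)
    # Skip timestamps before both legs have reported at least once.
    next if last_a.nil? || last_b.nil?
    total << [t, last_a + last_b]
  end
end

combine_legs(LEG_1, LEG_2).each do |t, watts|
  puts "#{t}: #{watts} W"
end
```

Forward filling is a design choice that suits readings recorded on change; for evenly sampled data, interpolation might be a better fit.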

Additional data from Dominion Power

Also, as part of my analysis, I pulled together some historical data from Dominion.

Energy Usage from Dominion Power

Actual Data

Meter Read Date   Days   Usage (kWh)   Daily Usage (kWh)
02/04/2014        29     1240          43
01/06/2014        33     1060          32
12/04/2013        34     1271          37
10/31/2013        29     1144          39
10/02/2013        29     1181          41
09/03/2013        32     1523          48
08/02/2013        28     1342          48
07/05/2013        30     1608          54
06/05/2013        34     1318          39
05/02/2013        28     985           35
04/04/2013        30     1061          35
03/05/2013        29     1161          40
02/04/2013        31     1139          37
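
The Daily Usage column looks like usage divided by the number of days in the billing period, rounded to the nearest kWh. A quick check in Ruby on three of the rows above (the helper name daily_usage is mine):

```ruby
# [meter read date, days in billing period, usage in kWh], copied from the table.
READINGS = [
  ['02/04/2014', 29, 1240],
  ['01/06/2014', 33, 1060],
  ['07/05/2013', 30, 1608]
]

# Average daily usage, rounded to the nearest whole kWh.
def daily_usage(days, usage_kwh)
  (usage_kwh.to_f / days).round
end

READINGS.each do |date, days, usage|
  puts "#{date}: #{daily_usage(days, usage)} kWh/day"
end
```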

Table 2 — From Dominion Power

Meter Read Date   Days   Meter Reading Method        Meter Read   Usage (kWh)   Demand   Avg. Daily Usage
01/06/2014        33                                 6929         1060          0.0      32
12/04/2013        34                                 5869         1271          0.0      37
10/31/2013        29                                 4598         1144          0.0      39
10/02/2013        29                                 3454         1181          0.0      41
09/03/2013        32                                 2273         1523          0.0      48
08/02/2013        28                                 750          1342          0.0      48
07/05/2013        30     AMR – MOBILE READ BY VAN    19247        1608          0.0      54
06/05/2013        34     AMR – MOBILE READ BY VAN    17639        1318          0.0      39
05/02/2013        28     AMR – MOBILE READ BY VAN    16321        985           0.0      35
04/04/2013        30     AMR – MOBILE READ BY VAN    15336        1061          0.0      35
03/05/2013        29     AMR – MOBILE READ BY VAN    14275        1161          0.0      40
02/04/2013        31     AMR – MOBILE READ BY VAN    13114        1139          0.0      37
01/04/2013        32     AMR – MOBILE READ BY VAN    11975        1341          0.0      42
Totals                                                            16,134


I’m out of time this morning, but when I get more time, I’m going to be considering the following:

OFX for USAA via Ruby

My wife and I have been through roughly 10-15 different budget/financial tracking systems. We started by tracking every penny in MS Money, used several different spreadsheets, and spent several years in Mint. We have pretty much dropped all of that for a top-down strategy: we budget savings, non-discretionary spending, and a rainy day buffer, and arrive at a fixed weekly budget for groceries, clothes, snacks, eating out and random household supplies. We use a debit card for this, and transfer the allotted amount every Thursday into the daily spending account. The problem is that we started pushing money into the account whenever it ran low, and we ended up losing our focus and even the ability to track how much we spend in a given week. In an audit of last year’s spending, it was surprising to see that we were routinely 100% over our budget when we looked at other spending sources.

Since I code web applications, I decided to play with bringing in some of the data we create, both household and financial to ultimately create a personal dashboard for our family. In doing so, we aren’t locked into any one system and we can create something custom that works for us. This way, we can track our fitness, finances, journal and home systems all in one place and own the data and experience. One lesson learned is that our tracking systems need to be on autopilot as our different interests surge. A fragile system doesn’t work. Our needs will vary, but we want any tracking system to be able to produce a report on request.

While fun and useful, this takes familiarity with some new protocols (OFX for finance and LUUP for home automation). On a plane flight to Las Vegas, I was able to get OFX to successfully connect to USAA. First I had to set up a module with USAA’s specifics:

With this in place, I can generate a valid OFX request:

This request passes all the assertions designed to test for a valid signon response:

def verify_usaa_signon_response(response_document)
  signon_message = response_document.message_sets[0]
  assert signon_message.kind_of?(OFX::SignonMessageSet)
  assert_equal(1, signon_message.responses.length)

  signon_response = signon_message.responses[0]
  assert signon_response.kind_of?(OFX::SignonResponse)
  assert_not_equal(nil, signon_response.status)
  assert signon_response.status.kind_of?(OFX::Information)
  assert signon_response.status.kind_of?(OFX::Success)
  assert_equal(0, signon_response.status.code)
  assert_equal(:information, signon_response.status.severity)
  assert_not_equal(nil, signon_response.status.message)
  assert_not_equal(nil, signon_response.date)
  assert_equal(nil, signon_response.user_key)
  assert_equal('ENG', signon_response.language)
  #assert_not_equal(nil, signon_response.date_of_last_profile_update)
  #assert_not_equal(nil, signon_response.date_of_last_account_update)
  assert_not_equal(nil, signon_response.financial_institution_identification)
  assert_equal('USAA', signon_response.financial_institution_identification.organization)
  assert_equal('24591', signon_response.financial_institution_identification.financial_institution_identifier)
  assert_equal(nil, signon_response.session_cookie)
end

One of the difficult parts was determining the required length of my account number in the absence of documentation. It took some experimentation to find out that USAA wants exactly nine digits for the username (member number) and ten digits for an account number. Instead of writing code that robustly pads the inputs with zeros (through sprintf or similar), I just changed the input values.
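
If I did want to pad the values in code, it could look something like the sketch below. The helper name zero_pad and the sample numbers are made up; the widths (nine and ten digits) are the ones I found by experiment.

```ruby
# Left-pad a numeric identifier with zeros to a fixed width.
def zero_pad(value, width)
  digits = value.to_s
  if digits.length > width
    raise ArgumentError, "#{digits} is longer than #{width} digits"
  end
  digits.rjust(width, '0')
end

puts zero_pad(1234567, 9)     # made-up member number, padded to nine digits
puts zero_pad('12345678', 10) # made-up account number, padded to ten digits
```

For plain integers, `format('%09d', 1234567)` does the same job; rjust also works when the value is already a string.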

I also noticed that USAA’s response did include a ledger balance:

<LEDGERBAL><BALAMT>290.51<DTASOF>20140211120000</LEDGERBAL></STMTRS>

It did not, however, include the available balance fields that the gem expected. In any case, I can now get transactions and full access to my bank programmatically, which is pretty cool.
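
To show what tolerating the missing fields could look like, here is a small standalone sketch that pulls a balance out of the raw response with regular expressions. It does not use the OFX gem; the helper name balance is hypothetical, and AVAILBAL is the OFX tag for the available balance aggregate.

```ruby
# The statement fragment quoted above (OFX 1.x SGML: leaf tags are not closed).
SAMPLE = '<LEDGERBAL><BALAMT>290.51<DTASOF>20140211120000</LEDGERBAL></STMTRS>'

# Return { amount:, as_of: } for the named balance aggregate, or nil
# when the aggregate is missing from the response.
def balance(ofx, aggregate)
  block = ofx[%r{<#{aggregate}>(.*?)</#{aggregate}>}m, 1]
  return nil unless block
  {
    amount: block[/<BALAMT>([^<\s]+)/, 1], # kept as a string to avoid float rounding
    as_of:  block[/<DTASOF>([^<\s]+)/, 1]
  }
end

p balance(SAMPLE, 'LEDGERBAL')
p balance(SAMPLE, 'AVAILBAL') # nil here, since USAA did not send it
```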


Setting up the Aeon Labs Aeotec Z-Wave Smart Energy Monitor

I struggled for a while trying to set up the Aeon Labs Aeotec Z-Wave Smart Energy Monitor to monitor my electricity. The manual or any instructions were difficult to find online.

The first article that was absolutely necessary explained how to pair the device. After reading this article, pairing was pretty trivial.

Great details in the developer’s manual

part number: DSB09104-ZWUS

the manufacturer is also marginally useful.

the ‘manual’

the amazon page

Key advice is to wait after installation. I can’t get anything from Watts, but I can read each clamp regularly. While I look into this later, you can still see what is going on:

Screenshot 2014-02-10 07.09.37

Household temperature and set point for the last week

MiCasa Verde (MiOS) DataMine Logging

For my home automation goals, I chose the MiCasaVerde VeraLite on a friend’s recommendation. The VeraLite is a small Linux controller that runs MiOS and gives a homeowner the ability to easily control lights, security cameras, door locks, alarm systems, and even the thermostat, among many other home systems. For example, you could set your temperature from your mobile device or web browser. I chose this device because of my friend’s experience and the Vera’s compatibility with INSTEON, Z-Wave, and X10 devices, since I had previously made a big investment in X10.

Setup was easy and everything worked well with my 2gig CT100 Z-Wave Programmable Thermostat after I completed my initial pairing.

I was happy to see the dataMine Graphing and Logging plugin for my MiCasaVerde VeraLite, but after mounting my USB drive, I still couldn’t see any channels.

MiCasaVerde Screenshot

First thing was to get SSH access to my account so I could do some troubleshooting. To do that, I had to go to account, tech support and enable remote access. If you do that, you get a message like this:

Tech support full control enabled, access code 34014212-123456 (SSH: SSH_22=27032 TS_SRV=ts2)

And your password is 123456 for the user “remote”. In order to get the root password you need to run:

nvram show | grep pass

You’ll see something like this:

vera_wifipass=shade83forest

So in this case, shade83forest is the actual root password.

In my ssh shell, I was able to see that my mount was correct and available:

/dev/sda1               506.2M     16.5M    464.0M   3% /dataMine

Using the file command, I could tell this was an ext3 partition:

/dev/sda1: Linux rev 1.0 ext3 filesystem data (needs journal recovery) (large files)

I could also get details on my MiOS version:

Linux MiOS_35017272 2.6.37.1 #2 Fri Feb 22 04:07:32 PST 2013 mips GNU/Linux

After installing, your /dataMine/ directory should be mounted and have the following files:

/dataMine# ls -l
-rw-r--r--    1 root     root           252 Jan 26 07:38 InternetOk.log
-rw-r--r--    1 root     root         27646 Jan 26 07:48 LuaUPnP.log
-rw-r--r--    1 root     root             0 Jan 26 07:38 NetworkMonitor.log
-rw-r--r--    1 root     root          2112 Jan 29 06:37 Notifications [R2299].txt
-rw-r--r--    1 root     root           468 Feb  1 06:20 Notifications [R2300].txt
-rw-r--r--    1 root     root           150 Feb  1 06:20 dataMineConfig.json
drwxr-xr-x    2 root     root          4096 Jan 26 07:55 database
drwx------    2 root     root         16384 Jan 26 07:37 lost+found
-rw-r--r--    1 root     root            26 Jan 26 07:38 mount_tests
-rw-r--r--    1 root     root          2110 Jan 26 07:38 serproxy.log
-rw-r--r--    1 root     root             0 Jan 26 07:48 signal.flag.log
-rw-r--r--    1 root     root            50 Jan 26 07:48 signal.log
-rw-r--r--    1 root     root           414 Feb  1 07:15 sunriseSunset.txt

If I request debugging information (via /port_3480/data_request?id=lr_dmCtrl&control=debug), I get:

First I found the variable I wanted (current temperature) and enabled logging:

Current Settings for HVAC

Initially, you don’t get good feedback. I thought I was getting an error and couldn’t get to any logs (see the red exclamation mark where the results should be).

Error

But that should change once events occur and the logging starts.

My debug file has provided the following information:

{"Version":"0.980","dbVersion":2,"Events":{"count":17,"last":1391253610},"guiConfig":[],"Variables":[{"Service":"urn:upnp-org:serviceId:TemperatureSensor1","LastRec":0,"FilterMaximum":0,"Type":0,"Logging":1,"FilterEnable":0,"Device":4,"FilterMinimum":0,"Id":1,"DrowsyWarning":0,"DrowsyError":0,"DataOffset":0,"Name":"CurrentTemp","LastVal":0,"FirstRec":0,"Variable":"CurrentTemperature","DataType":1}],"Graphs":[],"LastWrite":1391261432,"nextId":2}
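
That blob is easier to read once parsed. Below is a rough Ruby sketch that lists the variables with logging switched on. DEBUG_OUTPUT is a trimmed copy of the output above, and logged_variables is a name I made up; nothing here is part of the dataMine plugin itself.

```ruby
require 'json'

# Trimmed copy of the debug output above (some keys dropped for brevity).
DEBUG_OUTPUT = <<'DEBUG'
{"Version":"0.980","dbVersion":2,"Events":{"count":17,"last":1391253610},
 "Variables":[{"Service":"urn:upnp-org:serviceId:TemperatureSensor1",
               "Logging":1,"Device":4,"Id":1,"Name":"CurrentTemp",
               "Variable":"CurrentTemperature","LastRec":0}],
 "LastWrite":1391261432,"nextId":2}
DEBUG

# Return the variable entries that have logging enabled.
def logged_variables(debug_json)
  JSON.parse(debug_json).fetch('Variables', []).select do |variable|
    variable['Logging'] == 1
  end
end

logged_variables(DEBUG_OUTPUT).each do |v|
  puts "id #{v['Id']}: #{v['Name']} (device #{v['Device']}, #{v['Variable']})"
end
```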

One problem was that I was getting lots of errors like:

50  02/01/14 12:08:54.833   luup_log:11: dataMine: 1:Unable to open file for read - /dataMine/database/4/raw/2300.txt <0x2d6fa680>

I fixed this through a little shell script:

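# Pre-create database/<n>/raw for n = 1..99 so dataMine can open its files.
cd /dataMine
# -p avoids an error if the directory already exists
mkdir -p database
cd database
for x in 1 2 3 4 5 6 7 8 9
do
  mkdir -p "$x/raw"
  for y in 0 1 2 3 4 5 6 7 8 9
  do
    mkdir -p "$x$y/raw"
  done
done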

Here are some initial questions I had:

  • How do I change update frequency? (Answer: You don’t, you record on state changes.)
  • How can I export? (Answer: You ssh to the data directories, and pull out the raw data.)
  • How can I match the scales of two different variables? (This one I don’t know.)

It seems all the data are stored in this structure:

root@MiOS_35017272:/dataMine/database# find . -iname "*.txt"
./4/raw/2300.txt
./1/raw/2300.txt
./3/raw/2300.txt

The layout is pretty basic, and the data are very straightforward (Unix timestamp and value):

root@MiOS_35017272:/dataMine/database# more ./3/raw/2300.txt
1391275155,64
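
Once a raw file has been copied off the Vera, turning it into something readable takes only a few lines. This is a sketch in Ruby, not part of dataMine; parse_raw is a hypothetical helper and SAMPLE_RAW is the single line shown above.

```ruby
# Parse dataMine raw lines of the form "unix_timestamp,value".
def parse_raw(text)
  text.each_line.map(&:strip).reject(&:empty?).map do |line|
    stamp, value = line.split(',', 2)
    [Time.at(Integer(stamp)).utc, Float(value)]
  end
end

SAMPLE_RAW = "1391275155,64\n"

parse_raw(SAMPLE_RAW).each do |time, value|
  puts "#{time.strftime('%Y-%m-%d %H:%M:%S')} UTC  #{value}"
end
```

For a real file, pass `File.read` of the copied file to parse_raw instead of the sample string.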

Now, I can make these plots:

Display from dataMine app

This is direct from the dataMine app, which has really great browsing capability, but I can’t get the axes to match.

Custom Plot (using MATLAB)

Here is a much better plot with a common axis and appropriate scaling. It is interesting how slowly my house cooled. I need to compare this with local temperatures.

Any thoughts appreciated in the comments.
