Autoscoring with Matlab

While I qualified a long time ago on the M9, I never really learned to be a good shot. Now, I’m trying to learn to shoot well and wanted to automatically score my targets, keep the data, and get better.

There are apps to do this, but none of them did what I wanted. One app, by Thomas Gabrowski and Justa Mili, works off a photo of the target to automatically calculate the score. It can also analyze shooting groups with windage, elevation, mean radius and extreme spread, and it keeps track of your previous shooting sessions to monitor progress. The app costs $17.

Developing my own costs me time, but gives me flexibility to work with my targets and my system. It's also a thing I do: grab challenges that will teach me something. It's served me well, and MATLAB makes this all accessible and fun. The other thing is that apps never work exactly right. What if I want the raw data so I can calculate whether I'm aiming high or low over time? All this code is on GitHub at https://github.com/tbbooher/automatic_target_scoring.

I told my children that two key skills every digital citizen needs are the ability to process text and images. By processing text, I’m able to tell any story from any reports in digital form. This is often bank statements, hotel stays, fitness stuff or uber rides. By processing images, I’m able to understand and report on things that are happening around me.

In looking around, I found this thesis filled with some good ideas. I reached out to the author and discussed the merits of edge detection vs. template matching. He didn't have his code available. There were several papers, but none were really that helpful. It was easier to start building than to spend a lot of time reading others' approaches.

I knew there would be three steps to this: (1) registering all images to a standard, fixed image for consistent distance, (2) finding the bullet holes and the center, and (3) measuring the distance from the center to each hole.

Image Registration

This was harder than I thought, since most registration is for two similar images. I was used to the ease of Photoshop for rapid registration. It turns out to be a hard problem to register pictures of what are really different scenes, even though the structure is common. Most image registration problems involve pictures of the same scene taken at different angles or distances. The picture below makes this clear:

Reference and Real Image

I found two approaches that worked for image registration. The first approach was to extract the red circle and then make the circles match. Here I had to calculate and align the centers, and rescale one image to the size of the other. Color thresholding and imfindcircles were quite useful.
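A sketch of that first approach is below; the color thresholds and radius range are illustrative, not my exact values:

% Register a photo to the reference by matching the red bullseye circle
ref = imread('reference.jpg');
img = imread('target_photo.jpg');
redRef = ref(:,:,1) > 150 & ref(:,:,2) < 100 & ref(:,:,3) < 100;
redImg = img(:,:,1) > 150 & img(:,:,2) < 100 & img(:,:,3) < 100;
[cRef, rRef] = imfindcircles(redRef, [100 400]);      % radius range in pixels
[cImg, rImg] = imfindcircles(redImg, [100 400]);
scale = rRef(1) / rImg(1);
img = imresize(img, scale);                           % rescale to the reference
img = imtranslate(img, cRef(1,:) - cImg(1,:)*scale);  % align the circle centers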

For the more general case, I had to use fitgeotrans, which takes the pairs of control points, movingPoints and fixedPoints, and uses them to infer the geometric transformation. After doing this I had a set of images that were all the same size and all in the same orientation, with bullet holes.
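In code, that boils down to something like this (a sketch assuming moving and fixed hold the photo and the reference target; I show cpselect for picking the control points and a similarity transform, but any point source and transform type works):

[movingPoints, fixedPoints] = cpselect(moving, fixed, 'Wait', true);  % click matching points
tform = fitgeotrans(movingPoints, fixedPoints, 'similarity');         % infer the transform
registered = imwarp(moving, tform, 'OutputView', ...
    imref2d([size(fixed,1) size(fixed,2)]));                          % same size and orientation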

Registered Images

Finding the bullet holes

I was able to use this MATLAB post to learn that I could sample some colors in Photoshop, convert the image to HSV, and find shades of gray using some code from Theodoros Giannakopoulos.
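The gist is a mask like the following (a sketch with img as the registered photo; the cutoffs are the kind of thing I sampled in Photoshop, not universal constants):

hsvImg = rgb2hsv(img);
holes = hsvImg(:,:,2) < 0.2 & hsvImg(:,:,3) > 0.2 & hsvImg(:,:,3) < 0.6;  % low saturation, mid value: gray
holes = bwareaopen(holes, 50);                    % drop specks smaller than a bullet hole
stats = regionprops(holes, 'Centroid', 'Area');   % one centroid per candidate hole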

The next thing I had to do was create the ability to find the center. I did this by recognizing that the center X is red and pretty distinctive: ideal for template matching using normalized cross-correlation (MATLAB has a great description of how this works here). With this accomplished, I can find the center in a few lines, going off this template:

Template
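A sketch of the center-finding step, assuming a small template image cropped from one registered target:

template = rgb2gray(imread('center_template.png'));
cc = normxcorr2(template, rgb2gray(img));    % correlation surface
[~, idx] = max(abs(cc(:)));                  % strongest correlation peak
[ypeak, xpeak] = ind2sub(size(cc), idx);
% normxcorr2 pads by the template size, so back out the offset to the center
center = [xpeak - size(template,2)/2, ypeak - size(template,1)/2];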

All together, I’m able to compute the measurements to make a picture like this (note the green circle in the middle on the X):

Result

With the image registered, the center defined, and all holes discovered, I could easily calculate a score based on the mean distance to the bullseye.
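Continuing the sketches above, the scoring itself is the easy part:

holeXY = vertcat(stats.Centroid);             % n-by-2 hole locations
d = hypot(holeXY(:,1) - center(1), holeXY(:,2) - center(2));
score = mean(d);                              % mean distance to the bullseye
spread = std(d);                              % consistency of the group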

Problems

The problem was that I couldn't get good consistency. Shadows were a problem on some images; on others, shots very close to one another caused confusion. It turned out that I was really good at quickly seeing the holes myself, better than the template matcher. Note that when I saved the image, I updated an xls file and saved the scores as EXIF data, so the image had the exact locations of the holes that I could pull out later if needed. The code below works great and is ideal for my solution. Best of all, I learned a lot about how to manipulate and extract data from images.

Results

So, is my shooting getting better? Not yet. In the plot below you can see my score is increasing, and the standard deviation of my shots is increasing as well. Now, the data aren't uniform, since I had the target at 5m and moved it to 6.5m on Oct 8. Sept 12 was with a suppressed .22 at 5m; Oct 8 was 9mm. Anyway, it's better to know from data than to be guessing. I'm chalking this up to an improved technique that is taking some time to adjust to.

Nginx With Https for Local Development

Testing HTTPS locally is always hard, but I'm against testing on production or even on a remote server.
Things are also complicated by developing in Linux as a subsystem on Windows via WSL2. I was able to use mkcert to get SSL working locally.
While I would love to use Let’s Encrypt locally, Let’s Encrypt can’t provide certificates for “localhost” because nobody uniquely owns it. Because of this, they recommend you generate your own certificate, either self-signed or signed by a local root, and trust it in your operating system’s trust store. Then use that certificate in your local web server. They describe this well on their website.

Using certificates from real certificate authorities (CAs) for development can be dangerous or impossible (for hosts like example.test, localhost or 127.0.0.1), but self-signed certificates cause trust errors. Managing my own CA may be the best solution, but mkcert automatically creates and installs a local CA in the system root store and generates locally-trusted certificates. I was able to modify my nginx.conf for my container test environment and open the necessary ports in docker-compose (- 443:443) to get this working just fine.
You can see my working code here on a new git branch.

upstream flask-web {
    server flask:5000;
}

upstream lochagus-web {
    server lochagus:8080;
}

server {
    listen 80;
    listen [::]:80;

    location / {
        root /usr/share/nginx/html/;
        try_files $uri /index.html;
    }

    charset utf-8;
    source_charset utf-8;

    location /flask {
        include /etc/nginx/conf.d/headers.conf;
        proxy_pass http://flask-web/;
    }

    location /lochagus {
        include /etc/nginx/conf.d/headers.conf;
        proxy_pass http://lochagus-web/;
    }
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name tim.test.org;

    charset utf-8;
    source_charset utf-8;

    location / {
        root /usr/share/nginx/html/;
        try_files $uri /index.html;
    }

    location /flask {
        include /etc/nginx/conf.d/headers.conf;
        proxy_pass http://flask-web/;
    }

    location /lochagus {
        include /etc/nginx/conf.d/headers.conf;
        proxy_pass http://lochagus-web/;
    }

    ssl_certificate /etc/nginx/certs/tim.test.org.pem;
    ssl_certificate_key /etc/nginx/certs/tim.test.org-key.pem;
}

Organize Images

I deleted and recovered a large number of photos (thanks, foremost). However, a large number of images from a browser cache were messing up my photo library.

Who wants to keep 10,000 of these images stored on their hard drive?

I used the number of unique colors to filter them out via MATLAB.
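The idea is that browser-cache graphics (buttons, logos, UI sprites) have far fewer unique colors than real photos. A sketch of the counting loop, with photoDir and the jpg extension as assumptions:

files = dir(fullfile(photoDir, '*.jpg'));
ucolors = zeros(numel(files), 1);
for k = 1:numel(files)
    img = imread(fullfile(photoDir, files(k).name));
    if ndims(img) == 3                                      % skip grayscale images
        ucolors(k) = size(unique(reshape(img, [], 3), 'rows'), 1);
    end
end

Skipped files stay at zero, which is why the plots below use nonzeros(ucolors).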

As expected, if I plot this, I get a nice s-curve:

plot(sort(nonzeros(ucolors)))

Or, viewing this as a histogram

histogram(nonzeros(ucolors),100)

So I modified the script to move all images with fewer than 4500 unique colors to a folder: potential bad
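The move itself is a couple of lines (assuming the potential bad folder already exists):

for k = find(ucolors > 0 & ucolors < 4500)'   % skip the zeros left by grayscale files
    movefile(fullfile(photoDir, files(k).name), fullfile(photoDir, 'potential bad'));
end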

This worked perfectly and saved my whole Saturday.

DEFCON 25 Writeup

Nothing delivers the fear of missing out like DEFCON. So much to learn and so little time. Too busy hacking in the IoT village or catching up with old friends to sit in on the talks? Read on to find out what you missed in my talk writeups below. They are listed in chronological order, with some of the most interesting at the end of the document, so don't give up too easily.

Overall, conference attendance was at a record 27,000 people, most of them non-technical fans of hacking culture. Major themes continued a previous trend of focusing on IoT, drones, and low-level tools and vulnerabilities. New this year was a focus on analog techniques to both vet and attack low-level hardware. Several talks leveraged the increased performance and lower cost of commercial software-defined radios.

The largest media coverage was focused on vulnerabilities to voting machines and the ability to use cheap components to crack physical protections.

All speakers and talk abstracts are listed here and most presentations can be found on the DEFCON media server. In roughly a month, all of these talks will be available on YouTube.

DEFCON Capture the Flag

The Capture the Flag (CTF) competition was won by CMU’s PPP team coached by David Brumley. This team won the last three years of the competition and several before that as well, and a majority of the PPP team members were on the winning DARPA Cyber Grand Challenge team as part of David Brumley’s company For All Secure.

DARPA’s Dustin Fraze ran this year’s competition, which was specifically designed to thwart any attempts to start the competition with a head start from a cyber reasoning system such as those developed for the DARPA Cyber Grand Challenge. In order to do this, the organizers developed a radically new architecture with several challenging properties:

  • 9-bit bytes instead of 8-bit bytes. This makes parsing the binary difficult: the byte length of the system parsing a challenge does not match cLEMENCy's, so the start of a byte on both systems lines up only every 9th byte.
  • It’s Middle Endian. Every other architecture stores values in memory in one of two ways: from most significant byte to least significant (Big Endian), or least significant to most significant (Little Endian). Rather than storing a value like 0x123456 as 12 34 56 or 56 34 12, Middle Endian stores it as 34 56 12 (see the sketch after this list).
  • Instructions have variable length opcodes. Instructions were anywhere from 18 to 54 bits, with opcodes being anywhere from 4 bits to 18 bits.
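For the curious, middle-endian ordering is easy to see in a toy MATLAB sketch (using ordinary 8-bit bytes for illustration; cLEMENCy's bytes are 9 bits wide):

bytes = uint8([0x12 0x34 0x56]);   % big-endian byte order of 0x123456
middle = bytes([2 3 1])            % middle-endian storage: 34 56 12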

Thursday:

Porosity: A Decompiler For Blockchain-Based Smart Contracts Bytecode – Matt Suiche

With the knowledge that implementation is always the most common cryptographic weakness, Matt described how to decompile and comprehend Ethereum smart contracts. Ethereum is an open software platform based on blockchain technology that enables developers to build and deploy decentralized applications, in this case smart contracts. Hacks of these contracts have been popular due to flawed implementations. The 2016 DAO (Decentralized Autonomous Organization) theft and the recent Parity multisignature blockchain wallet compromise resulted from poorly written Solidity code (a contract-oriented, high-level language) that introduced vulnerabilities hackers exploited to steal funds from other Ethereum users, not from compromises of the underlying blockchain protocol or cryptographic weaknesses.

In his talk, Matt presented “Porosity,” the first decompiler that generates human-readable Solidity syntax smart contracts from any EVM bytecode. Because compiled smart contracts are all world visible on the Ethereum network, this tool enables all contracts to be reviewed at any time. Once reversed, the code can be scanned to check for susceptibility to new attacks or to ensure adherence to best practices.

You can read his paper here for more details.

See No Evil, Hear No Evil: Hacking Invisibly and Silently With Light and Sound – Matt Wixey

Matt Wixey showed how attackers creatively bypass modern security protections by leveraging light and/or sound, using off-the-shelf hardware. His talk covered C2 channels and exfiltration using light and near-ultrasonic sound; disabling and disrupting motion detectors; home-built laser microphones; a technique for catapulting drones into the air and overriding operator controls; jamming speech through delayed auditory feedback; and demotivating malware analysts. Specifically, he demonstrated:

  • Using acoustic ultrasonic pulses to disrupt a drone and render the controller useless by characterizing and sending specific inputs to ultrasonic altimeters. He demonstrated this over a large area through a cluster of ultrasonic transducers.
  • He demonstrated a covert acoustical mesh network at 16-20 kHz that used steganography to conceal the transmission (he played the acoustic attack signal inside seemingly normal audio).
  • He was able to open a computer application by shining infrared light into the ambient light sensor near the webcam, which triggered activation of pre-installed malware.
  • He played music from a smartphone through a speaker without Bluetooth and without wires connecting the two devices, using infrared LEDs in close proximity.
  • Matt also demonstrated the ability to shut off his TV whenever the theme song of Gilmore Girls played in his apartment.

His talk is available here.

Friday:

The Brain’s Last Stand – Garry Kasparov

Kasparov discussed the ways humans and intelligent machines could work together. He argued that AI will not destroy jobs but will create new ones. He also talked about the respective contributions of machines and people. The general consensus of the crowd was that he succeeded in providing a keynote that was both funny and insightful.

Weaponizing the BBC Micro:Bit – Damien Cauquil

The BBC produces the Micro:bit, a pocket-sized computer for education. Damien showed that the Micro:bit can be configured to sniff out keystrokes from a wireless keyboard, and even take control of a quadcopter drone. Using publicly available software and a Micro:bit, he demonstrated the ability to snoop on signals and get the input from a wireless keyboard using Bluetooth. He also attached a Micro:bit to a drone controller handset and used the resulting hardware to interfere with an airborne quadcopter's control mechanisms and hijack its flight controls.

His slides are available.

Hacking Smart Contracts – Konstantinos Karagiannis

While the Ethereum cryptocurrency core system remains secure, there have been a number of hacks of the Ethereum ecosystem. The most famous attack was against a smart-contract wallet in the Parity Ethereum client. The vulnerability allowed the hacker to exfiltrate funds from multi-signature wallets created with Parity clients 1.5 and later. Konstantinos described the hack as shockingly basic: “It turns out that the creator [of the wallet] left out a very important word,” Karagiannis said. “And that one little word was the word ‘internal.’ When you don’t make that declaration, the program will accept what are called messages from an external source. So because he left off that simple declaration, attackers were able to craft a very simple message that told the wallet, ‘Hey, you belong to me now.’ After that was done, they were then able to do an execute command that then told the wallet, ‘And, by the way, I’d like that money sent here.’ So it was a really simple two-stage attack that should not have happened.”

This was a case where white hat hackers were clear heroes. He described how anonymous vigilantes calling themselves the White Hat Group intervened, stealing Ether out of wallets before the hackers could get to it. Later, this group transferred the Ether back to its owners, saving about $200 million.

See the slides here.

Open Source Safe Cracking Robots – Nathan Seidle

One of the most well-received talks was by Nathan Seidle, founder of SparkFun Electronics. He demonstrated an open source safe-cracking robot. Using an Arduino and 3D printing, the safe-cracking robot cost around $200 to build. It uses magnets to attach to the exterior of a safe and is super simple to run with only one button.

They accomplished this by cracking one of the dials within 20 seconds, detecting the larger size of the correct indent. The other two dials couldn't be measured directly, but they were helped by finding that most safes have a built-in margin of error. This enabled them to dramatically reduce the number of potential combinations.

See the slides or watch a video from Wired magazine.

Next-Generation Tor Onion Services – Roger Dingledine

Roger argued that a tiny fraction of Tor traffic makes up what is often called the “dark web”. He introduced the concept of “Tor onion services” which let people run internet services such as websites inside the anonymous Tor network.

He discussed how mistakes in the original protocol are now being actively exploited by threat intelligence companies to build lists of onion services even when the service operators thought they would stay under the radar. To fix this, he presented a new and improved onion service design, which provides stronger security and better scalability. For example, they are migrating from 1024-bit RSA encryption keys to shorter but tougher-to-crack Ed25519 elliptic curve keys.

He predicts that soon anyone will be able to create their own corner of the internet that’s not just anonymous and untraceable, but entirely undiscoverable without an invite. In particular, the new design will both strengthen the cryptosystem and let administrators easily create fully secret darknet sites that can only be discovered by those who know a long string of random characters. This is especially important due to the arrival of a large number of darknet indexing and search services.

These services will work by changing the basic mechanism of service declaration. Instead of declaring their .onion address to hidden service directories, they’ll derive a unique cryptographic key from that address which is discreetly stored in Tor’s hidden service directories. Any Tor user looking for a certain hidden service can perform that same derivation to check the key and route themselves to the correct darknet site. But the hidden service directory can’t derive the .onion address from the key. This means there is no known way to discover an onion address.

Applications for this include the ability for a small group of collaborators to host files on a computer known only to them that no one else could ever find or access. Also present are mitigations against the techniques used by law enforcement to find and remove the Silk Road servers in 2014.

Slides are available.

How We Created the First SHA-1 Collision and What It Means for Hash Security – Elie Bursztein

The SHA-1 cryptographic hash function is critical for many of the cryptographic and integrity verifications that enable our current state of internet security. For example, Git, the world's most widely used system for managing software development among multiple people, relies on it for data integrity. The basic security property of hash values is that they uniquely represent a given input; when it is possible to break this property on demand, a hash collision occurs. This February, the first SHA-1 collision was announced. This collision, combined with a clever use of the PDF format, allowed attackers to forge PDF pairs that have identical SHA-1 hashes and yet display different content. In the talk, Elie made it clear how difficult this was to achieve. This attack is the result of over two years of intense research; it took 6,500 CPU-years and 110 GPU-years of computation to find the collision.

They discussed the challenges they faced, from developing a meaningful payload, to scaling the computation to that massive size, to solving unexpected cryptanalytic challenges. They also discussed positive aspects of the release and presented the next generation of hash functions and what the future of hash security holds.

For more information, refer to this article from The Register. No slides were available.

Breaking the x86 Instruction Set – Christopher Domas

Chris demonstrated how page fault analysis and some creative search strategies can be used to exhaustively search the x86 instruction set and uncover new secrets. In particular, I was impressed by his use of page boundaries to determine the lengths of unknown instructions. He disclosed new x86 hardware glitches, previously unknown machine instructions, ubiquitous software bugs, and flaws in enterprise hypervisors. He also released his "sandsifter" toolset, which allows users to audit, and break, their own processors.

While he discovered a large number of hidden instructions, I think the really interesting work resulting from this will be from exploits developed from unknown opcodes. For example, an attacker can use these to mask malicious behavior. They could throw off disassembly and jump targets to cause analysis tools to miss the real behavior.

What worries me most is that hardware bugs are foundational and manifest themselves in the dependent software stack. Additionally, these are difficult to find and fix. For example, he presented a ring 3 processor DoS that I think would be difficult to counter.

They used these techniques to find innumerable bugs in disassemblers; the most interesting is a bug shared by nearly all of them. Most disassemblers will parse certain jmp (e9) and call (e8) instructions incorrectly if they are prefixed with an operand-size override prefix (66) in a 64-bit executable. In particular, IDA, QEMU, gdb, objdump, valgrind, Visual Studio, and Capstone were all observed to parse this instruction differently than it actually executes.

This is written up well in an (unpublished) paper here, and the slides are available as well.

Abusing Certificate Transparency Logs – Hanno Böck

Hanno started by making the point that certificate authorities are not generally trusted by the information security community and highlighted several past cases of illegitimate certificates. However, he noted that there is no feasible plan for replacing CAs, and Google has announced that Certificate Transparency logging will be required for all new certificates starting in April 2018. In practice, most certificates are already logged, and the certificate transparency system provides public logs of TLS certificates.

Certificate Transparency has helped uncover various incidents in the past where certificate authorities have violated rules. It is probably one of the most important security improvements that has ever happened in the certificate authority ecosystem.

While certificate transparency is primarily used to uncover security issues in certificates, Hanno Böck showed that these data are also valuable for other use cases, including attacks on common web applications like WordPress, Joomla or Typo3:

  • Attack 1: Attacker monitors CT logs, extracts host names. Compares web pages with common installers.
  • Attack 2: Installer found: Install the application. Upload a plugin with code execution backdoor. Revert installation, leave backdoor.

Using the second attack, he claimed he could have taken over around 4000 WordPress installations within a month.

For more details, see this blog post and these slides.

Saturday:

On Saturday, I was surprised when 22 zero-day exploits were released against a range of consumer products—mainly home automation and Internet of Things devices.

A Picture is Worth a Thousand Words, Literally: Deep Neural Networks for Social Stego – Philip Tully and Michael T. Raggo

Philip Tully and Michael T. Raggo described how steganography can be systematically automated and scaled. In order to accomplish this, they first characterized the distorting side effects rendered upon images uploaded to popular social network servers, such as compression, resizing, format conversion, and metadata stripping. Then, they built a convolutional neural network that learned to reverse engineer these transformations by optimizing hidden data throughput capacity. They then used pre-uploaded and downloaded image files to teach the network to locate candidate pixels that are least modifiable during transit, allowing stored hidden payloads to be reliably recalled from newly presented images. Deep learning typically requires massive training data to avoid overfitting. However, massive numbers of images are available through social networks' free image hosting services, which feature bulk uploads and downloads of thousands of images at a time per album.

From this, they demonstrated that hidden data can be predictably transmitted through social network images with high fidelity and low latency. This is significant because steganalysis and other defensive forensic countermeasures are notoriously difficult, and their exfiltration techniques highlight the growing threat posed by automated, AI-powered red teaming.

The slides are available as is a good write-up.

MS Just Gave the Blue Team Tactical Nukes (And How Red Teams Need To Adapt) – Chris Thompson

This talk could presage the end of the nearly $2B endpoint security market on Windows (e.g., Symantec, Carbon Black, CrowdStrike, BigFix, CyberReason, FireEye).

Windows Defender Advanced Threat Protection (ATP) is a cloud-based data security service that provides advanced breach detection based on information gathered by Microsoft's massive threat intelligence system. This coming spring it's due for a big upgrade, and Microsoft has used its knowledge of and deep access into the Windows operating system to give its tools unique capabilities, such as resistance to being uninstalled via Microsoft enterprise admin tools. (For example, uninstalling requires a generated offboarding script with a SHA256-signed reg key based on the unique Org ID and cert.)

I also found it interesting that the ATP sensor uses Windows Telemetry (DiagTrack service), which in turn uses WinHTTP Services (winhttp.dll.mui) to report sensor data and communicate with the Windows Defender ATP cloud service. Other interesting results in this talk were a demonstrated capability to perform Golden Tickets Detection (Using KRBTGT NTLM Hash), detecting malicious replication requests via DCSync and detection of internal recon activities.

The principal program manager of Windows Defender ATP at Microsoft was quoted as saying: “Windows Creators Update improves our OS memory and kernel sensors to enable detection of attackers who are employing in-memory and kernel-level attacks—shining a light into previously dark spaces where attackers hid from conventional detection tools. . . We’ve already successfully leveraged this new technology against zero-days attacks on Windows.”

Windows Defender ATP should definitely be carefully watched and tested going forward. Out of the box, it will provide at least a new layer of security, but it also has the potential to replace expensive EDR solutions.

This article is a good summary and the slides are available.

Trojan-tolerant Hardware & Supply Chain Security in Practice – Vasilios Mavroudis and Dan Cvrcek

Vasilios Mavroudis and Dan Cvrcek challenged the perception that high-assurance systems cannot tolerate the presence of compromised hardware components and demonstrated how trusted, high-assurance hardware can be built from untrusted and potentially malicious components.

They noted that the majority of IC vendors outsource the fabrication of their designs to facilities overseas and rely on post-fabrication tests to weed out deficient chips. However, they claimed that these tests are not effective against: 1) subtle unintentional errors (e.g., malfunctioning RNGs) and 2) malicious circuitry (e.g., stealthy hardware Trojans). These errors are very hard to detect and require constant upgrades of expensive forensics equipment, which contradicts the motives of fabrication outsourcing.

To meet this challenge, they introduced a high-level architecture that can tolerate multiple, malicious hardware components, and outlined a new approach in hardware compromise risk management. They contrasted this against existing approaches such as trusted foundries (very expensive, prone to errors), split-manufacturing (still expensive, not secure), post-fabrication inspection (expensive, a huge pain, doesn’t scale), secret-sharing (keys generated by a trusted party). By contrast, their system is a fault-tolerant computer system that runs the same set of operations at the same time in parallel and incorporates random number generation, key generation & management, decryption and signing. They claim this provides resilience and is tamper-resistant (FIPS-4).

Their implementation incorporated 120 SmartCards in quorums of three cards, 1.2 Mbps dedicated inter-IC buses, an ARTIX FPGA controlling the communications bus, and 1 Gbit/s of bandwidth for incoming requests. Their key innovation was using SmartCards.

They benchmarked the performance of this system, and described its internals. They also quantified other techniques such as “component diversification” and “non-overlapping supply chains”, and finally discussed how “mutual distrust” can be exploited to further increase trust in hardware integrity.

The slides are available.

Evading next-gen AV using A.I. – Hyrum Anderson

With most next-generation antivirus solutions relying on machine learning to identify new iterations of malware, Hyrum used artificial intelligence to construct a system that would mutate malware to the point where it wouldn’t be detected while still keeping its form and function. Essentially, he created a test in which you could pit artificial intelligence against any next-generation antivirus solution where the artificial intelligence would manipulate the malware by modifying bytes (while preserving the malware) and test it against the antivirus solution to determine if it was identified as malicious or benign. Hyrum showed in his trials that this process was successful in modifying malware to bypass detection.

The slides are available.

Sunday:

Breaking Bitcoin Hardware Wallets – Josh Datko and Chris Quartier

In this talk, Josh Datko and Chris Quartier explained that bitcoin security rests entirely on the security of one's private key. Bitcoin hardware wallets (a separate device that stores your private key) help protect against software-based attacks to recover or misuse the private key. This talk explored how one could compromise the security of these hardware wallets. In 2015, Jochen Hoenicke was able to extract the private key from a TREZOR using a simple power analysis technique. While that vulnerability was patched, they explored the potential to exploit the microcontroller on the TREZOR and the KeepKey using an open source hardware tool, the ChipWhisperer. With this device, they surveyed fault injection, timing, and power analysis techniques against the STM32F205 microcontroller on the TREZOR and KeepKey.

The successful technique takes advantage of the properties of the power distribution networks on printed circuit boards to generate ringing in these networks. This ringing is presumed to perturb the power distribution network on the target chip itself, which is known to cause faulty operations. Fine control over the fault timing has also demonstrated that faults can be inserted with very high reliability, determining for example whether a single- or multi-bit fault should be introduced, or faulting a single byte out of a larger array operation. In general, a glitch width of about 750 ns would frequently (50% of the time) result in one or more bit flips in the register without noticeably damaging the FPGA design.

However, TREZOR (and all clones) did not enable the Clock Security System in the MCU, allowing injection of clock faults; even so, he was not able to use these bit flips to compromise the keys on the device.

For more details, refer to the slides or this paper:

O’Flynn, Colin. “Fault Injection using Crowbars on Embedded Systems.” IACR Cryptology ePrint Archive 2016 (2016): 810.

Backdooring the Lottery, and Other Security Tales in Gaming over the Past 25 Years – Gus Fritschie and Evan Teitelman

In October 2015, Eddie Raymond Tipton, the information security director of the Multi-State Lottery Association (MUSL), was convicted of rigging a $14.3 million drawing of MUSL’s lottery game Hot Lotto. The trial found that Tipton had rigged the draw in question, by using his privileged access to a MUSL facility to install a rootkit on the computer containing Hot Lotto’s random number generator, and then attempting to claim a winning ticket with the rigged numbers anonymously. There is evidence that Eddie Tipton was able to steal a lot of other funds using similar techniques.

This worked by taking various game parameters, including the number of numbers per draw and the maximum and minimum numbers in the game, and multiplying them together along with the number 39 and the summed ASCII values of the letters in the computer name. He then added the day of the year and the product of the number of times the pseudorandom number generator (PRNG) had run since the last reboot and the year. This number was then used to seed the PRNG. One of the clever things Tipton was able to do was ensure the certification would pass, even when the output of the PRNG was run through statistical tests to check for unbiased results.
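As a sketch, the seed computation described above would look something like this in MATLAB (every variable name here is hypothetical; this is only my reconstruction of the description, not Tipton's code):

% Hypothetical reconstruction of the rigged seed computation
seed = numsPerDraw * maxNumber * minNumber * 39 * sum(double(computerName)) ...
       + dayOfYear + prngRuns * year;
rng(seed);   % the PRNG state is now fully determined by observable values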

The 2014 Russian slot machine hack was also very interesting. Hackers reverse-engineered slot machine software and discovered a weakness in the PRNG. Their exploit was to take a video of roughly 24 spins and upload the resulting data to calculate the pattern of the slots. This information was transmitted to a custom app with a listing of timing marks that caused the phone to vibrate 0.25 seconds before the spin button should be pressed. While this was not always successful, the result was a higher payout than expected. This is an excellent overview of the hack.

Their slides are interesting.

Genetic Diseases to Guide Digital Hacks of the Human Genome: How the Cancer Moonshot Program will Enable Almost Anyone to Crash the Operating System that Runs You or to End Civilization – John Sotos

John Sotos, the Chief Medical Officer at Intel, has a wild, scary thought experiment: What if, by investing in hacking the human genome for good, we’ve opened it up to be hacked for evil? He believes it may one day be possible to create terrifying bioweapons using genetics.

John claimed that the enabling capability for these attacks comes from data and techniques made available from the Human Genome Project and the Cancer Moonshot.

While mostly a thought experiment, John discussed the intersection between precise gene editing and digital biology (an emulative and bit accurate model of biological processes). From these, he claims it will be possible to digitally reprogram the human genome in a living human. In his mind, this raises the potential to give bio-hackers the ability to induce “deafness, blindness, night blindness, strong fishy body odor, total baldness, intractable diarrhea, massive weight gain, shouting involuntary obscenities, physical fragility, or a susceptibility to death from excitement. . .There are things worse than death”

No slides were available, but the press covered this talk extensively.

Revoke-Obfuscation: PowerShell Obfuscation Detection (And Evasion) Using Science – Daniel Bohannon and Lee Holmes

Delivery and execution of malware is often via basic malicious executables and malicious documents, but a small portion of the current malware ecosystem leverages PowerShell as part of its attack chain. Most commonly, a malicious executable or document macro launches PowerShell to download another executable and run it.

Many personal security products attempt to prevent this behavior with two primary techniques: applying antivirus signatures to command line arguments, and AMSI-based (Anti-malware Scan Interface) detection. But obfuscation and evasion techniques can bypass these protections. There is an inherent challenge in allowing powerful administrative behavior through an interface like PowerShell while preventing those same tools from conducting malicious behavior.

After starting with numerous examples of how hard it is to block illicit use of PowerShell, this talk focused on how to detect obfuscated scripts (such as the output of Invoke-Obfuscation), which can bypass both approaches. These advanced techniques leverage extreme levels of randomization in Invoke-Obfuscation and Invoke-CradleCrafter, paired with the token-layer obfuscation options that are not deobfuscated in PowerShell's script block logging. To counter these advanced techniques, they built a tool called Revoke-Obfuscation, which uses statistical analysis to detect the most subtle obfuscation techniques. The heart of the technical capability is using statistical tools to analyze the entropy and letter frequency distribution of variable names.

One of the clever techniques they employ is using cosine similarity to analyze the frequency of each letter in a script. However, this naive technique lacked the advantages of unsupervised approaches and statistical variance, and it was susceptible to character frequency tampering. To counter this, they downloaded 408,000 PowerShell scripts and modules, then applied logistic regression with gradient descent to calculate and summarize 4098 unique script characteristics and directly estimate the likelihood that a script is obfuscated. Their results were fairly impressive, with an accuracy and F1 score of 96% and 95%, respectively. This was 10 times better at finding obfuscated content than character frequency analysis alone, with half the false positives.
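The naive letter-frequency comparison is simple enough to sketch in MATLAB (a toy version; script and corpus here are assumed char vectors of PowerShell source):

letters = ('a':'z')';
f = sum(lower(script) == letters, 2);      % 26 letter counts for the script under test
g = sum(lower(corpus) == letters, 2);      % counts over a corpus of known-clean scripts
cosSim = dot(f, g) / (norm(f) * norm(g));  % near 1 means normal-looking letter usage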

More details are available in this excellent report.

Common Mathematical Libraries: Computation, History, and Trust

A common complaint among engineers is that modern computers obscure the details of the complicated math they perform. We cringe when children are exalted as “knowing computers” from a young age. Bill Gosper and Richard Greenblatt had perhaps the best opportunity to truly “learn computers”. They didn't have Wolfram Alpha or even spreadsheets; they had to understand arithmetic logic units and figure out how to optimize code to get basic math operations to run on the TRS-80. Today, there is a desire to re-learn hardware through Arduino or other low-level platforms, but most folks start very high on the stack. That is a shame, because under the hood is a rich and exciting legacy of numerical computing software that forms the basis of the trust we place in computers.

My trust in computers started to erode when Mark Abramson gave a lecture on the possible errors of floating point arithmetic. All programmers are aware of the problems caused by division by zero or a narrowing conversion that loses information. However, at other times the cause can be the futile attempt of a software developer to round a floating-point number. This was shocking: one of the most basic operations in math, a thing we learn to do before we can ride a bike, eludes the combined efforts of the finest engineers over the last 30 years. To me, this was intuitively nonsensical: if computers can't consistently and reliably round numbers, how can we trust them to do anything?

This turns out to be one of those fundamental limitations, like the halting problem proving that no program can fully analyze arbitrary code. Floating-point numbers are represented internally according to the IEEE 754 specification. Just like the impossible realization of a truly random number1, there is no such thing as a true fractional number in a computer. The key thing is to recognize that floating types do not represent numbers exactly. Internally, the value is not drawn from a continuous range of numbers; instead it is represented as an exponent multiplied by a finite series of binary fractions, for example:

$$
\frac{1}{2^1} + \frac{1}{2^2} + \frac{1}{2^3} + \ldots + \frac{1}{2^{30}}
$$

From the above, you can see that in any range of floating point numbers there are gaps. Between any two floating point numbers, the difference is at the granularity of the smallest element in the series (here $\frac{1}{2^{30}}$). If a number falls in such a gap, the difference between the real number and its nearest representable value is the approximation error. This leads to a fascinating and esoteric subject that is best covered by others.
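You can poke at these gaps directly in MATLAB, since eps(x) reports the distance from x to the next larger double:

eps(1)             % 2.2204e-16, the gap just above 1
eps(2^52)          % 1: doubles this large are spaced a whole integer apart
0.1 + 0.2 == 0.3   % false, since none of these have exact binary representations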

In grad school, my faith in computers only decreased when I realized the challenges associated with Eigenvalue problems, Singular value decomposition and LU/Cholesky/QR/… decompositions. If you want to understand how these are handled, a rough understanding of high performance computing libraries is necessary. These libraries are important because of the issues described above. It is very important (and very hard) to have both an efficient and correct algorithm for matrix decomposition.

I spent my teens coding in BASIC, my college years in MATLAB, and professionally I’ve used lots of Julia, FORTRAN, Ruby, Python, C, Haskell, and x86 or MIPS assembly. As I’ve used different languages, I kept coming back to familiar libraries that every language used with strange names like LAPACK, BLAS, FFTW, and ATLAS.

MATLAB will always be the language that I call home. It is probably the only language I was paid at work to write production code with. In building and running large models, I had to get into the internals of MATLAB optimization and learned that MATLAB was basically a wrapper around excellent matrix optimization libraries. As Cleve Moler writes, the basic motivation for MATLAB was to make LINPACK and EISPACK easy to use. Finally, on a New Year's holiday, I had the chance to pull together a quick summary of the main math libraries I've come across: LAPACK, BLAS, ATLAS and FFTW.

LAPACK – Linear Algebra Package

LAPACK is the most common library for numerical linear algebra. I’ve used its routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK is written in FORTRAN and is powerful and popular since it handles both real and complex matrices in both single and double precision (subject to the caveats above).

LAPACK is the successor of EISPACK and LINPACK, both libraries for linear algebra algorithms developed in the early 70s (credits to Jack Dongarra, Jim Bunch, Cleve Moler and Pete Stewart). LINPACK is still used as a benchmark for the most powerful supercomputers. The EISPACK and LINPACK software libraries were designed for early supercomputers (think the CDC-7600, Cyber 205, and Cray-1). These machines featured multiple functional units pipelined for increased performance. The CDC-7600 was basically a high-performance scalar computer, while the Cyber 205 and Cray-1 were early vector computers.2 One drawback to early vector-based architectures is the absence of optimized locality in data access. Consequently, EISPACK and LINPACK suffered low performance on computers with deep memory hierarchies, which became popular in the late 80s.

LAPACK was designed by Jim Demmel, Jack Dongarra and many others specifically to reimplement the algorithms as "block-based", incorporating locality. (See a great history here.) Many computers have cache memory that is much faster than main memory; keeping matrix manipulations localized allows better usage of the cache. LAPACK was able to use this knowledge to run efficiently on shared memory and vector supercomputers. More recently, the ScaLAPACK software library extends the use of LAPACK to distributed memory concurrent supercomputers, and all routines have been redesigned for distributed memory MIMD (multiple instruction, multiple data) parallel computers.

BLAS – Basic Linear Algebra Subroutines

LAPACK is implemented on top of BLAS, a collection of low-level matrix and vector arithmetic operations. BLAS performs basic operations such as "multiply a vector by a scalar" or "multiply two matrices and add the result to a third matrix". As a programmer, this is the type of thing I would definitely either get wrong or implement inefficiently. By contrast, LAPACK is a collection of higher-level linear algebra operations. LAPACK has routines for matrix factorizations (LU, LLt, QR, SVD, Schur, etc.) that are used to answer more complex questions such as "find the eigenvalues of a matrix", or potentially expensive operations like "find the singular values of a matrix" or "solve a linear system". LAPACK is built on top of the BLAS; many users of LAPACK only use the LAPACK interfaces and never need to be aware of the BLAS at all. LAPACK is generally compiled separately from the BLAS and can use whatever highly-optimized BLAS implementation you have available.

In this sense, Basic Linear Algebra Subprograms (BLAS) is more of a specification than a specific software package. It prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. Due to the academic focus in this area, they are the de facto standard for low-level routines for linear algebra libraries. Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. Modern BLAS implementations take advantage of special floating point hardware such as vector registers or Single instruction, multiple data instructions.

BLAS is often divided into three main areas:

  • BLAS1: vector-vector operations (e.g., vector sum)
  • BLAS2: matrix-vector operations (e.g., matrix-vector product)
  • BLAS3: matrix-matrix operations (mainly matrix-matrix product)
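In MATLAB terms, the three levels look like the snippet below (MATLAB dispatches each of these expressions to whatever optimized BLAS it ships with):

n = 2000; A = rand(n); B = rand(n); x = rand(n,1); y = rand(n,1);
s = x' * y;   % BLAS1: vector-vector (a dot product)
v = A * x;    % BLAS2: matrix-vector product
C = A * B;    % BLAS3: matrix-matrix product, where cache blocking pays off most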

ATLAS — Automatically Tuned Linear Algebra Software

ATLAS is a modern attempt to make a BLAS implementation with higher performance and more portability. As excellent as BLAS is, it has to be specifically compiled and optimized for different hardware. ATLAS is a portable and reasonably good implementation of the BLAS interfaces, that also implements a few of the most commonly used LAPACK operations. ATLAS defines many BLAS operations in terms of some core routines and then tries to automatically tailor the core routines to have good performance. For example, it performs a search to choose good block sizes that may depend on the computer’s cache size and architecture. It also accounts for differing implementations by providing tests to see if copying arrays and vectors improves performance.3

FFTW — Fastest Fourier Transform in the West

For an example of a much more specific library, FFTW is a software library for computing discrete Fourier transforms (DFTs).4 According to regular benchmarks, FFTW is known as the fastest free software implementation of the fast Fourier transform. It can compute transforms of real- and complex-valued arrays of arbitrary size and dimension in $O(n \log n)$ time.

The magic of FFTW is to choose the optimal algorithm from a wide array of options. It works best on arrays of sizes with small prime factors, with powers of two being optimal and large primes being worst case (but still $O(n \log n)$). For example, to decompose transforms of composite sizes into smaller transforms, it chooses among several variants of the Cooley–Tukey FFT algorithm, while for prime sizes it uses either Rader’s or Bluestein’s FFT algorithm. Once the transform has been broken up into subtransforms of sufficiently small sizes, FFTW uses hard-coded unrolled FFTs for these small sizes that were produced by code generation. It goes meta.
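MATLAB's own fft is built on FFTW, and MATLAB exposes the planner through the fftw function, so the size sensitivity is easy to see (a sketch; timings are machine-dependent, and 999983 is just a convenient large prime):

fftw('planner', 'measure');    % let FFTW search harder for a good plan
x = rand(2^20, 1);
tic; fft(x); toc               % power-of-two length: FFTW's best case
y = rand(999983, 1);           % large prime length: the worst case
tic; fft(y); toc               % still O(n log n) via Rader/Bluestein, but slower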

For more detail from some real experts, check out Numerical Analysis: Historical Developments in the 20th Century by C. Brezinski and L. Wuytack.


  1. It is impossible to get a truly random number from a computer. Even if the sequence never repeats, which is not guaranteed for random numbers, another run of the program with the same inputs will produce the same results. So, someone else can reproduce your random numbers at a later time, which means it wasn’t really random at all. This causes all kinds of problems, but the opposite case would cause many more. 
  2. In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors, compared to scalar processors, whose instructions operate on single data items. 
  3. For example, it may be advantageous to copy arguments so that they are cache-line aligned so user-supplied routines can use SIMD instructions. Today, most commodity CPUs implement architectures that feature instructions for a form of vector processing on multiple (vectorized) data sets, typically known as SIMD (Single Instruction, Multiple Data). 
  4. Yay! FFTW was developed by Matteo Frigo and Steven G. Johnson at the Massachusetts Institute of Technology. 

Satisfiability modulo theories and their relevance to cyber-security

Cybersecurity and cryptanalysis are fields filled with logic puzzles, math and numerical techniques. One of the most interesting technical areas I've worked in goes by the name of satisfiability modulo theories (SMT) and their associated solvers. This post provides a layman's introduction to SMT and its applications to computer science and the modern practice of hacking: learning how to break code.

Some background theory

Satisfiability modulo theories are a type of constraint-satisfaction problem that arises in many places, from software and hardware verification to static program analysis and graph problems. They apply where logical formulas can describe a system's states and their associated transformations. If you look under the hood of most tools used today for computer security, you will find they are based on mathematical logic as the calculus of computation. The most common constraint-satisfaction problem is propositional satisfiability (commonly called SAT), which aims to decide whether a formula composed of Boolean variables, formed using logical connectives, can be made true by choosing true/false values for its variables. In this sense, those familiar with integer programming will find a lot of similarities with SAT. SAT has been widely used in verification, artificial intelligence and many other areas.

As powerful as SAT problems are, what if, instead of Boolean constraints, we use arithmetic in a more general application to build our constraints? Often constraints are best described as linear relationships among integer or real variables. In order to understand and rigorously treat the sets involved, background theories are combined with propositional satisfiability to arrive at satisfiability modulo theories (SMT).

The satisfiability modulo theories problem is a decision problem for logical formulas with respect to combinations of background theories expressed in classical first-order logic with equality. An SMT solver decides the satisfiability of propositionally complex formulas in theories such as arithmetic and uninterpreted functions with equality. SMT solving has numerous applications in automated theorem proving, in hardware and software verification, and in scheduling and planning problems. SMT can be thought of as a form of the constraint satisfaction problem and thus a certain formalized approach to constraint programming.

The solvers developed under SMT have proven very useful in situations where linear constraints and other types of constraints are required, with artificial intelligence and verification often presented as exemplars. An SMT solver can solve a SAT problem, but not vice-versa. SMT solvers draw on some of the most fundamental areas of computer science, as well as a century of symbolic logic. They combine the problem of Boolean satisfiability with domains (such as those studied in convex optimization and term-manipulating symbolic systems). Implementing SMT solvers requires grappling with the decision problem, the completeness and incompleteness of logical theories, and complexity theory.

The process of SMT solving is a procedure of finding a satisfying assignment for a quantifier-free formula $F$ with predicates over a certain background theory $T$; alternatively, the SMT solving process can show that no such assignment exists. An assignment of all variables that satisfies these constraints is the model, $M$. $F$ is satisfied when it evaluates to $\text{true}$ under $M$ and the background theory $T$. In this sense, $M$ entails $F$ under theory $T$, which is commonly expressed as $M \vDash_T F$. If theory $T$ is not decidable, then the underlying SMT problem is undecidable and no solver can exist: decidability means that, given a conjunction of constraints in $T$, there must exist a procedure of finite steps that can test the existence of a satisfying assignment for these constraints.

Ok, that is a lot of jargon. What is this good for?

SMT solvers have been used since the 1970s, albeit in very specific contexts, most commonly theorem proving (cf. ACL2 for some examples). More recently, SMT solvers have been helpful in test-case generation, model-based testing and static program analysis. To make this more real with a concrete example, let's consider one of the most well-known operations research problems: job-shop scheduling.

Suppose there are $n$ jobs to complete, each composed of $m$ tasks with different durations on $m$ machines. The start of a task can be delayed indefinitely, but you can't stop a task once it has started. For this problem, there are two kinds of constraints: precedence and resource constraints. Precedence specifies that one task has to happen before another, and the resource constraint specifies that no two tasks requiring the same machine can execute at the same time. Given a total maximum time $max$ and the duration of each task, the problem is to decide whether a schedule exists where the end time of every task is less than or equal to $max$ units of time. The duration of the $j$th task of job $i$ is $d_{i,j}$, and each task starts at $t_{i,j}$.

I've solved this problem before with heuristics such as simulated annealing, but you can encode the solution to this problem in SMT using the theory of linear arithmetic. First, you have to encode the precedence constraint:

$$ t_{i,j+1} \geq t_{i,j} + d_{i,j} $$

This states that the start-time of task $j+1$ must be greater or equal to the start time of task $j$ plus its duration. The resource constraint ensures that jobs don’t overlap. Between job $i$ and job $i’$ this constraint says:

$$ (t_{i,j} \geq t_{i’,j}+d_{i’,j}) \vee (t_{i’,j} \geq t_{i,j} + d_{i,j}) $$

Lastly, each start time must be non-negative, $t_{i,1} \geq 0$, and the end time of the last task must be less than or equal to $max$: $t_{i,m} + d_{i,m} \leq max$. Together, these constraints form a logical formula that combines logical connectives (conjunction, disjunction and negation) with atomic formulas in the form of linear arithmetic inequalities. This is the SMT formula, and the solution is a mapping from the variables $t_{i,j}$ to values that make this formula $\text{true}$.
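To make this concrete, take a tiny hypothetical instance: two jobs on two machines, with durations $d_{1,1}=2$, $d_{1,2}=1$, $d_{2,1}=1$, $d_{2,2}=2$ and $max=5$. The assignment $t_{1,1}=0$, $t_{1,2}=2$, $t_{2,1}=2$, $t_{2,2}=3$ satisfies the precedence constraints ($2 \geq 0+2$ and $3 \geq 2+1$), satisfies the first disjunct of each resource constraint ($t_{2,1} \geq t_{1,1}+d_{1,1}$ and $t_{2,2} \geq t_{1,2}+d_{1,2}$), and finishes by the deadline ($3 \leq 5$ and $5 \leq 5$). An SMT solver would report the formula satisfiable and return exactly this kind of model.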

So how is this relevant to software security?

Since software uses logical formulas to describe program states and transformations between them, SMT has proven very useful to analyze, verify or test programs. In theory, if we tried every possible input to a computer program, and we could observe and understand every resultant behavior, we would know with certainty all possible vulnerabilities in a software program. The challenge of using formal methods to verify (exploit) software is to accomplish this certainty in a reasonable amount of time and this generally distills down to clever ways to reduce the state space.

For example, consider dynamic symbolic execution. In computational mathematics, algebraic or symbolic computation is a scientific area that refers to the study and development of algorithms and software for manipulating mathematical expressions and other mathematical objects. This is in contrast to scientific computing, which is usually based on numerical computation with approximate floating point numbers; symbolic computation emphasizes exact computation with expressions containing variables that are manipulated as symbols.

The software that performs symbolic calculations is called a computer algebra system. At the beginning of computer algebra, circa 1970, when common algorithms were translated into computer code, they turned out to be highly inefficient. This motivated the application of classical algebra to make them effective and to discover efficient algorithms. For example, Euclid's algorithm had been known for centuries to compute polynomial greatest common divisors, but directly coding this algorithm turned out to be inefficient for polynomials over infinite fields.

Computer algebra is widely used to design the formulas that are used in numerical programs. It is also used for complete scientific computations, when purely numerical methods fail, like in public key cryptography or for some classes of non-linear problems.

To understand some of the challenges of symbolic computation, consider basic associative operations like addition and multiplication. The standard way to deal with associativity is to consider that addition and multiplication have an arbitrary number of operands, that is, that $a + b + c$ is represented as $\text{"+"}(a, b, c)$. Thus $a + (b + c)$ and $(a + b) + c$ are both simplified to $\text{"+"}(a, b, c)$. However, what about subtraction, say $(a - b + c)$? The simplest solution is to rewrite systematically, so $(a + (-1)\cdot b + c)$. In other words, in the internal representation of the expressions, there is no subtraction nor division nor unary minus, outside the representation of the numbers. New Speak for mathematical operations!
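You can see this flattened representation directly in SymPy; a quick sketch (the argument order in the output may vary):

from sympy import symbols, srepr

a, b, c = symbols('a b c')
expr = a - b + c

print(expr.func)    # Add: one n-ary "+" node, not nested binary ones
print(expr.args)    # subtraction is stored as (-1)*b
print(srepr(expr))  # e.g. Add(Symbol('a'), Symbol('c'), Mul(Integer(-1), Symbol('b')))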

A number of tools used in industry are based on symbolic execution (cf. CUTE, KLEE, DART, etc.). What these programs have in common is that they collect explored program paths as formulas and use solvers to identify new test inputs with the potential to guide execution into new branches. SMT applies well to this problem: instead of the random walks of conventional fuzz testing, "white-box" fuzzing combines symbolic execution with fuzz testing to systematically expose the interactions of the system under test, and a directed search can be much more efficient than a random one.

However, as helpful as white-box testing is in finding subtle security-critical bugs, it doesn't guarantee that programs are free of all possible errors. This is where model checking helps out. Model checking seeks to automatically check for the absence of entire categories of errors. The fundamental idea is to explore all possible executions using a finite and sufficiently small abstraction of the program state space. I often think of this as pruning away the state spaces that don't matter.

For example, consider the statement $a = a + 1$ together with a boolean variable $b$ that abstracts the predicate $a == a_{old}$. The abstraction is essentially a relation between the current and new values of $b$. SMT solvers are used to compute the relation by proving theorems, as in:

$$ a == a_{old} \rightarrow a+1 \neq a_{old} $$ Proving this is equivalent to checking the unsatisfiability of its negation, $ a == a_{old} \wedge a + 1 == a_{old} $.

The theorem says that if the current value of $b$ is $\text{true}$, then after executing the statement $ a = a + 1$, the value of $b$ will be $\text{false}$. Now, if $b$ is $\text{false}$, then neither of these conjectures is valid:

$$
a \neq a_{old} \rightarrow a + 1 == a_{old}
$$
or
$$
a \neq a_{old} \rightarrow a + 1 \neq a_{old}
$$

In practice, the SMT solver produces a model for the negation of each invalid conjecture. In this sense, the model is a counter-example of the conjecture: when the current value of $b$ is false, nothing can be said about its value after the execution of the statement. The end result of these three proof attempts is then used to replace the statement $a = a + 1$ by:

 if b then
   b = false;
 else
   b = *;
 end

A finite state model checker can now be used on the Boolean program and will establish that $b$ is always $\text{true}$ when control reaches this statement, verifying that calls to lock() are balanced with calls to unlock() in the original program. Concretely, this loop:

do {
 lock ();
 old_count = count;
 request = GetNextRequest();
 if (request != NULL) {
  unlock();
  ProcessRequest(request);
  count = count + 1;
 }
}
while (old_count != count);
unlock();

becomes:

do {
 lock ();
 b = true;
 request = GetNextRequest();
 if (request != NULL) {
   unlock();
   ProcessRequest(request);
   if (b) b = false; else b = *;
 }
}
while (!b);
unlock();

Application to static analysis

Static analysis tools work the same way as white-box fuzzing or directed search, checking the feasibility of program paths. However, static analysis never requires actually running the program and can therefore analyze software libraries and utilities without instantiating all the details of their implementation. SMT applies to static analysis because solvers accurately capture the semantics of most basic operations used by mainstream programming languages. While this fits nicely for functional programming languages, it isn't always a perfect fit for languages such as Java, C#, and C/C++, which all use fixed-width bitvectors as the representation for values of type int. In this case, the accurate theory for int is two's-complement modular arithmetic. Assuming a bit-width of 32 bits, the maximal positive 32-bit integer is $2^{31}-1$, and the smallest negative 32-bit integer is $-2^{31}$. In the binary search example below, if both low and high are $2^{30}$, then low + high evaluates to $2^{31}$, which is treated as the negative number $-2^{31}$. The presumed assertion 0 ≤ mid < high therefore does not hold. Fortunately, several modern SMT solvers support the theory of "bit-vectors," accurately capturing the semantics of modular arithmetic.

Let’s look at an example from a binary search algorithm:

int binary_search(int[] arr, int low, int high, int key) {
  assert (low > high || 0 <= low < high);
  while (low <= high) {
    // Find middle value
    int mid = (low + high)/2;
    assert (0 <= mid < high);
    int val = arr[mid];
    // Refine range
    if (key == val) return mid;
    if (val > key) low = mid+1;
    else high = mid-1;
  }
  return -1;
}
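You can reproduce the overflow claim above with Z3's bit-vector theory; a minimal sketch (z3-solver package; 32-bit vectors, on which these comparison operators are signed):

from z3 import BitVec, Solver

low, high = BitVec('low', 32), BitVec('high', 32)
mid = (low + high) / 2        # same fixed-width arithmetic as the C code

s = Solver()
s.add(low >= 0, high >= low)  # the assumed precondition
s.add(mid < 0)                # ask: can mid come out negative?
print(s.check())              # sat: the assertion can fail
print(s.model())              # e.g. low = high = 2**30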


Summary

SMT solvers combine SAT reasoning with specialized theory solvers either to find a feasible solution to a set of constraints or to prove that no such solution exists. Linear programming (LP) solvers are designed to find feasible solutions that are optimal with respect to some optimization function. Both are powerful tools that can be incredibly helpful to solve hard and practical problems in computer science.

One of the applications I follow closely is symbolic-execution based analysis and testing. Perhaps the most famous commercial tool that uses dynamic symbolic execution (aka concolic testing) is the SAGE tool from Microsoft. The KLEE and S2E tools (both of which are open-source tools, and use the STP constraint solver) are widely used in many companies including HP Fortify, NVIDIA, and IBM. Increasingly these technologies are being used by many security companies and hackers alike to find security vulnerabilities.


Kids Lego table: Case study in Automation for Design

Motivation

I had to upgrade the Lego table I made when my kids were much smaller. It needed to be higher and include storage options. Since I’m short on time, I used several existing automation tools to both teach my daughter the power of programming and explore our decision space. The goals were to stay low-cost and make the table as functional as possible in the shortest time possible.

Lauren and I had fun drawing the new design in SketchUp. I then went to the Arlington TechShop and built the frame easily enough from a set of 2x4s. In order to be low-cost and quick, we decided to use the IKEA TROFAST storage bins. We were inspired by lots of designs online, such as this one:

lego-table-example

However, the table I designed was much bigger and built with simple right angles and a nice dado angle bracket to hold the legs on.

table_with_bracket

The hard part was figuring out the right arrangement of bins underneath the table. Since my background is in optimization, I thought about setting up a two-dimensional knapsack problem, but decided on brute-force enumeration since the state space was really small. I built two scripts: one in Python to enumerate the state space and sort the results, and one in JavaScript, or ExtendScript, to automate Adobe Illustrator and give me a good way to visually consider the options. (ExtendScript just looks like an old, ES3, version of JavaScript to me.)

So what are the options?

There are two TROFAST bins I found online. One costs \$3 and the other \$2. Sweet. You can see their dimensions below.

options

They both are the same height, so we just need to determine how to make the row work. We can arrange each TROFAST bin on its short or long side, so we have 4 different options for the two bins:

Bin      Small side (cm)   Long side (cm)
Orange   20                30
Green    30                42

First, Lauren made a set of scale drawings of the designs she liked, which allowed us to think about options. Her top-left drawing ended up being our final design.

lauren designs

I liked her designs, but it got me thinking: what would all feasible designs look like? We decided to tackle this question since she is learning JavaScript.

Automation

If we ignore the depth and height, we then have only three options $[20,30,42]$ with the null option of $0$ length. With these lengths we can find the maximum number of bins if the max length is $112.4 \text{cm}$. Projects like this always have me wondering how to best combine automation with intuition. I’m skeptical of technology and aware that it can be a distraction and inhibit intuition. It would have been fun to cut out the options at scale or just to make sketches and we ended up doing those as well. Because I’m a recreational programmer, it was fairly straightforward to enumerate and explore feasible options and fun to show my daughter some programming concepts.

$$ \left\lfloor
\frac{112.4}{20}
\right\rfloor = 5 $$

So there are $4^5 = 1{,}024$ total options from the Cartesian product. Brute-force enumeration like this blows up exponentially as the number of slots grows, but here the space is tiny, and fortunately we have $\text{itertools.product}$ in Python, so we can get all our possible options easily in one command:

itertools.product([0,20,30,42], repeat=5)

and we can restrict results to feasible combinations, and even to solutions that don't waste more than 15 cm (see the sketch in the Code section below). To glue Python and Illustrator together, I use JSON to store the data, which I can then open in Illustrator ExtendScript and print out the feasible results.

results

Later, I added some colors for clarity and picked the two options I liked:

options

These both minimized the number of bin styles, were symmetric, and used the space well. I took these designs forward into the final design. Now to build it.

final_design

Real Math

But wait: rote enumeration? Sorry, yes; I didn't have much time when we did this, but there are much better ways to do it. Here is one approach:

Generating Functions

If your options are 20, 30, and 40, then what you do is compute the coefficients of the infinite series

$$(1 + x^{20} + x^{40} + x^{60} + …)(1 + x^{30} + x^{60} + x^{90} + …)(1 + x^{40} + x^{80} + x^{120} + …)$$

I always find it amazing that polynomials happen to have the right structure for the kind of enumeration we want to do: the powers of x keep track of our length requirement, and the coefficients count the number of ways to get a given length. When we multiply out the product above we get

$$1 + x^{20} + x^{30} + 2 x^{40} + x^{50} + 3 x^{60} + 2 x^{70} + 4 x^{80} + 3 x^{90} + 5 x^{100} + …$$

This polynomial lays out the answers we want "on a clothesline". E.g., the last term tells us there are 5 configurations with length exactly 100. If we add up the coefficients above (or just plug in $x = 1$) we have 23 configurations with length less than 110.

If you also want to know what the configurations are, then you can put in labels: say $v$, $t$, and $f$ for twenty, thirty, and forty, respectively. A compact way to write $1 + x^{20} + x^{40} + x^{60} + \dots$ is $1/(1 - x^{20})$. The labelled version is $1/(1 - v x^{20})$. Okay, so now we compute

$$1/((1 - v x^{20})(1 - t x^{30})(1 - f x^{40}))$$

truncating after the $x^{100}$ term. In Mathematica the command to do this is

Normal@Series[1/((1 - v x^20) (1 - t x^30) (1 - f x^40)), {x, 0, 100}]

with the result

$$1 + v x^{20} + t x^{30} + (f + v^2) x^{40} + t v x^{50} + (t^2 + f v + v^3) x^{60} + (f t + t v^2) x^{70} + (f^2 + t^2 v + f v^2 + v^4) x^{80} + (t^3 + f t v + t v^3) x^{90} + (f t^2 + f^2 v + t^2 v^2 + f v^3 + v^5) x^{100}$$

Not pretty, but when we look at the coefficient of $x^{100}$, for example, we see that the 5 configurations are ftt, ffv, ttvv, fvvv, and vvvvv.
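If you don't have Mathematica handy, the same series can be computed in Python with SymPy; a sketch that reproduces the coefficient of $x^{100}$ above:

from sympy import symbols, expand

x, v, t, f = symbols('x v t f')
expr = 1/((1 - v*x**20)*(1 - t*x**30)*(1 - f*x**40))

# Expand as a power series in x, truncating after the x^100 term
poly = expr.series(x, 0, 101).removeO()
print(expand(poly).coeff(x, 100))
# -> f*t**2 + f**2*v + t**2*v**2 + f*v**3 + v**5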

Time to build it

Now it is time to figure out how to build this. I figured out I had to use $1/2$-inch plywood. Since I do woodworking in metric, this is a thickness of 0.472 in, or 1.19888 cm.

$31.95 each: Sande Plywood (Common: 1/2 in. x 4 ft. x 8 ft.; Actual: 0.472 in. x 48 in. x 96 in.)

or at this link

So the dimensions of this are the side thickness $s$ and interior divider thickness $i$, with shelf width $k$. Each shelf is $k = 20 - 0.5 \times 2\ \text{cm} = 19\ \text{cm}$ wide. All together, we know:

$$w = 2\,s+5\,k+4\,i $$

and the board thickness is $t$, where $t \leq \min(s, i)$,

which gives us:

Dimension   Width (cm)
s           1.20
i           3.75
k           19.00
w           112.40
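A quick sanity check of the width equation with these values:

# check that w = 2s + 5k + 4i fills the 112.4 cm width exactly
s, i, k = 1.20, 3.75, 19.00   # cm
print(2*s + 5*k + 4*i)        # 112.4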

Code

The code I used is sketched below:
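Here is a minimal sketch of the enumeration script, reconstructed from the description above (the output filename and JSON layout are assumptions):

import itertools
import json

MAX_LEN = 112.4    # usable width under the table, in cm
MAX_WASTE = 15     # don't waste more than this much width

# Five slots, each empty (0) or a bin side of 20, 30, or 42 cm
feasible = [combo for combo in itertools.product([0, 20, 30, 42], repeat=5)
            if MAX_LEN - MAX_WASTE <= sum(combo) <= MAX_LEN]

feasible.sort(key=lambda combo: MAX_LEN - sum(combo))  # least waste first

# Hand the results to the Illustrator ExtendScript as JSON
with open("feasible.json", "w") as out:
    json.dump(feasible, out)
print(len(feasible), "feasible arrangements")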


Some tax-time automation

I often struggle to find the right balance between automation and manual work. Since it is tax time, and Chase bank only gives you 90 days of statements, I find myself every year going back through our statements to find any business expenses and do our overall financial review for the year. In the past I've played around with MS Money, Quicken, Mint, and my own spreadsheets. Now, I just download the statements at the end of the year, use Acrobat to combine them, and use Ruby to massage the combined PDF into a spreadsheet.1

To do my analysis I need everything in CSV format. After getting one PDF, I end up looking at the structure of the document, which looks like:

Earn points [truncated] and 1% back per $1 spent on all other Visa Card purchases.

Date of Transaction Merchant Name or Transaction Description $ Amount
PAYMENTS AND OTHER CREDITS
01/23 -865.63
AUTOMATIC PAYMENT - THANK YOU

PURCHASES
12/27  AMAZON MKTPLACE PMTS AMZN.COM/BILL WA  15.98
12/29  NEW JERSEY E-ZPASS 888-288-6865 NJ  25.00
12/30  AMAZON MKTPLACE PMTS AMZN.COM/BILL WA  54.01

0000001 FIS33339 C 2 000 Y 9 26 15/01/26 Page 1 of 2

I realized that I want all lines that have a date like MM/DD followed by some spaces, a bunch of text, a decimal number, and some trailing spaces. In regular expression syntax, that looks like:

/^(\d{2}\/\d{2})\s+(.*)\s+(\d+\.\d+)\s+$/

which is literally just a way of describing to the computer where my data are.

Using Ruby, I can easily get my expenses as CSV:
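The original is a short Ruby script; as an illustration of the same extraction, here is a minimal sketch in Python (the filenames are assumptions):

import csv
import re

# date, description, amount -- the same pattern as above
pattern = re.compile(r'^(\d{2}/\d{2})\s+(.*?)\s+(\d+\.\d+)\s*$')

with open('statements.txt') as src, open('expenses.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    writer.writerow(['date', 'description', 'amount'])
    for line in src:
        m = pattern.match(line)
        if m:
            writer.writerow(m.groups())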

Boom. Hope this helps some of you who might otherwise be doing a lot of typing. Also, if you want to combine PDFs on the command line, you can use PDFtk thus:

pdftk file1.pdf file2.pdf cat output -

  1. The manual download takes about 10 minutes. When I get some time, I'm up for the challenge of automating this eventually with my own screen scraper and web automation using some awesome combination of Ruby and Capybara. I also use PDFtk to combine PDF files. 

A basic square root function in assembly

Why learn assembly?

It is painful, and nearly useless to know. But once you understand it, you know how a computer "works". It is impossible to understand the hardware/software interface without a knowledge of assembly. In particular, if you are in the computer security field, you have to understand how things really work. These days, I've developed a hobby of reverse engineering things and hacking around at the binary level. If you are a high-level programmer (Python, Ruby), odds are high that you haven't had to worry about memory management and pointers. If you have a C, FORTRAN, or other mid-level background, odds are high you haven't had to worry about the stack or the different op-codes for system calls and interrupts, let alone which internal register things are stored in. However, these interactions are the exact thing you have to understand to "de-blackboxify" any computer. This makes a trip through assembly a necessary stop for everyone doing research in the security field.

The first thing for me is to get started with the most basic program possible. Since I'm just like everyone else in computer science these days, I use 64-bit OS X locally, which is a flavor of the BSD operating system, so I'm going to use nasm on my Mac (brew install nasm) to assemble my code into an object file. Normally a compiler like gcc (or clang, etc.) will turn C into assembly, and then object code, which in turn is turned into the machine code specific to your system and processor. Think of object code as a numeric version of the assembly. It is an excellent exercise to translate assembly into opcodes by hand, and it is extremely complicated, with lots of binary math to get the instruction encodings right. If you want to play around with this, I recommend you check out this interactive page which converts the asm opcodes for you. I've met two people in my life who can actually code directly in opcodes, and once you work through it once, it is mind-blowing that folks can do that much math in their heads.

Hello World in asm on the mac

global start

section .text

start:
    mov     rax, 0x2000004      ; write
    mov     rdi, 1              ; stdout
    lea     rsi, [rel msg]
    mov     rdx, msg.len
    syscall

    mov     rax, 0x2000001      ; exit
    mov     rdi, 0
    syscall


section .data

msg:    db      "Hello, world!", 10
.len:   equ     $ - msg

To run this, type:

/usr/local/bin/nasm -f macho64 hello.asm && ld -macosx_version_min 10.7.0 -lSystem -o hello hello.o && ./hello

The first command, nasm -f macho64 hello.asm, turns the code above into an object file. An object file contains general machine code, generally called opcodes. These opcodes are relocatable and usually not directly executable. There are various formats for object files, and the same object code can be packaged in different object files. It's mostly machine code, but has info that allows a linker to see what symbols are in it, as well as the symbols it requires in order to work. (For reference, "symbols" are basically names of global objects, functions, etc.)

We can use hexdump to see actual opcodes.

0000000 cf fa ed fe 07 00 00 01 03 00 00 00 01 00 00 00
0000010 02 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00
...
0000190 00 73 74 61 72 74 00 6d 73 67 00 6d 73 67 2e 6c
00001a0 65 6e 00
00001a3

To get to the next step, we create our machine-specific executable with:

ld -macosx_version_min 10.7.0 -lSystem -o hello hello.o

Square roots for perfect squares with integer results

This works fine; now we want to make the world's simplest square root function. In order to think things through, I wrote up a basic (and verbose) function in C to find the square root of 64, or of any number that is a perfect square with an integer result.

#include <stdio.h>

int main()
{
    int len;
    int num;
    int sqrt;
    int lead_bytes;

    num = 64;
    /* count leading zero bits to find the highest set bit */
    lead_bytes = __builtin_clz(num);
    len = (32 - lead_bytes);  /* bit length of num */
    len = len / 2;            /* halve the bit length to halve the exponent */
    sqrt = num >> len;
    printf("%d", sqrt);
}

With this as a guide, I then did the laborious process of translating this into x86 64 assembly for the mac architecture:

And building via nasm and gcc:

➜ nasm -fmacho64 sqrt.asm
➜ gcc -v -o sqrt sqrt.o
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
 "/Library/Developer/CommandLineTools/usr/bin/ld" -demangle -dynamic -arch x86_64 -macosx_version_min 10.10.0 -o sqrt sqrt.o -lSystem /Library/Developer/CommandLineTools/usr/bin/../lib/clang/6.1.0/lib/darwin/libclang_rt.osx.a
➜ ./sqrt
8

Woot! In my mind I lose some leet points for using the default C library, but it is still the most low-level hand-crafted code I've put together. For assembly, I've found that documentation is critical. Every implementation of every architecture is very specific. For example, the Mac requires _printf, not the plain printf that Windows would require.
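As a sanity check on the bit-shift trick itself, here is the same computation in Python (a sketch; like the C version, it only gives exact answers when the input is a power of four, such as 64):

def shift_sqrt(num: int) -> int:
    length = num.bit_length()   # same as 32 - __builtin_clz(num) for 32-bit ints
    return num >> (length // 2) # shift by half the bit length

print(shift_sqrt(64))   # 8
print(shift_sqrt(16))   # 4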

Now, if we want to make this interactive, I'm going to enable command line arguments, in addition to creating an external C library and linking to it.

Getting the files to run with command line arguments

Here knowledge of C translates over nicely. In C, main is just a plain old function, and it has a couple parameters of its own:

int main(int argc, char** argv)

If you follow the docs, you see that argc will end up in rdi, and argv (a pointer) will end up in rsi. Here is a program that uses this fact to simply echo the command-line arguments, one per line:

EDIT: I found a more robust way that works.

Up next, running as a c-shared library (to run in Julia)

tbd . . .


Provenance (in Computer Science)

Provenance is the ability to record the history of data and its place of origin. In general, it is the ability to determine the chronology of the ownership, custody, or location of any object. The primary purpose of tracing the provenance of an object or entity is often to provide contextual and circumstantial evidence for its original production or discovery, by establishing, as far as practicable, its later history, especially the sequence of its formal ownership, custody, and places of storage. While originally limited to determining the heritage of works of art, the term now applies to a wide range of fields, including archaeology, paleontology, archives, manuscripts, printed books, and science and computing. The last is the context most relevant to my field of computer security.

In the context of data provenance, provenance documents the inputs, entities, systems, and processes that influence data of interest, in effect providing a historical record of the data and its origins. The generated evidence supports essential forensic activities such as data-dependency analysis, error/compromise detection and recovery, and auditing and compliance analysis, including the ability to detect advanced persistent threats. The provenance of data generated by complex transformations such as workflows is of considerable value to scientists: from it, one can ascertain the quality of the data based on its ancestral data and derivations, track back sources of errors, allow automated re-enactment of derivations to update data, and provide attribution of data sources. Provenance is also essential to the business domain, where it can be used to drill down to the source of data in a data warehouse, track the creation of intellectual property, and provide an audit trail for regulatory purposes.

The use of data provenance is proposed in distributed systems to trace records through a dataflow, replay the dataflow on a subset of its original inputs and debug data flows. In order to do so, one needs to keep track of the set of inputs to each operator, which were used to derive each of its outputs.
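To make that bookkeeping concrete, here is a minimal, hypothetical sketch in Python of an operator that records the set of input ids used to derive each of its outputs:

from dataclasses import dataclass

@dataclass
class Record:
    value: int
    sources: frozenset   # ids of the input records this value came from

def load(values):
    # each loaded record is its own provenance root
    return [Record(v, frozenset({i})) for i, v in enumerate(values)]

def sum_pairs(records):
    # a toy operator: output provenance is the union of input provenance
    return [Record(a.value + b.value, a.sources | b.sources)
            for a, b in zip(records[::2], records[1::2])]

for r in sum_pairs(load([3, 4, 5, 6])):
    print(r.value, "derived from inputs", sorted(r.sources))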

The W3C defines provenance as a record that describes the entities and processes involved in producing, delivering, or otherwise influencing a resource. Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance assertions are a form of contextual metadata and can themselves become important records with their own provenance.

Why do we care?

Because provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility, and because assertions of provenance can themselves become important records with their own provenance. The widespread use of workflow tools for processing scientific data makes it easier to capture provenance information: the workflow process describes all the steps involved in producing a given data set and hence captures its provenance. Provenance can record metadata such as the data creator and publisher, the creation date, the modifier and modification date, and a description of the data.

There are two major strands of provenance in computer science: data provenance and workflow provenance. Data provenance is fine-grained and is used to determine the integrity of data flows; it is a description of the origin of a piece of data and the process by which it arrived in a database. By contrast, workflow provenance is coarser-grained: it refers to the record of the derivation history of the final output of a workflow, and is typically used for complex processing tasks. Fine-grained provenance can be further categorized into where-, why-, and how-provenance. A query execution may simply copy data elements from some source to a target database; where-provenance identifies the source elements from which the data in the target was copied. Why-provenance provides the justification for the data elements appearing in the output, and how-provenance describes how parts of the input influenced parts of the output.
