Friday, January 24, 2014

Evaluate performance bottleneck with perf

Perf tool is a profiler tool for Linux kernel 2.6+ that uses performance counter to profile the performance bottleneck of a program in both userspace and kernel space.

This tutorial is good example on how to use it.

To install it on ubuntu, sudo apt-get install linux-tools

A quick example on how to find the bottleneck of a program.


  1. Run the program as usual to collect performance record, for example "
    perf record iperf -c 192.168.1.1 -d", This command will ask perf to execute command "iperf -c 192.168.1.1 -d" and collect the performance number in "perf.data" file.
  2. View the performance record: perf record, this will show you how much time is spent in a userspace function or kernel function. In my example, it looks like this:
     31.95%  iperf  [kernel.kallsyms]        [k] md5_transform                                                                                                                        
     16.99%  iperf  [aesni_intel]            [k] _aesni_enc1                                                                                                                          
      3.14%  iperf  [kernel.kallsyms]        [k] do_csum                                                                                                                              
      2.50%  iperf  [kernel.kallsyms]        [k] memcpy                                                                                                                               
      2.16%  iperf  [kernel.kallsyms]        [k] __ticket_spin_lock                                                                                                                   
      1.78%  iperf  [aesni_intel]            [k] _aesni_dec4                                                                                                                          
      1.10%  iperf  [kernel.kallsyms]        [k] nf_iterate                                                                                                                           
      0.97%  iperf  [nf_conntrack]           [k] ____nf_conntrack_find                                                                                                                
      0.91%  iperf  [kernel.kallsyms]        [k] __slab_free                                                                                                                          
      0.81%  iperf  [kernel.kallsyms]        [k] skb_release_data                                                                                                                     
      0.80%  iperf  [kernel.kallsyms]        [k] fib_table_lookup                                                                                                                     
      0.80%  iperf  [kernel.kallsyms]        [k] memset                                                                                                                               
      0.80%  iperf  [kernel.kallsyms]        [k] __copy_user_nocache                                                                                                                  
      0.77%  iperf  [cxgb4]                  [k] process_responses                                                                                                                    
      0.77%  iperf  [ip_tables]              [k] ipt_do_table                                                                                                                         
      0.69%  iperf  [nf_conntrack]           [k] __nf_conntrack_find_get                                                                                                              
      0.68%  iperf  [kernel.kallsyms]        [k] md5_update                                                                                                                           
      0.67%  iperf  [nf_conntrack]           [k] nf_conntrack_in                                                                                                                      
      0.56%  iperf  [nf_conntrack]           [k] hash_conntrack_raw                                                                                                                   
      0.52%  iperf  [kernel.kallsyms]        [k] dst_release                         
    

    This is because I'm running tcp benchmark using iperf with ipsec turned on in ESP and AH mode, so a lot of cpu time is spent on md5(for checksum) and aes(for encryption). Now that we know where the performance bottlenecks are, it's time to spend time to fix them.