Friday, January 24, 2014

Evaluate performance bottleneck with perf

Perf tool is a profiler tool for Linux kernel 2.6+ that uses performance counter to profile the performance bottleneck of a program in both userspace and kernel space.

This tutorial is good example on how to use it.

To install it on ubuntu, sudo apt-get install linux-tools

A quick example on how to find the bottleneck of a program.


  1. Run the program as usual to collect performance record, for example "
    perf record iperf -c 192.168.1.1 -d", This command will ask perf to execute command "iperf -c 192.168.1.1 -d" and collect the performance number in "perf.data" file.
  2. View the performance record: perf record, this will show you how much time is spent in a userspace function or kernel function. In my example, it looks like this:
     31.95%  iperf  [kernel.kallsyms]        [k] md5_transform                                                                                                                        
     16.99%  iperf  [aesni_intel]            [k] _aesni_enc1                                                                                                                          
      3.14%  iperf  [kernel.kallsyms]        [k] do_csum                                                                                                                              
      2.50%  iperf  [kernel.kallsyms]        [k] memcpy                                                                                                                               
      2.16%  iperf  [kernel.kallsyms]        [k] __ticket_spin_lock                                                                                                                   
      1.78%  iperf  [aesni_intel]            [k] _aesni_dec4                                                                                                                          
      1.10%  iperf  [kernel.kallsyms]        [k] nf_iterate                                                                                                                           
      0.97%  iperf  [nf_conntrack]           [k] ____nf_conntrack_find                                                                                                                
      0.91%  iperf  [kernel.kallsyms]        [k] __slab_free                                                                                                                          
      0.81%  iperf  [kernel.kallsyms]        [k] skb_release_data                                                                                                                     
      0.80%  iperf  [kernel.kallsyms]        [k] fib_table_lookup                                                                                                                     
      0.80%  iperf  [kernel.kallsyms]        [k] memset                                                                                                                               
      0.80%  iperf  [kernel.kallsyms]        [k] __copy_user_nocache                                                                                                                  
      0.77%  iperf  [cxgb4]                  [k] process_responses                                                                                                                    
      0.77%  iperf  [ip_tables]              [k] ipt_do_table                                                                                                                         
      0.69%  iperf  [nf_conntrack]           [k] __nf_conntrack_find_get                                                                                                              
      0.68%  iperf  [kernel.kallsyms]        [k] md5_update                                                                                                                           
      0.67%  iperf  [nf_conntrack]           [k] nf_conntrack_in                                                                                                                      
      0.56%  iperf  [nf_conntrack]           [k] hash_conntrack_raw                                                                                                                   
      0.52%  iperf  [kernel.kallsyms]        [k] dst_release                         
    

    This is because I'm running tcp benchmark using iperf with ipsec turned on in ESP and AH mode, so a lot of cpu time is spent on md5(for checksum) and aes(for encryption). Now that we know where the performance bottlenecks are, it's time to spend time to fix them.

No comments:

Post a Comment