Results of the Multiformat at 128kbps Public Listening Test

Since Roberto's last multi-codec listening test, significant work has gone into the LAME and Vorbis codecs.  Not much has changed with the iTunes codec, however, and the Atrac3 codec, used by Sony's Connect online music store, is new to this test.

Vorbis
aoTuV showed significant improvement over the standard Vorbis codec used in the last public listening test and tied with MPC for first place in this test.  This is the first time MPC has not been the outright leader in a multi-codec public listening test.  LAME also showed significant improvement over the last listening test and tied with iTunes for second place.  WMA Standard, which is used by most online music stores, came in third place.  Finally, Sony's Atrac3 codec came last.

Note that the Vorbis codec tested here is one of the many code branches intended to give better quality than the standard release by Xiph.org.  A previous listening test compared the various branches; the winner was Aoyumi's aoTuV, which is why it has been used in this listening test.  iTunes 4.2 has been used for this test due to quality concerns raised with iTunes 4.5, and the LAME version and command line parameters were likewise chosen through a listening test.  The quality settings for Vorbis and MPC have been tweaked to give an average of close to 128kbps over a wide range of albums and styles.


Overall Ratings (Zoomed)

How to interpret the plots:  Each plot is drawn with the tested codecs on the x axis and the ratings given (1.0 through 5.0) on the y axis. N is the number of listeners used to compute the means (average ratings) and 95% confidence intervals. The mean rating given to each codec is indicated by the middle point of each vertical line segment, and the value is printed next to it. Each vertical line segment represents the 95% confidence interval (using ANOVA analysis) for each codec.

This analysis is different from the one used on ff123's 64kbps test. The difference is mainly one of risk. The ANOVA / Fisher LSD method used here is more at risk of falsely identifying differences between codecs. On the other hand, it's more sensitive than the Tukey HSD.
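To make the Fisher LSD idea more concrete, here is a minimal Python sketch of a blocked ANOVA followed by the least significant difference (LSD) calculation. It is only an illustration under stated assumptions, not the exact procedure or tool used for this test: listeners are treated as blocks, every listener is assumed to have rated every codec, and the function name `fisher_lsd`, the rating matrix and the critical t value (taken from a standard t table) are all made up for the example.

```python
from math import sqrt

def fisher_lsd(ratings, t_crit):
    """Return (codec_means, lsd) for a complete listener-by-codec rating matrix.

    ratings[i][j] is the rating listener i gave to codec j.
    t_crit is the two-sided critical t value for the error degrees of
    freedom, (listeners - 1) * (codecs - 1), looked up by the caller.
    """
    n = len(ratings)       # number of listeners (blocks)
    k = len(ratings[0])    # number of codecs (treatments)
    grand = sum(sum(row) for row in ratings) / (n * k)

    codec_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    listener_means = [sum(row) / k for row in ratings]

    # Split the total variation into codec, listener and error parts.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_codec = n * sum((m - grand) ** 2 for m in codec_means)
    ss_listener = k * sum((m - grand) ** 2 for m in listener_means)
    df_error = (n - 1) * (k - 1)
    mse = (ss_total - ss_codec - ss_listener) / df_error

    # Smallest difference between two codec means that counts as significant.
    lsd = t_crit * sqrt(2 * mse / n)
    return codec_means, lsd

# Hypothetical ratings: 4 listeners x 3 codecs (not data from this test).
example = [
    [4, 3, 2],
    [5, 3, 1],
    [4, 4, 2],
    [5, 4, 3],
]
# 2.447 is the tabulated two-sided 5% critical t value for 6 degrees of freedom.
means, lsd = fisher_lsd(example, t_crit=2.447)
print(means, round(lsd, 3))
```

Under this method, two codecs whose mean ratings differ by more than the LSD would be flagged as different. Tukey HSD uses a larger critical value that accounts for the number of codecs being compared, which is why it flags fewer differences.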

One codec can be said to be rated better than another codec with 95% confidence if the bottom of its line segment is at or above the top of the competing codec's line segment. For example, in the chanchan plot below, Lame is rated better than Atrac3 with 95% confidence. And iTunes is rated better than Lame with greater than 95% confidence.
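The comparison rule above can be written down in a few lines of Python. This is just a sketch: the function name `rated_better` and the numbers are hypothetical and are not taken from the test results.

```python
def rated_better(a, b):
    """Return True if codec a is rated better than codec b.

    a and b are (mean, half_width) pairs, where half_width is half the
    length of the codec's vertical line segment in the plot.
    """
    mean_a, half_a = a
    mean_b, half_b = b
    bottom_of_a = mean_a - half_a
    top_of_b = mean_b + half_b
    return bottom_of_a >= top_of_b

# Hypothetical codecs (illustrative values only).
codec_x = (4.2, 0.1)
codec_y = (3.9, 0.1)
codec_z = (4.05, 0.1)

print(rated_better(codec_x, codec_y))  # segments do not overlap
print(rated_better(codec_x, codec_z))  # segments overlap
```

When the segments overlap, as with the second pair, neither codec can be said to be better than the other at this confidence level, which is what "tied" means in the results.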

Important note:  These plots represent group preferences (for the particular group of people who participated in the test). Individual preferences will vary somewhat. The best codec for a person is dependent on his own preferences and the type of music he prefers.

Some other important notes:

  • The Vorbis version tested here isn't the standard one offered by Xiph.org. Some Vorbis enthusiasts, frustrated with Xiph's slow release schedule, decided to take matters into their own hands and create code branches with better quality tunings. A listening test comparing branched versions to the standard Xiph version was conducted by Vorbis enthusiasts, and the winner was Aoyumi's aoTuV. That's why this unusual version is being featured here.
  • iTunes is exactly the same as QuickTime in "Better" quality mode (which should give the same results as "Best" on 16-bit material). Both use the same encoding routines.
  • iTunes is being tested at version 4.2 instead of 4.5, because of quality concerns raised by listeners.
  • The custom Lame command line was chosen in a similar fashion, in a listening test conducted by enthusiasts.
  • The unusual quality settings for MPC and Vorbis were chosen after testing several quality settings over a wide range of albums and styles, and picking the setting whose average bitrate came closest to 128kbps.
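The last point, picking the quality setting whose average bitrate lands closest to 128kbps, can be sketched as follows. The setting names and bitrates are invented for illustration, and `closest_setting` is a hypothetical helper, not part of any encoder.

```python
from statistics import mean

TARGET_KBPS = 128

def closest_setting(bitrates_by_setting, target=TARGET_KBPS):
    """Return (setting, average) for the setting whose average bitrate
    over all encoded albums is closest to the target."""
    averages = {s: mean(rates) for s, rates in bitrates_by_setting.items()}
    best = min(averages, key=lambda s: abs(averages[s] - target))
    return best, averages[best]

# Hypothetical average bitrates (kbps) per album for three quality settings.
measured = {
    "setting_a": [118, 125, 121],
    "setting_b": [126, 131, 128],
    "setting_c": [134, 139, 137],
}

best, average = closest_setting(measured)
print(best, round(average, 1))
```

The wider the range of albums and styles in the measurement, the more representative the chosen setting is of typical use.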

Overall Ratings

The results for each sample were grouped together, without modifications.

Then I performed an ANOVA analysis. Vorbis aoTuV is tied with Musepack for first place, Lame MP3 is tied with iTunes AAC for second place, WMA Standard is in third place and Atrac3 gets last place.

This test showed some very interesting developments since the last Multiformat at 128kbps test. Lame seems to have improved a lot, tying with its technological successor AAC. Vorbis got much better, thanks to the independent tunings performed by Aoyumi. And Atrac3 surprised with its bad performance.

See the full results for this listening test here.

Just when I thought that MP3 encoders had been tweaked to pretty much the best the MP3 format is capable of, the LAME setup in this listening test clearly shows that MP3 can perform much better than most had believed, tying with iTunes and beating WMA, which is widely used by music stores due to its DRM capabilities.  This means that fans of flash-based MP3 players will be able to take advantage of higher quality encodings without sacrificing capacity to higher bit rates or upgrading to more expensive players that support higher quality codecs.

Feel free to discuss audio codecs and related software and hardware on our Audio Forum.

Source: Roberto's public listening tests
