Sunday, June 17, 2012

Usage frequency of data mining packages

http://www.kdnuggets.com/2012/05/top-analytics-data-mining-big-data-software.html

This poll also had a very large number of participants and used email verification and other measures to remove unnatural votes (*see note below).


What Analytics, Data mining, Big Data software you used in the past 12 months for a real project (not just evaluation) [798 voters]
Legend: Free/Open Source tools 
Commercial tools
Tool (voters)                                         % users in 2012  % users in 2011
R (245)                                               30.7%            23.3%
Excel (238)                                           29.8%            21.8%
Rapid-I RapidMiner (213)                              26.7%            27.7%
KNIME (174)                                           21.8%            12.1%
Weka / Pentaho (118)                                  14.8%            11.8%
StatSoft Statistica (112)                             14.0%            8.5%
SAS (101)                                             12.7%            13.6%
Rapid-I RapidAnalytics (83)                           10.4%            not asked in 2011
MATLAB (80)                                           10.0%            7.2%
IBM SPSS Statistics (62)                              7.8%             7.2%
IBM SPSS Modeler (54)                                 6.8%             8.3%
SAS Enterprise Miner (46)                             5.8%             7.1%
Orange (42)                                           5.3%             1.3%
Microsoft SQL Server (40)                             5.0%             4.9%
Other free analytics/data mining software (39)        4.9%             4.1%
TIBCO Spotfire / S+ / Miner (37)                      4.6%             1.7%
Oracle Data Miner (35)                                4.4%             0.7%
Tableau (35)                                          4.4%             2.6%
JMP (32)                                              4.0%             5.7%
Other commercial analytics/data mining software (32)  4.0%             3.2%
Mathematica (23)                                      2.9%             1.6%
Miner3D (19)                                          2.4%             1.3%
IBM Cognos (16)                                       2.0%             not asked in 2011
Stata (15)                                            1.9%             0.8%
Bayesia (14)                                          1.8%             0.8%
KXEN (14)                                             1.8%             1.4%
Zementis (14)                                         1.8%             3.7%
C4.5/C5.0/See5 (13)                                   1.6%             1.9%
Revolution Computing (11)                             1.4%             1.4%
Salford SPM/CART/MARS/TreeNet/RF (9)                  1.1%             10.6%
Angoss (7)                                            0.9%             0.8%
SAP (including BusinessObjects/Sybase/Hana) (7)       0.9%             not asked in 2011
XLSTAT (7)                                            0.9%             0.9%
RapidInsight/Veera (5)                                0.6%             not asked in 2011
11 Ants Analytics (4)                                 0.5%             5.6%
Teradata Miner (4)                                    0.5%             not asked in 2011
Predixion Software (3)                                0.4%             0.5%
WordStat (3)                                          0.4%             0.5%

Among tools with at least 10 users, the tools with the highest increase in "usage percent" were

  • Oracle Data Miner, 4.4% in 2012, up from 0.7% in 2011, 505% increase
  • Orange, 5.3% from 1.3%, 315% increase
  • TIBCO Spotfire / S+ / Miner, 4.6% from 1.7%, 169% increase
  • Stata, 1.9% from 0.8%, 130% increase
  • Bayesia, 1.8% from 0.8%, 115% increase
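
The "usage percent" increase in the list above is a relative change: the difference between the 2012 and 2011 shares, divided by the 2011 share. The following is a minimal Python sketch of that arithmetic, using the rounded shares from the table; the function name `relative_increase` is hypothetical and not part of the poll. Because the inputs are rounded to one decimal place, the results differ by several points from the published figures, which were presumably computed from unrounded shares.

```python
def relative_increase(new_pct: float, old_pct: float) -> float:
    """Return the relative change from old_pct to new_pct, in percent."""
    if old_pct <= 0:
        raise ValueError("old_pct must be positive")
    return (new_pct - old_pct) / old_pct * 100.0


# Rounded usage shares (2012, 2011), copied from the poll table.
shares = {
    "Oracle Data Miner": (4.4, 0.7),
    "Orange": (5.3, 1.3),
    "TIBCO Spotfire / S+ / Miner": (4.6, 1.7),
    "Stata": (1.9, 0.8),
    "Bayesia": (1.8, 0.8),
}

increases = {
    tool: relative_increase(new, old) for tool, (new, old) in shares.items()
}

# Rank the tools from the largest relative increase to the smallest.
ranking = sorted(increases, key=increases.get, reverse=True)

for tool in ranking:
    print(f"{tool}: {increases[tool]:.0f}% increase")
```

Even with rounded inputs, the ranking comes out in the same order as the published list.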

The three tools with the largest decrease in usage percent were 11 Ants Analytics, Salford SPM/CART/MARS/TreeNet/RF, and Zementis. Their dramatic decrease is probably due to vendors doing much less (or nothing) to encourage their users to vote in 2012 as compared to 2011.

Note: 3 tools received fewer than 3 votes and were not included in the above table: Clarabridge, Megaputer Polyanalyst/TextAnalyst, Grapheur/LIONsolver.

Big Data

Use of Big Data tools grew five-fold, from about 3% to about 15% of respondents.
Big Data software you used in the past 12 months:
Apache Hadoop/Hbase/Pig/Hive (67) 8.4%
Amazon Web Services (AWS) (36) 4.5%
NoSQL databases (33) 4.1%
Other Big Data/Cloud analytics software (21)  2.6%
Other Hadoop-based tools (10)  1.3%

We also asked about the popularity of individual languages for data mining. Note that R is included in this table as well as among the higher-level tools.

Your own code you used for analytics/data mining in the past 12 months in:
R (245) 30.7%
SQL (185) 23.2%
Java (138) 17.3%
Python (119) 14.9%
C/C++ (66) 8.3%
Other languages (57) 7.1%
Perl (37) 4.6%
Awk/Gawk/Shell (31) 3.9%
F# (5) 0.6%
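
Each percentage in this table is the vote count divided by the 798 votes retained after cleaning. The following is a minimal Python sketch of that calculation, assuming one-decimal rounding; the names `TOTAL_VOTERS` and `usage_share` are hypothetical, chosen only for illustration.

```python
TOTAL_VOTERS = 798  # votes remaining after cleaning, as reported by the poll

# Vote counts per language, copied from the table.
language_votes = {
    "R": 245,
    "SQL": 185,
    "Java": 138,
    "Python": 119,
    "C/C++": 66,
    "Other languages": 57,
    "Perl": 37,
    "Awk/Gawk/Shell": 31,
    "F#": 5,
}


def usage_share(votes: int, total: int = TOTAL_VOTERS) -> float:
    """Return the share of voters, in percent, rounded to one decimal."""
    return round(100.0 * votes / total, 1)


language_shares = {lang: usage_share(v) for lang, v in language_votes.items()}

for lang, share in language_shares.items():
    print(f"{lang} ({language_votes[lang]}) {share}%")
```

The counts add up to more than 798, so respondents could evidently select several languages and the shares are not expected to sum to 100%.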

For comparison here are the recent software polls:

*Vote cleaning: To reduce multiple voting, this poll used email verification, which reduced the total number of votes compared to 2011 but made the results more representative.
Furthermore, some vendors were much more active than others in recruiting their users. To give a more objective picture of tool popularity, a large number (over 100) of these "unnatural" votes were removed, leaving 798 votes.
