This poll also had a very large number of participants and used email verification and other measures to remove unnatural votes (*see note below).
What Analytics, Data mining, Big Data software you used in the past 12 months for a real project (not just evaluation) [798 voters] | |
Tool (number of votes) | % users in 2012 % users in 2011 |
R (245) | 30.7% 23.3% |
Excel (238) | 29.8% 21.8% |
Rapid-I RapidMiner (213) | 26.7% 27.7% |
KNIME (174) | 21.8% 12.1% |
Weka / Pentaho (118) | 14.8% 11.8% |
StatSoft Statistica (112) | 14.0% 8.5% |
SAS (101) | 12.7% 13.6% |
Rapid-I RapidAnalytics (83) | 10.4% not asked in 2011 |
MATLAB (80) | 10.0% 7.2% |
IBM SPSS Statistics (62) | 7.8% 7.2% |
IBM SPSS Modeler (54) | 6.8% 8.3% |
SAS Enterprise Miner (46) | 5.8% 7.1% |
Orange (42) | 5.3% 1.3% |
Microsoft SQL Server (40) | 5.0% 4.9% |
Other free analytics/data mining software (39) | 4.9% 4.1% |
TIBCO Spotfire / S+ / Miner (37) | 4.6% 1.7% |
Oracle Data Miner (35) | 4.4% 0.7% |
Tableau (35) | 4.4% 2.6% |
JMP (32) | 4.0% 5.7% |
Other commercial analytics/data mining software (32) | 4.0% 3.2% |
Mathematica (23) | 2.9% 1.6% |
Miner3D (19) | 2.4% 1.3% |
IBM Cognos (16) | 2.0% not asked in 2011 |
Stata (15) | 1.9% 0.8% |
Bayesia (14) | 1.8% 0.8% |
KXEN (14) | 1.8% 1.4% |
Zementis (14) | 1.8% 3.7% |
C4.5/C5.0/See5 (13) | 1.6% 1.9% |
Revolution Computing (11) | 1.4% 1.4% |
Salford SPM/CART/MARS/TreeNet/RF (9) | 1.1% 10.6% |
Angoss (7) | 0.9% 0.8% |
SAP (including BusinessObjects/Sybase/Hana) (7) | 0.9% not asked in 2011 |
XLSTAT (7) | 0.9% 0.9% |
RapidInsight/Veera (5) | 0.6% not asked in 2011 |
11 Ants Analytics (4) | 0.5% 5.6% |
Teradata Miner (4) | 0.5% not asked in 2011 |
Predixion Software (3) | 0.4% 0.5% |
WordStat (3) | 0.4% 0.5% |
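The 2012 percentages in the table appear to be each tool's vote count divided by the 798 voters, rounded to one decimal place. The short Python sketch below checks that reading against a few rows; the function name `usage_percent` and the choice of rows are illustrative assumptions, not part of the original poll.

```python
# Sketch: reproduce the "% users in 2012" column from the vote counts.
# Assumption: percent = votes / total voters, rounded to one decimal.
TOTAL_VOTERS = 798  # stated in the poll header


def usage_percent(votes, total=TOTAL_VOTERS):
    """Return the share of voters (in percent) who reported using a tool."""
    return round(100.0 * votes / total, 1)


# A few vote counts copied from the table above.
votes_2012 = {
    "R": 245,
    "Excel": 238,
    "Rapid-I RapidMiner": 213,
    "KNIME": 174,
    "Orange": 42,
}

for tool, votes in votes_2012.items():
    print(f"{tool} ({votes}) | {usage_percent(votes)}%")
```

Because respondents could select more than one tool, the percentages are not expected to sum to 100%.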
Among tools with at least 10 users, those with the largest increase in "usage percent" were:
- Oracle Data Miner, 4.4% in 2012, up from 0.7% in 2011, 505% increase
- Orange, 5.3% from 1.3%, 315% increase
- TIBCO Spotfire / S+ / Miner, 4.6% from 1.7%, 169% increase
- Stata, 1.9% from 0.8%, 130% increase
- Bayesia, 1.8% from 0.8%, 115% increase
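The "increase" figures above are relative changes in usage percent between 2011 and 2012. The sketch below recomputes them from the rounded percentages shown in the table; the helper name `relative_increase` is a hypothetical choice. The results come out close to, but not identical with, the published figures, most likely because the published ones were computed from unrounded shares (an assumption, since the 2011 vote counts are not given here).

```python
# Sketch: relative increase in usage percent, 2011 -> 2012.
# Inputs are the ROUNDED table percentages, so results only approximate
# the published figures.


def relative_increase(new_pct, old_pct):
    """Percent change from old_pct to new_pct, relative to old_pct."""
    if old_pct <= 0:
        raise ValueError("old_pct must be positive")
    return 100.0 * (new_pct - old_pct) / old_pct


# (2012 %, 2011 %, published increase %) copied from the list above.
tools = {
    "Oracle Data Miner": (4.4, 0.7, 505),
    "Orange": (5.3, 1.3, 315),
    "TIBCO Spotfire / S+ / Miner": (4.6, 1.7, 169),
    "Stata": (1.9, 0.8, 130),
    "Bayesia": (1.8, 0.8, 115),
}

for tool, (pct_2012, pct_2011, published) in tools.items():
    recomputed = relative_increase(pct_2012, pct_2011)
    print(f"{tool}: recomputed {recomputed:.0f}%, published {published}%")
```

For tools with very small shares, such as those at 0.7% or 0.8% in 2011, rounding to one decimal place can shift the recomputed increase by tens of percentage points, so these values are best read as rough indicators.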
The three tools with the largest decrease in usage percent were 11 Ants Analytics, Salford SPM/CART/MARS/TreeNet/RF, and Zementis. Their dramatic decrease is probably due to vendors doing much less (or nothing) to encourage their users to vote in 2012 compared with 2011.
Note: 3 tools received fewer than 3 votes and were not included in the above table: Clarabridge, Megaputer Polyanalyst/TextAnalyst, Grapheur/LIONsolver.
Big Data
Big Data tool use grew 5-fold, from about 3% to about 15% of respondents.
Big Data software you used in the past 12 months | |
Apache Hadoop/Hbase/Pig/Hive (67) | 8.4% |
Amazon Web Services (AWS) (36) | 4.5% |
NoSQL databases (33) | 4.1% |
Other Big Data Data/Cloud analytics software (21) | 2.6% |
Other Hadoop-based tools (10) | 1.3% |
We also asked about the popularity of individual languages for data mining. Note that R is included in this table as well as among the higher-level tools above.
Your own code you used for analytics/data mining in the past 12 months in: | |
R (245) | 30.7% |
SQL (185) | 23.2% |
Java (138) | 17.3% |
Python (119) | 14.9% |
C/C++ (66) | 8.3% |
Other languages (57) | 7.1% |
Perl (37) | 4.6% |
Awk/Gawk/Shell (31) | 3.9% |
F# (5) | 0.6% |
For comparison, here are the recent software polls:
- KDnuggets 2011 Poll: Data Mining/Analytic Tools Used
- KDnuggets 2010 Poll: Data Mining / Analytic Tools Used
- KDnuggets 2009 Poll: Data Mining Tools Used.
Vote cleaning: To reduce multiple voting, this poll used email verification, which reduced the total number of votes compared to 2011 but made the results more representative.
Furthermore, some vendors were much more active than others in recruiting their users. To give a more objective picture of tool popularity, a large number (over 100) of the "unnatural" votes were removed, leaving 798 votes.