What do you want to see at Data Day Texas?
As part of the speaker selection process for Data Day Texas, we've been asking all the folks we know in the Austin data community what and who they would like to see. Below are some of the responses we've received so far.
Responses for December 18-19
My primary interest is machine learning. I'd be particularly interested in hearing from people doing things like Kaggle competitions. A workshop where we actually tackled one of those problems would be great fun.
What about talks on Storm? Stream processing? I want to play with those but haven't had time to really dive into them.
What about Giraph for graphs?
I'm interested in high-volume streaming data capture, storage, and analytics. Basically high-end log capture and analysis: how to capture in real time, make the data accessible quickly, and then do analysis and visualization on huge amounts of stored data. That's what I'm into right now, but I know there's something to learn from the experts on this.
HBase is an extremely popular technology. HBase vs. Cassandra would actually make a great topic for the Austin Data Party smackdown. For Data Day, the schema modeling is not all that different from Cassandra, so it would probably be better to do a NoSQL schema modeling panel where Cassandra, NoSQL and Mongo are all represented. As a separate talk, I think you'd want to focus on someone who can talk about implementation, configuration and navigating around steady-state surprises: the issues that tend to pop up when you've actually deployed applications using the technology and are running those applications in production. This is something that also is not broadly known, regardless of which particular Big Data technology it is.
Strata (and others like it) tend to focus on data and the "enterprise" whereas I think Data Day should be more about 2 sort-of tent poles: (1) interesting data use cases and (2) applied solutions to data problems.
More specifically, topics for (1) could be things that focus on:
interesting data sets and tools that can be used to surface non-obvious indicators
"getting started with data analysis" approaches
how to unearth information from typical data that organizations collect (e.g. log data, etc.)
For (2) I see this more as what tools companies/individuals are using to solve data problems:
management tools to deploy data infrastructure
pipeline management for end-to-end data analysis (not just MapReduce but getting your data from your apps all the way to useful results)
new software that solves existing data-centric pain points
new tools to work with the big data stack (things like Eclipse plug-ins to work with Elastic MapReduce, etc.)
I would love to hear some talks about business intelligence. Anything relating to tools, frameworks, packages, etc.
My favorite talks are when a presenter just walks through a problem and the obstacles and successes they encountered along the way.
I believe HBase and Hadoop are both of major interest. A tutorial on HBase could also cover Hadoop, considering they are co-installed and tightly coupled. Working through more professional analytics examples could enrich the event. Overall there is much focus nowadays on analytics, from different angles, and Hadoop has become a de facto standard for both core and alternative analytics.
Another area of interest nowadays is "Stream Computing", which avoids storing large amounts of data while still identifying the important patterns that come from data streams. This also ties in nicely with analytics and Big Data. Overall, I see a need today for systems that can not only store and query data efficiently, but also capture the needed aspects and discard redundant ones prior to storing them.
A talk by Nathan Marz would be great. I would LOVE to know more of the implementation details behind the Twitter architecture - especially the use of Kafka and Voldemort in the "real time data", and the delineation of "eventual accuracy".
Also, it would be awesome to have someone from Titan talk about graph databases and querying, as well as the mapping to Cassandra.
Speaking of Cassandra, the DataStax integration with Solr/Lucene would be a really useful topic. I really want to understand the extent of the use of Cassandra as an index store, and conversely, how indexing of Cassandra data works.