Military Libraries Workshop, Von Braun Center, Huntsville, Alabama
December 11, 2013
Big Data
- Data is the new oil – we have to learn how to mine it! Qatar – European Commission Report
- $7 trillion economic value in 7 US sectors alone
- $90 B annually in sensitive devices
- Land, Labor, Capital, + Data
Data Deluge – the End of Science, Wired, 16.07
Too much data to analyze and process!
Google, eBay, LinkedIn, and Facebook are all Big Data harvesters; they were expecting Big Data from the beginning. They don’t need to reconcile or integrate Big Data with their IT infrastructure because they were built to deal with it.
Traditional sources of data and the analytics performed upon them aren’t going away. Big Data is the new member of the family that must be integrated. Data scientists have to learn to work with the data and be able to analyze it.
Big Data is too much stuff to deal with in a reasonable amount of time!
Big Data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big Data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set. – Wikipedia, May 2011
There is a new paradigm – one of data-intensive scientific discovery.
There are new special collections – more about methods than data.
- Location aware data
- Life streaming
- Insurance claims
- Hubble telescope
- CERN Collections
- Flight data
Unstructured data
- Means untagged or unformatted
- Word files
- File shares
- News feeds
- News Data feeds
- Images
This isn’t entirely accurate. If we make use of the properties of PDF and Word files, we can add a lot of metadata and give the files structure. Most people just don’t do this.
Structured data is like XML – the tagging describes the data.
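To illustrate how tagging describes the data, here is a minimal Python sketch. The record, its element names, and its values are invented for illustration and do not come from the workshop.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element names and values are made up for illustration.
record_xml = """
<report>
  <title>Flight Test Summary</title>
  <author>J. Smith</author>
  <date>2013-12-11</date>
</report>
""".strip()

root = ET.fromstring(record_xml)

# Each value is wrapped in a tag that names it, so a program can ask for a
# field by name instead of guessing where it sits in free text.
fields = {child.tag: child.text for child in root}

print(fields["title"])
print(fields["author"])
```

The same three values in a plain Word file would just be lines of text; the tags are what make them addressable.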
What are the problems?
- Data infrastructure challenges
- “taking diverse and heterogeneous data sets and making them more homogeneous and usable”
- Is this a problem or an opportunity?
- All that data – what can it tell us?
- Privacy
- Copyright
- Neurological impact
- Data collection methods
Government Initiative
The Big Data Senior Steering Group (BDSSG) was formed to identify current Big Data research and development activities across the Federal government, offer opportunities for coordination, and identify what the goal of a national initiative in this area would look like.
There is a fast-growing volume of digital data. Do we need new technology?
Techniques for dealing with Big Data
Content organization – doesn’t matter where the data lives (machine, cloud, etc.)
Undifferentiated, unstructured – needs organization.
Type of database structure: where are we going to put it? Do we use a relational database or an object-oriented system?
An object-oriented system using Java or XML pulls all the descriptors into one place – the object. Take a bottle of water as an example: the descriptors would all live with the object (water, bottle, plastic, origin, etc.).
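The bottle-of-water example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name, the field names, the ids, and the "unspecified" origin are all hypothetical, and the dictionaries merely stand in for relational tables.

```python
from dataclasses import dataclass


@dataclass
class Bottle:
    """Object view: every descriptor lives with the object itself."""
    contents: str
    container: str
    material: str
    origin: str


bottle = Bottle(contents="water", container="bottle",
                material="plastic", origin="unspecified")

# Relational-style view for contrast: some descriptors sit in separate
# lookup tables and are referenced by id (hypothetical layout).
materials = {1: "plastic"}
origins = {1: "unspecified"}
items = {10: {"contents": "water", "container": "bottle",
              "material_id": 1, "origin_id": 1}}


def load_item(item_id):
    """Reassemble one item by following the ids into the lookup tables."""
    row = items[item_id]
    return Bottle(row["contents"], row["container"],
                  materials[row["material_id"]], origins[row["origin_id"]])


print(bottle)
print(load_item(10) == bottle)
```

Both views hold the same information; the difference is whether the descriptors travel with the object or have to be joined back together on request.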
What are Librarians doing?
- We are using meta-search tools to integrate all these data sets.
- We give structure to the unstructured data
- We create the meta-data
Where do we store the meta-data?
- With the records - in the HTML header
- Store the meta-data in a separate file and link to it – database or SharePoint
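The two storage options above can be sketched roughly as follows. The field names, the sample values, and the file name `notes.html` are assumptions made for illustration; a JSON record stands in for whatever database or SharePoint entry would actually hold the separate copy.

```python
import json
from html import escape

# Hypothetical metadata for one record.
metadata = {
    "title": "Big Data notes",
    "subject": "Big Data",
    "creator": "Workshop attendee",
}


def html_head(meta):
    """Option 1: keep the metadata with the record, as <meta> tags in the header."""
    tags = "\n".join(
        f'  <meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in meta.items()
    )
    return f"<head>\n{tags}\n</head>"


def separate_record(meta, document_url):
    """Option 2: keep the metadata in a separate record that links to the document."""
    return json.dumps({"document": document_url, "metadata": meta}, indent=2)


print(html_head(metadata))
print(separate_record(metadata, "notes.html"))
```

Embedding keeps the description and the document together when the file moves; a separate record is easier to search and update in bulk but depends on the link staying valid.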