Big Data – and What Lies Beneath. Sepp Norbert, IBM Magyarország


1 Big Data – and What Lies Beneath. Sepp Norbert, IBM Magyarország

2 Marketing Priority Matrix
[Chart: 13 factors plotted by expected impact on marketing (x axis) against the share of CMOs who feel unprepared for them (y axis): 1 data explosion, 2 social media, 3 growing choice of devices and channels, 4 shifting customer demographics, 5 financial constraints, 6 declining brand loyalty, 7 targeting growth markets, 8 ROI considerations, 9 collaboration with customers, 10 protection of personal data, 11 global outsourcing, plus influencing factors, compliance, and transparency.]

We asked respondents to rank each factor in terms of its expected impact on the marketing function over the next three to five years, then looked at how this related to how confident they feel about managing each factor. Some of the changes CMOs are least prepared to manage are those likely to cause the biggest upheavals. The top items – 1 through 4, which CMOs felt least prepared to manage – were, in fact, the factors they thought would impact their business the most:

1) Data explosion – CMOs have been dealing with this for a while already, which probably explains why it wasn't further to the right on the x axis, but it is still the factor that concerns CMOs the most.
2) Social media – CMOs feel somewhat more prepared to manage social media, though their level of anxiety is still high, and it sits further along the x axis ("more impact") because it is relatively new and still evolving. CMOs are unclear how social media will change things in their own organization, but they anticipate the change will be significant.
3) Growth of channel and device choices – the onslaught of tablets and the increased use of mobile, among other things, is very quickly becoming a priority for CMOs, and, as with social media, it is still not clear how all of this will manifest itself. Most CMOs said this would have the most impact on their marketing organizations, but they feel a bit more prepared to deal with it than with the data explosion or social media.
4) Shifting consumer demographics – as with the data explosion, CMOs have been aware of these predictions for a while and have been anticipating the impact, but they still feel significantly underprepared to deal with the implications, in part because so many of the factors above are driven by the demands of changing demographics: new global markets and the influx of younger generations entering the marketplace who will drive much of the change noted in 1 through 3.

The other items – 5 through 13 – are also important, but CMOs either feel better prepared to manage them or do not anticipate their impact to be as powerful as the top four.

3 What if we knew the answers to questions like these?
Which customer is about to defect? Which transaction points to fraud? Which product will be the most successful? How do I extract meaning from the sea of data? {DESCRIPTION} Which customers are thinking of leaving? Which transactions are fraudulent? Which new product has the greatest chance of success? How can I extract insight from all of my information? The ultimate differentiator today… …is being able to make more informed choices with confidence, to anticipate and shape business outcomes. {TRANSCRIPT} So you can start posing these sorts of questions with the client: what would you be able to do if you actually had answers to those problems? Which are the customers that you want to keep within your client base, and which are the ones that, if they threaten to leave, you might just let go? How do you decide between the two? If you are managing financial transactions, how do you determine which transactions are most subject to fraud? How do you build products that are really going to hit the sweet spot of market and price demands? It is all about better understanding, and what we are starting to see is that this sort of understanding is going to differentiate the companies that are truly successful in the future from those that are merely (inaudible) in the business. The difference… …is being able to make well-founded decisions – to anticipate and shape outcomes.

4 Analytics – Business Analytics & Optimisation (BAO)
Analytics – Business Analytics & Optimisation (BAO). Analytics: a combination of hardware, software, systems, solutions, and services that enables organizations to extract information from the sea of data and to find new business opportunities in it. … Companies that invest in analytics consistently outperform their competitors. Revenue growth rate, 5-year CAGR ( ); profit growth rate, 5-year CAGR ( ); return, 5-year average ( ). {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} These are pretty standard definitions: Business Analytics is really about getting insight into the way a business is run, the way organizations perform, the way products are created, and getting to know your clients better – using analytical techniques to analyze the wealth of information that is out there. One key thing that is often forgotten is that it is not just the analytics but the optimization: doing something with the information or insight we have gained. It is all very well understanding the problem better, but how do you then optimize the way you interact with the client, optimize your product, optimize the way you run your business to make an improvement? One of the things that we see – and this is data from the Institute for Business Value CFO study from 2010 – is that companies investing in analytics consistently outperform those that don't. [Chart values: 33%, >12x, 32%, 12.5%, 11.9%, 9.4%, 7.3%, 9.0%, 0.6%]

5 Statistical analysis
Statistical analysis
- Qualitative and quantitative series; territorial and time series
- Full (census) and representative samples
- Ratios, frequency series, concentration, stochastic relationships, association, correlation
- Measures of central tendency (computed: arithmetic, geometric, harmonic, quadratic mean; positional: mode, median)
- Statistical inference (on probabilistic foundations): estimation (inferring an unmeasured value from a sample – confidence interval, confidence level)
- Hypothesis testing (Type I and Type II errors, significance level, critical value)
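The estimation step above – inferring an unmeasured value from a sample via a confidence interval – can be sketched in a few lines of Python. The sample values here are made up purely for illustration, and the interval uses the normal approximation (z = 1.96); for samples this small a t-distribution would be more accurate.

```python
import statistics
from math import sqrt

# A hypothetical sample of measurements (illustrative data only).
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)        # sample standard deviation

# 95% confidence interval for the mean (normal approximation, z = 1.96).
z = 1.96
margin = z * s / sqrt(n)
low, high = mean - margin, mean + margin
print(f"mean = {mean:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The confidence level (95% here) sets z, and the interval narrows as the sample grows – the same trade-off the slide's "confidence interval, confidence level" bullet refers to.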

6 Today's hottest analytics topics
"Big Data" – Variety, Velocity, Volume, Veracity. "Big" and "small" data – many anomalies, interesting patterns. Alongside the traditional approach to "small" data, sophisticated methods are needed to handle "big" data. Social media analytics – accurate, flexible, and fast; it must respond to the user needs expressed in the available data. Real-time data – extracting, transforming, moving, and processing data is no longer a batch process running in the background but an immediate activity. {DESCRIPTION} This slide contains the topics that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} As we said, analytics is very much a hot topic in its own right; it is on the agenda of a lot of C-level executives. However, even within analytics we are already starting to see other trends emerge that you may come across or hear about. The first of those, very obviously, is what is called "Big Data." We can all debate how big is big, but the idea is being able to do analytics on volumes of data that would typically be much larger than those we could process using traditional means. You will often hear reference to the three V's – Variety, Velocity, and Volume: lots of different sources of data, data arriving at a faster and faster rate, and the pure scale and size involved. Very often we see the term Big Data used synonymously with analytics. Another hot topic at this point in time is social media analytics. The rise of social media and the growth of social networks have generated a huge volume of additional data that individuals and companies often want to query and mine to gain greater insight – for example, into what their clients really think about their products, given what people might be writing on blogs or tweeting about them.

This might then in turn allow them to gain greater insight into what they could do for their next product, so you can see social media analytics being used to influence product design, pricing, or the approach by which a product is sold. Last but not least, we will spend a little time on real-time data. This feeds off the velocity comment from the Big Data section: a growing number of organizations want to process data, in effect, as it arrives, in order to make a decision very rapidly – potentially in seconds, sub-second, or millisecond timeframes. Real-time data analytics is of great interest to organizations wanting to operate at that sort of speed.

7 Areas of advanced analytics
Statistical – Why is it happening? What are we missing? Sentiment analysis – What do they think of us? How do they relate to us? Planning and forecasting – What will we do? What are the trends? How much is needed, and when? Predictive modelling – What is the next step? What will its business impact be? Simulation – What if…? What are the alternatives? Optimization – How can we improve on this? What is the best decision? Content analytics – What more can be worked out from the existing information? Identity Insight – understanding people and things and the relationships between them. {DESCRIPTION} This slide contains the topics that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} Even within the advanced and predictive analytics space – analytics angled at looking forward instead of backward – you might come across any number of these terms. I am not going to go into any of them in particular detail, so just briefly: content analytics is the sort of analytics that sits behind Watson and is very interesting in terms of extracting a lot more insight from data you already have. Identity Insight is another key growing area of interest that hangs off the back of the social media analytics we discussed before: being able to identify individuals by the way they behave and interact. Consider a user who is on a number of different social media sites, posting comments under a range of different user IDs or identities; we might be able to use Identity Insight to determine who that individual is and then do something about it. A range of these advanced and predictive analytics techniques forms a key part of the groundswell of interest around analytics at the moment.

8 Analytics in practice
Analytics in practice – questions by increasing complexity:
- Prescriptive: Stochastic optimization – How do we achieve the best outcome while allowing for chance? Optimization – How do we achieve the best outcome?
- Predictive: Predictive modelling – What happens if…? Simulation – What could happen…? Forecasting – Where are the trends leading?
- Descriptive: Alerts – What needs to be done? Query/drill-down – What exactly is the problem? Ad hoc reports – How much, where, how often? Regular reports – What happened?

9 The widening Big Data analytics gap
Available data… the missed opportunity … and how much of it we can actually process. We can process an ever smaller share of the data available to us; companies are groping blindly in a forest of new opportunities.

10 Big Analytics

11 The analytics process
The analytics process: Define problem → Gather data → Analyse → Decide → Act, monitor, learn. Input + analysis + decisions + results = new data! {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} One of the other things – and this comes back to the very siloed way a lot of organizations treat analytics – is that this is not something you really do on your own; it is not a solo sport. At the top you see a typical loop of activities: we understand the problem, we gather some data, we do some analysis, we make some decisions, and then we do the optimization. We act upon the insights we have gained, and then we close the loop, because some of the learnings we got out help us redefine, re-refine, or even ask a different question about a different problem we are trying to solve. For different parts of the business, these analytical decision loops are all interconnected. Data from one organization, or insight gained from one part of the business, will have implications for other parts of the business. The outputs from one set of analytics become the inputs to a different set of analytics. You are generating new data all the time when you start doing this, and the decisions and optimizations then cascade through other parts of the organization – so a very siloed approach to analytics very often misses the point. … Our decisions become the input data for new decisions.
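The closed loop described above – define, gather, analyse, decide, act, with the outputs of one pass feeding the next – can be sketched as a short program. Every function name here is an illustrative placeholder, not a real API; the point is only the shape of the cycle.

```python
# A minimal sketch of the closed analytics loop (all names are hypothetical).

def define_problem(insights):
    # Refine (or change) the question based on what we learned last time.
    return {"question": "churn-risk", "informed_by": insights}

def gather_data(problem):
    return ["record-1", "record-2"]          # stand-in for real ingestion

def analyse(data):
    return {"risk_scores": {r: 0.5 for r in data}}

def decide(analysis):
    return [r for r, score in analysis["risk_scores"].items() if score >= 0.5]

def act_monitor_learn(decisions):
    # Acting produces new data: our decisions become inputs to the next cycle.
    return {"previous_decisions": decisions}

insights = {}
for _ in range(3):                            # each pass closes the loop
    problem = define_problem(insights)
    data = gather_data(problem)
    analysis = analyse(data)
    decisions = decide(analysis)
    insights = act_monitor_learn(decisions)

print(insights)
```

Note how `insights` flows back into `define_problem` on the next iteration – the "decisions become input data for new decisions" point from the slide.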

12 IBM BAO reference architecture
[Diagram: the reference architecture's pillars – Sources, Access, Content Management, Master Data Management, Data Integration, Data Repositories, BI / Performance Monitoring, Advanced Analytics – with their component detail (data stores, warehouses, staging areas, reporting, dashboards, simulation, optimization, predictive analytics, data mining, text analytics, and so on), resting on horizontal layers for Business Process Management, Service Management, Information Governance, Collaboration, Security, Privacy & Compliance, Transport & Delivery, and Infrastructure.] {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Here is our Reference Architecture – or rather the framework, which is what it really is. As you can see, right down at the very bottom, infrastructure is a tiny little strip or layer, and people would often say: well, infrastructure obviously can't be that important, because it is only that one line.

13 IBM BAO reference architecture
[Diagram: a simplified view – the pillars (Sources, Access, Content Management, Master Data Management, Data Integration, Data Repositories, BI / Performance Monitoring, Advanced Analytics) along the top, the horizontal layers (Business Process Management, Service Management, Information Governance, Collaboration, Security, Privacy & Compliance, Transport & Delivery) below, and the Infrastructure layer expanded into CPU, MEMORY, DISK, and NETWORK.] {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} I am going to slightly twist this on its head now and say: if you look at a more simplified view, I have taken the various pillars of the architecture along the top and expanded out those horizontal layers at the bottom. It may be perceived from the previous chart that all the things we typically do within STG – processors, memory, networking, and storage – fit into that single thread or strand at the bottom.

14 IBM BAO reference architecture
[Diagram: the same pillars, each annotated with its dominant infrastructure demands – different mixes of CPU, MEMORY, DISK, and NETWORK per pillar.] {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Whereas, actually, if you look at the various pillars, you can start to see that there are different infrastructure requirements in each of them. Say we look at data integration in the center of the screen: data integration is very much a processor- and memory-intensive activity. The requirements on processing, and the memory footprints in that space, will be much higher than if you are doing content management or managing repositories of data, which is really more about storage. The infrastructure requirements within each of those pillars are themselves very different. Just drawing it as a single line and saying you can build one infrastructure that will support all of your analytics needs is somewhat misleading. Transport and delivery is all about networking: it is about latency, qualities of service, security, and the like. A single view – a single strand, as the Reference Architecture Framework portrays it – doesn't do the infrastructure story justice. What we really want to do is have a slightly different conversation with the client, which is where we are going in the rest of this presentation.

15 Typical steps of the analytics process
Data → Ingest → Integrate → Analyse → Interpret → Result. Additional considerations for Big Data analysis: results are needed fast; many kinds of data; large volumes of data; accurate results are required; relevant answers; an affordable solution; a solution many people can use. {DESCRIPTION} This slide contains the topics that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} If you think about what we are trying to do when we do analytics, it really comes down to this. We have some data. We first have to ingest that data into the system – move it from where it is today into the environment, the infrastructure, where we are going to run the analytics. Chances are we then need to integrate the ingested data with other data sources, whether that is master data, data from other environments or systems, or data that has come from outside the organization: some sort of data integration process needs to occur. We then do the analysis: using the mathematical models, the algorithms, and the tooling we have, we analyze the data that has been ingested and integrated. Finally, we get to the interpretation of the results and their delivery to the business. Pretty much any analytics application or workload can be broken down into these steps. We are going to look at each step in turn and understand the role infrastructure plays in its success or failure.

16 The data itself creates challenges…
Variety, Volume, Velocity, Veracity, Value {DESCRIPTION} This slide contains the topics that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} Data alone introduces a number of challenges. You will recognize the three V's – variety, volume, and velocity – from the traditional Big Data story. The other two, veracity and value, are often added in. Veracity is all about the truthfulness of the data; it comes from the 2012 Global Technology Outlook, which talks about the management of uncertainty in data – how, in the future, you will be making decisions on data that is far less certain, or has far less provenance, than the data we are familiar with today – and about some of the techniques likely to be applied to handle that uncertainty. Some interesting challenges arise in how you compensate for it. One potential solution is to use a lot more computational horsepower, or grunt, in the analysis phase – running more simulations or more predictions to refine the answer you get. Or you may have to use a lot more processing, more memory, and larger volumes of data to put a piece of information into a broader context. Value, obviously, is very important going forward. So there are a number of challenges introduced by those five V's. We also have three I's and one A – I did try to find a word beginning with I that means analysis, so I could have four I's to go with the five V's, but I haven't yet managed to find an appropriate word for that step of the workflow. Let's look at this flow of data through an analytical system and see what actually happens.

17 Analytics raises both business and technical considerations
{DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} One of the interesting things that jumps out from this – and the weightings here are really only diagrammatic – is something I realized early on when I started looking at these sorts of problems: there are two very different ways of looking at this problem. You can look at it from a technical perspective or from a business perspective. Very often, when we engage with our clients around analytical solutions, we are talking to business people; we have a very business-focused engagement. Typically, this needs to be balanced with a technical focus. For example, think about the ingest problem: it is all about moving data from where it sits today into the analytics environment, and it is fundamentally a technology problem to solve. It will take a finite amount of time to move any given volume of data from point A to point B. If you do not have an understanding of the technology – and I am currently working with a client in Southern Europe on exactly this sort of problem – you can end up in a situation where it is physically impossible to move the volume of data they need to move in the time they have allowed, with the infrastructure solution they have today. We run up against the speed of light and other fundamental physical constraints. So there is a very technical focus around things like ingest. Integration, again, is a predominantly technology-led problem: creating large, properly parallelized, virtualized, big-memory environments within which we can manipulate data.

Analysis is probably computationally intense but has much less focus on disk and other infrastructure components. By the time we get to interpretation – apart from using some technology to help visualize large volumes of data, to draw pictures, graphs, charts, or diagrams that help someone interpret the data better – there is very little technology involvement. Looked at from the business perspective, interpretation is very much a business activity: what does this result mean in the context of my business, and what do I need to do as a result? Similarly, going back the other way: what are the questions I am really trying to answer? That is much more of a business focus than a technology focus, and by the time you get down to how the data will be moved, the business probably doesn't care or know what goes on at that stage. So you see two different focuses: the focus on technology, very important up front, gradually trails off in importance through this flow, while the focus on the business aspects increases as we get closer to analysis and interpretation. The relative importance of the business side and the technical side changes as we go through the analytics process. The challenge is that if you talk only to business people, a lot of the technology constraints you run into in the ingest and integration phases get lost, forgotten, or ignored; if you talk only to the technology people, you may have a great technology solution up front that then fails to deliver what the business actually requires.

It is important to talk to both the technology people, about the infrastructure and its capabilities, and the business people, about the analytics and the software, in order to build the solution the client is going to need. [Diagram: Variety, Volume, Velocity, Veracity, Value; Ingest → Integrate → Analyse → Interpret; technical focus declining and business focus rising across the flow, meeting in the solution.]

18 The limits of ingest are set by the laws of physics
Task: move the data to where it will be used. Variety, Volume, Velocity, Veracity, Value. Ingest → Integrate → Analyse → Interpret. Speed: network, storage, memory, CPU. Parallelisation? Use the data in place? {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Let's look at the steps of the analytics process in a little more detail and talk about some of the issues that might surface. As I said before, ingest is largely governed by physics; this is really a technology discussion. It is all about moving data to where it needs to be: the throughput of networks, the throughput of memory, the throughput of disks, and the like. It is about seeing whether you can exploit parallelism to move things more quickly, and whether you can use the data in situ without having to move it at all – though doing that has potential implications for the operational characteristics of the system the data is coming from. So this is fundamentally about physics: infrastructure discussions, infrastructure sizing, and performance requirements are going to be key to the success of the ingest phase.

19 Example: moving a large volume of data
Transfer times by medium and data volume:

Medium            Component   System  | 1 GB    10 GB  100 GB  1 TB    500 TB  1 PB   10 PB
Disk              100 MB/s    4 GB/s  | 0.25 s  2.5 s  25 s    4 m     35 h    69 h   29 d
Network, 1 Gb/s   125 MB/s            | 8 s     80 s   800 s   133 m   46 d    93 d   926 d
Network, 10 Gb/s  1250 MB/s           | 0.8 s   8 s    80 s    13 m    5 d     9 d    93 d
Tape              250 MB/s            | 4 s     40 s   400 s   67 m    23 d    46 d   463 d

{DESCRIPTION} This slide contains a table that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Just to give some examples, these are classical approaches to moving data: disk-type technologies, tape-type technologies, networking technologies. As you can see, you do not even need to reach the truly large volumes: looking at the 1 Gb network row, moving 500 TB of data – not an insignificant amount, yet actually quite small when some of these Big Data challenges are considered – will take you 46 days. That is a month and a half to move that data using traditional methods. If you only have a single 1 Gb network, even if everything works perfectly, it is going to take days, weeks, in fact a month and a half, to move that volume of data. We need to factor in these considerations when designing solutions able to ingest not only the volumes of data that clients want to manipulate and work with today, but also the larger volumes and more timely delivery that will be required of us in the future. … with classical methods this means days or weeks.
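The table's entries all follow from one formula, time = volume / bandwidth, so the figures are easy to reproduce. A minimal sketch, using decimal units (1 TB = 10^12 bytes) and ignoring protocol overhead:

```python
# Reproduce the slide's order-of-magnitude transfer times: time = volume / bandwidth.

def transfer_days(volume_bytes, bandwidth_bytes_per_s):
    """Ideal transfer time in days, with no overhead or contention."""
    return volume_bytes / bandwidth_bytes_per_s / 86_400   # 86,400 s per day

TB = 10**12

# 500 TB over a 1 Gb/s link (125 MB/s effective payload rate):
days = transfer_days(500 * TB, 125 * 10**6)
print(f"{days:.0f} days")   # roughly the 46 days quoted on the slide
```

Real links never sustain their nominal rate, so these are best-case floors; the actual move only takes longer.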

20 Integration. Goal: to collect, combine (and cleanse) data from multiple sources. Variety, Volume, Velocity, Veracity, Value. Ingest → Integrate → Analyse → Interpret. Volume and speed: memory, storage. Compute power. Parallelisation? {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Integration, as I said before, is very much a memory- and CPU-bound activity. We can do a lot of work around parallelism, because most of these workloads tend to parallelize very nicely – whether in a scale-out approach across a grid or a farm of servers, or in a scale-up approach using lots and lots of cores and threads within a larger symmetric multiprocessor. But integration is really about memory throughput and lots of compute. Again, you can see how that plays through into the timeliness, the relevance, the cost-effectiveness, and so on.
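The collect-combine-cleanse step, and why it parallelizes so nicely, can be shown with a toy example. The records and the cleaning rule are purely illustrative, and a thread pool stands in for the grid or server farm mentioned above:

```python
# A toy scale-out integration step: cleaning partitions of data in parallel
# workers, then merging and de-duplicating the results.
from concurrent.futures import ThreadPoolExecutor

def clean(record):
    # Normalise one record (a stand-in for real transformation logic).
    return record.strip().lower()

def integrate(partitions):
    # Each record is independent, so the cleaning fans out across workers;
    # threads here stand in for the grid / server-farm scale-out on the slide.
    with ThreadPoolExecutor(max_workers=4) as pool:
        cleaned = list(pool.map(clean, partitions))
    return sorted(set(cleaned))            # merge + de-duplicate

raw = ["  Alice ", "BOB", "alice", " bob  "]
print(integrate(raw))                      # ['alice', 'bob']
```

The map step is embarrassingly parallel; only the final merge needs the results together, which is why memory throughput on the merging node becomes the bottleneck.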

21 Analysis is where the hard work happens
Variety Volume Velocity Veracity Value Betáplálás Integrálás Analízis Értelmezés {DESCRIPTION} This slide contains the topics that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} Analysis is the step that does the hard work, although, as we might have seen, actually quite a lot of the infrastructure focus is on those early stages. The physics of moving the data and the large memory footprint, large computational footprint systems that actually do a lot of this complicated manipulation and integration of data. Analysis is really about computation when it really comes down to it. It is about things like the throughput and volume of memory. You are using cubing services, let’s say from Cognos TM 1 to be able to analyze a large volume of data in memory, (so you construct the cube inside memory), then obviously, you need a large amount of memory to hold that environment, and then you need to design a system that no only can provide you with that memory, but can provide it with the sorts of availability, security, and reliability characteristics that you need if you are going to be basing critical business decisions on the results of this analysis. It may be that we can, again, exploit parallelism and in particular, if you look at the financial risk workloads, these are farmed out over vast server farms with many tens, or hundreds of thousands of cores in order to process these risk workloads in a cost effective fashion. But also, what we start to see, is we start to see the use of acceleration technologies whether they are computational accelerators like NVIDIA GPUs which, again, use to accelerate computation. In some environments, it might be that they are using a PGA as was used in the Netezza appliance. But the use of acceleration technologies where we need to get even more computation into this analytic space in a very short space of time. 
Compute power; amount and throughput of memory; parallelism; specialized "accelerators"
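As a toy illustration of the in-memory cubing idea mentioned above (at a vastly smaller scale than an engine like Cognos TM1), the sketch below aggregates a measure over every combination of dimension values entirely in memory; the sales records and dimension names are invented for the example.

```python
from collections import defaultdict

def build_cube(rows, dims, measure):
    # Aggregate the measure over every observed combination of the given
    # dimensions, holding the whole cube in memory.
    cube = defaultdict(float)
    for row in rows:
        cube[tuple(row[d] for d in dims)] += row[measure]
    return dict(cube)

sales = [
    {"region": "EU", "year": 2023, "amount": 10.0},
    {"region": "EU", "year": 2024, "amount": 5.0},
    {"region": "US", "year": 2024, "amount": 7.0},
]
cube = build_cube(sales, ("region", "year"), "amount")
```

The memory footprint grows with the number of dimension combinations, which is exactly why the narration stresses memory volume and the availability characteristics of the system holding the cube.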

22 Example: recommended hardware environment for Hadoop
{DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} One interesting thing that comes out of this is a comparison. Go to an industry source (I've taken the O'Reilly handbook for Hadoop, but you can get this sort of information from lots of places on the web) and ask: what platform is best for running a Hadoop workload? Hadoop is an open source analytic framework built around the MapReduce programming model, and it provides a set of tooling that lets you analyze a lot of these big data type problems. The O'Reilly handbook, and most of the web, will tell you that a couple of quad-core processors, a little bit of memory, and a bunch of disks is the platform of choice for running Hadoop workloads. Compare that with our BigInsights hardware foundation; BigInsights is IBM's Hadoop offering today. One of the things that jumps out: 2 CPUs. Okay, that is a fairly standard 2-socket motherboard. We are using 6-core CPUs, so immediately there is a little more compute power in the IBM offering. A lot more memory, twice as much: 48 GB versus 24 GB. You could make a case that the extra compute power needs a bit more memory, but actually the memory footprint is substantially larger relative to the increase in computation. The big difference here is the number of disks: 12 disks rather than 4. What we have with our reference configuration is a 1:1 mapping between processor cores and physical disk spindles.
Compare that with the 8 cores and 4 disks in the industry-standard configuration. This is where you start to see some of that IBM insight, some of that bigger-systems viewpoint, coming into the design of systems to support these analytics workloads. Building on our experience, we can say that this storage-rich configuration is actually a much better platform for running Hadoop/BigInsights type workloads than the one recommended by the majority of the rest of the internet. So we are now starting to see, even for this compute phase, a lot of focus on a storage-rich environment rather than the storage-poor one suggested elsewhere.
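The MapReduce model mentioned above can be illustrated with the classic word-count example; this single-process Python sketch mimics the map, shuffle, and reduce phases that Hadoop distributes across a cluster.

```python
from collections import defaultdict

def map_phase(docs):
    # Mapper: emit a (word, 1) pair for every word, as in Hadoop word count.
    for doc in docs:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group the emitted values by key, which the framework
    # normally does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {key: sum(values) for key, values in groups.items()}

docs = ["Big data big insight", "big systems"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

In a real cluster the map tasks run on the nodes holding the data blocks, which is why the core-to-spindle ratio discussed on this slide matters so much.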

23 Interpretation: making the results usable...
Variety Volume Velocity Veracity Value | Ingest Integrate Analyze Interpret
Visualization; graphical capabilities; compute power
{DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Last but not least, we come to interpretation. As I mentioned before, technology actually adds relatively little value here; this stage is very much business-focused. I will basically make a case for some capability to better visualize data. Human beings are much better at analyzing pictures than at analyzing tables and tables and tables full of data, so very often we use graphical representations of data (pictures, diagrams, charts, whatever they happen to be) to help us poor humans manage the huge volumes of data being thrown at us, and even more so in the future. So, from a technology and infrastructure point of view, providing visualization and graphical capabilities, plus maybe a bit of computational capability to support that visualization, is really all that technology brings to this far end of the analysis process.
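To make the pictures-over-tables point concrete, here is a deliberately minimal, dependency-free sketch that renders a small table of values as a text bar chart; a real BI front end would of course draw proper graphics, and the rainfall figures are invented.

```python
def ascii_bars(series, width=40):
    # Render a {label: value} table as horizontal bars: a crude stand-in
    # for the charts a visualization layer would draw.
    peak = max(series.values())
    lines = []
    for label, value in series.items():
        bar = "#" * max(1, round(width * value / peak))
        lines.append(f"{label:>8} | {bar} {value}")
    return "\n".join(lines)

rainfall_mm = {"Mon": 2.0, "Tue": 12.5, "Wed": 7.1}
chart = ascii_bars(rainfall_mm)
print(chart)
```

Even this toy rendering makes the Tuesday peak obvious at a glance, which is exactly the point the narration makes about rows and columns versus pictures.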

24 Example: IBM Deep Thunder
{DESCRIPTION} This slide contains two screen captures that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} A couple of examples here. This uses IBM's Deep Thunder, which is our local thunderstorm prediction engine and analytical weather forecasting platform. You can see maps with the rainfall overlaid on a geographic representation of an area; you can see and predict where and when it is going to rain. That is much easier than, let's say, being presented with hundreds of screens full of data in rows and columns that you would otherwise have to look at and interpret in order to understand what the thing was actually telling you. Never underestimate the power of graphics and of the representation of data to convey the message. But really, by the time you get to the interpretation phase, there is not that much of a role for technology and infrastructure.

25 Analysis is not an isolated island...
Reuse / refine. We will use it again: access time, frequency, full set or subset. Retain. Ingest Integrate Analyze Interpret. Retain. {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} The last thing to remember is that analytics, as we have said, is not something you do standalone. Let's think about it: we've done this process, and you can see the ingest, integrate, analyze, and interpret flow sitting within the bracketed box on the screen. It is going to generate some information, some data, some results. What are we going to do with them? Is this something we are going to keep, feed back into the process, reuse, and refine? Is it something we have to keep because we know we are going to access it again in the future? Is it something we are going to keep where we might have to access it, we are not sure, but we keep it on the off chance? Or is it something we are just going to throw away: the decision has been made, the job is done, and we no longer need the data? So, rather than thinking about analytics as just a standalone environment, which has a role to play, we need to view it as part of the total data lifecycle and the total data management problem that organizations face. I read an interesting article a couple of weeks back where an analyst was basically saying that this is all about hoarding data, and that big data exists just because you are not very good at managing it: you don't throw it away, you don't clean it, and all we are really doing is encouraging enterprises to hoard information.
Well, maybe, but you could potentially get some very interesting insights from that data a few years down the line, and having it, and having the ability to analyze it, is going to be increasingly important going forward. Discard. We might still use it: For what? Which part of it? ... but part of the full data lifecycle
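The keep / reuse / discard questions above amount to a small decision procedure; the sketch below is one possible encoding, with thresholds, category names, and the compliance override all being illustrative assumptions rather than anything from the source.

```python
def retention_action(expected_reuse, last_access_days, regulated=False):
    # One possible encoding of the keep/reuse/discard decision the slide
    # describes; the thresholds and tier names are illustrative only.
    if regulated:
        return "retain"            # compliance requirements override everything
    if expected_reuse == "certain":
        return "retain-online"     # keep on fast storage; it will be re-read
    if expected_reuse == "possible" and last_access_days < 365:
        return "retain-archive"    # keep on a cheap tier, on the off chance
    return "discard"               # decision made, job done, data not needed

action = retention_action("possible", last_access_days=90)
```

The value of such a rule is not the rule itself but that it forces the lifecycle questions to be answered explicitly instead of hoarding everything by default.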

26 Example: casual users, power users, departmental data marts
Operational data; BI server; traditional (operational) data sources; reports; casual user; dashboard; streaming analytics; cubing services; alerts; data warehouse; virtual sandboxes; integration; new data sources; in-memory sandbox; Hadoop cluster; standalone sandbox; power user. {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Here is a typical environment; this is a made-up example. This client has a data warehouse. They have some information integration capability. They might have some streaming analytics. They have a Hadoop cluster, a BI server, some cubing capabilities. So there are lots of different uses of analytics that we can see within this environment.

27 Different analytics tasks impose different requirements
Predictive analytics, modeling, simulation | Optimization, sensitivity analysis | Text analytics, Hadoop (each profiled along Cores, Network, SCM, Storage) {DESCRIPTION} This slide contains a graphic that is covered by the narration written in the transcription of this slide. {TRANSCRIPT} Here are some examples from the 2012 Global Technology Outlook showing the requirements different analytics workloads place on processor cores, network, storage, and SCM (storage class memory, as used here). A predictive analytics solution, say, is very much processor- and network-heavy; text analytics on Hadoop is very much storage- and memory-heavy. Think back to the IBM BigInsights reference configuration we saw before, and now you see why it has a lot of storage and a lot of memory compared with the industry-standard configuration: as we understand these workloads more and more, we can see the different system characteristics each workload brings. There is no one size fits all, and any hardware solution to support analytics needs to accept that these workloads have different requirements; a one-size-fits-all solution is going to give you problems at some point later on.
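One way to picture the slide's point is as a crude scoring exercise: represent each workload's emphasis on cores, network, memory, and storage as weights, and see which workload a given hardware configuration fits best. All of the numbers below are invented for illustration and are not IBM sizing data.

```python
# Illustrative requirement weights per workload, echoing the slide's point
# that the profiles differ sharply (higher = more demanding on that resource).
WORKLOADS = {
    "predictive-modeling": {"cores": 5, "network": 4, "memory": 2, "storage": 1},
    "text-analytics-hadoop": {"cores": 2, "network": 1, "memory": 4, "storage": 5},
}

def best_fit(config, workloads=WORKLOADS):
    # Pick the workload whose emphasis a hardware configuration matches best,
    # using a dot product as a toy scoring rule.
    def score(reqs):
        return sum(config[k] * weight for k, weight in reqs.items())
    return max(workloads, key=lambda name: score(workloads[name]))

storage_rich = {"cores": 2, "network": 1, "memory": 4, "storage": 6}
chosen = best_fit(storage_rich)
```

The storage-rich configuration lands on the Hadoop-style workload, mirroring the BigInsights reference configuration argument from the earlier slide.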

28 Hardware: massive parallelism, persistent memory
Compute-centric model vs. data-centric model. Manycore; FPGA; parallelism; persistent memory; input/output; Flash; Phase Change.
Compute-centric: data on tape and disk; data is moved to the processor; deep, multi-level storage hierarchy.
Data-centric: data sits in persistent memory, surrounded by many processors; flat storage hierarchy.
The workload shapes the choice of hardware, the system software, and the applications.

29 Storage Class Memory (SCM, projected 2015: $0.05/GB, i.e. $50K/PB)
(Phase Change) SCM vs. FLASH ($0.10/GB). Relative cost and relative latency (DRAM latency = 1):
DRAM: cost 100, latency 1
SCM: cost 1, latency 10
FLASH: cost 5, latency 1,000
HDD: cost 0.1, latency 100,000
HDD cost advantage: 1/10 of SCM. SCM speed advantage: 10,000x over HDD.
Source: Chung Lam, IBM
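The slide's $50K/PB figure follows directly from the $/GB price; a one-line check, using the slide's projected 2015 prices and decimal storage units (the unit convention is an assumption):

```python
def cost_per_pb(dollars_per_gb):
    # 1 PB = 1,000,000 GB in the decimal units storage vendors quote.
    return dollars_per_gb * 1_000_000

# Projected 2015 figures from the slide.
scm_pb = cost_per_pb(0.05)    # SCM: about $50K per petabyte
flash_pb = cost_per_pb(0.10)  # flash: about $100K per petabyte
```

The same conversion applied to the HDD row shows why disk stays in the hierarchy for cold data even as SCM closes the latency gap.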

30 Deployment models for analytics solutions
{DESCRIPTION} Software as a Service (SaaS); analytics appliances; open source analytics; pre-built analytic applications vs. traditional models {TRANSCRIPT} As if things weren't complicated enough with all these hot topics within analytics, we are also starting to see some other interesting challenges appear. A number of organizations offer, if you like, analytics as a service: a cloud-based software-as-a-service platform. Some organizations very much see cloud and big data, or cloud and analytics, as two sides of the same coin. Now, the challenge with a lot of these off-premise solutions is that it really depends on how much data you are going to store and how quickly you need to access or upload it, because the commercial constructs on which cloud tends to be based often break down when we start to deal with large volumes of data, or data that needs to be stored for a very long period of time. I'd refer you to the 2011 Global Technology Outlook created by IBM Research, in particular the chapter on petascale analytics (analytics on petabytes of data), which has a very interesting comparison of the on-premise and off-premise approaches; it shows that for those very large petascale analytics tasks it is often economically much more sensible to run on premise than off. That doesn't mean there isn't a role for cloud to play in analytics. If you are thinking about social media analytics, a lot of the data (let's say a feed from Twitter, Facebook, LinkedIn, or any other social network) can all be consolidated on a single site and made available through an analytics-as-a-service offering.
All of the data you are going to mine may be sitting in the cloud already, accessed by lots of different organizations, so the cost arguments in that social media analytics space might not hold true as they would in the previous example. I think software-as-a-service analytic solutions have a role to play, but it very much comes down to how much data, stored for how long, stored where, and who can access it; all of the traditional cloud security concerns need to be addressed before an organization will want to do it. We ourselves have analytics appliances, the Netezza devices, and we are seeing a great many other organizations getting onto the bandwagon, producing specialized hardware devices to solve specific analytic problems. One of the great things about an appliance is that it is great at doing one thing. One of the bad things about an appliance is that it is great at doing one thing: if you try to do something it wasn't designed or intended for, you very often find that its advantages (easy to use, quick to get going, simple to manage) start to break down once you push it outside the area it was designed for. A great many other organizations are looking at analytics from open source environments, or at the analytics pre-built into a lot of existing software. Let's say you bought a database or a system that already has some ability to run quite complex queries or do some quite simple analytics; again, is that suited to what the client is trying to do or not?
Very often we see a range of these alternative deployment models, and when we engage with clients around analytics we have to be aware that they exist, that the client may decide to bring some of them up, and that we should have a view on how to position these new deployment models against the traditional, on-premise approach: built as a single monolithic system, or as a single enterprise-wide infrastructure. Deployment models for analytics solutions: Software as a Service (SaaS), i.e. cloud; purpose-built analytics appliances (PureData platform, BigInsights); open source analytics solutions; packaged analytics software (SPSS, Cognos...); vs. traditional systems

31 ... the infrastructure must support the possibility of change
The trends... The volume of unstructured data is growing rapidly; new analysis opportunities and problems; new enterprise requirements. Unified, usable, accessible analytics: cost efficiency; user-friendly, easily accessible solutions; scalability and an optimized environment. Analytics becomes the dominant IT workload and influences hardware choices. A continuous transition between the tera scale and the exa (zetta) scale. Data center design and sustainability. {DESCRIPTION} This slide contains the topics that are covered by the narration written in the transcription of this slide. {TRANSCRIPT} Looking forward, one trend we see is a lot of focus on unstructured data. You could argue that a lot of data still has structure: an MPEG file still has some fundamental structure within it, but the contents of the video frames, or of the audio segments within an MP3, say, do not necessarily have structure. The analysis of unstructured data opens up a whole range of new possibilities to gain better insight; and as we saw before, that raises the question of what storage and acceleration devices we need to add in order to take on the analysis of some of these more interesting workloads. There is also a lot of focus at the moment on skills for analytics and where those skills are going to come from, so we need to build platforms that are consumable, first and foremost, by a large number of business users. Today relatively few users have access to analytic systems; that is going to change fundamentally in the future, where potentially everyone in a business may have access, in one shape or form, to analytics capabilities.
As analytics becomes an important IT workload (and if you don't see this happening, just look at the 2015 roadmap and the size of the contribution that business analytics and optimization makes to IBM's growth), it will become a key IT workload of the future, and in turn it will factor into the way we design hardware and infrastructure, so that we can build systems that are, if you like, designed from the ground up to support these sorts of workloads. Change is going to happen, so any infrastructure that is going to support analytics needs to factor change in as a key part of its original capabilities.

32 IBM Watson: "the analytics of the future", available today...
{DESCRIPTION} This slide shows a picture of the TV show Jeopardy! Can we design a computing system that rivals a human's ability to answer questions posed in natural language, interpreting meaning and context and retrieving, analyzing and understanding vast amounts of information in real time? {TRANSCRIPT} Now, there's a lot of interest generated off the back of Watson, and I'm not going to spend any more than this single slide talking about it. As a proof point we can use with clients, very much in terms of showing them the art of the possible, but the art of the possible today, Watson is an extremely powerful story. A large Power7-based infrastructure running specialized software took on two champion players of the US game show Jeopardy! and won. There are a variety of clips out there on how the games went, and a variety of other clips have since been shown on news shows and TV programs showcasing Watson's capabilities. One of the most interesting recent ones was from a UK show called "The Gadget Show", where two presenters went to the Watson research center in NY State and played against the Watson computer, ending up losing badly; the most interesting thing was that when they seemed to be doing well, it transpired that the IBM engineers behind the system had turned down its sensitivity, allowing them to play at a much easier level. These subtle nuances, this vision of the future that Watson provides in terms of just what might be possible with analytics, is really something we ought to leverage more than we currently do.
As you are probably aware, we are starting to take the Watson concept and the Watson system and apply them to other industry problems, from healthcare to finance, working with a number of clients in those industries to explore just what the art of the possible in this content analytics space really may be.

