About the talk
The DATAcated Conference is a free, virtual ‘data party’ hosted by Kate Strachnyi. This is the third DATAcated Conference – it has an industry focus and covers financial services, healthcare, energy, retail, sports, and food & beverage.
Our next speaker [name unclear] is a data scientist at TIBCO, and she's going to talk to us about "Renewed Energy: Innovating with Data and Machine Learning in Energy." The volume and variety of data are particular challenges in the energy industry. She will share real-world use cases for AI in oil and gas and renewables, and she's going to tell us how TIBCO addresses these challenges with practical approaches across the life cycle, from data scalability to predictive insights. She studied statistics and machine learning at Carnegie Mellon
University, and her analytics experience in other industries extends to natural language processing and biostatistics. All right, without further ado, I'm going to go ahead and bring her up on our virtual stage. Hello, welcome to the DATAcated Conference, and thank you for joining us. Hi Kate, thank you so much for having me. I'm going to go ahead and remove myself from the stage and let you take it away. Can you see my screen? Yep, we can see the screen. Okay, awesome. So again, I'm very excited to kick off the energy section of this conference. And today, specifically,
I want to show and tell the story of a data science project in renewable energy. Today we're going to be working with data from the Electric Reliability Council of Texas and the National Oceanic and Atmospheric Administration, and we're going to bring it together into visualizations and complex analytics. Since I'm kicking off the energy section of this conference, I did want to give an overview of the industry itself. Oil and gas development are really at the forefront of the energy industry. The volume and variety of data that needs to be constantly assessed and monitored in oil and
gas poses challenges at every stage of development. This ranges from predicting where to drill, to tracking real-time drilling, and then monitoring wells once they're created. So you can imagine that having analytics and AI tools helps address many of these data challenges. And altogether, that's really where TIBCO comes into play. We've had success working with customers in the energy industry because of our hyperconverged products like TIBCO Data Virtualization, TIBCO Data Science, and Spotfire. And again, my goal today is to really show how we've used
some of these products to create a complete analytics project as well as a highly interactive dashboard. So without further ado, I'm just going to go ahead and jump into a renewable energy demo. This demo is going to aggregate data from several sources. It's going to highlight visuals of time series and geospatial data, and it's going to use time series and machine learning models to predict future energy or power production. Now, a little bit about the full story at hand. Infrastructure teams, ranging from local farm owners
to energy companies, need to be able to track current farm conditions and predict future power output. Here, we're looking at the last 10 years of weather data in Texas. Research shows that the performance of solar panels and wind turbines depends collectively on weather conditions, geographic information, and equipment information. So as data scientists, our job really is to use this collective information to best predict future energy or power production. As always, we start off by exploring our data at hand, so it's not
really that surprising that many of our weather conditions, or weather variables, have statistically significant seasonal differences. From here, we can see that wind speed tends to be higher in the spring compared to other months, and this is just one of the differences we hope that data science models can pick up on. In addition, I also mentioned that one of the challenges in the energy industry is the variety of data, as well as the messiness of the data that we're working with. The actual wind farms that we worked with didn't have weather information at their specific coordinates. So we just
used Spotfire's readily available data functions in order to interpolate conditions from a weather station to a nearby wind farm. What we're actually looking at here is a heat map and contour lines of where wind speed is highest, in red, and where it's lowest, in green. Really, what we've done is interpolate weather conditions from a weather station, which is represented here by a triangle, to a nearby wind farm, which is represented here by a circle. What I've shown so far really is a very static exploration of our data, as well as
what goes into data preparation, but keep in mind that our end goal is to design an interactive dashboard for both solar and wind farm planners and owners. This entails being able to manipulate and visualize the site metadata and the weather information over both location and time, from different angles. For solar, our main variable of interest is called global horizontal irradiance, or GHI. If I were to look at the last year, 2020, I can see that it's actually highest in the west as well as the Panhandle,
so collectively, this makes up the northwest of Texas. Just to give you an idea of the different facets of the data: if I were to select two regions, the Coast and the Panhandle, which are on opposite sides of Texas, I can see that GHI tends to be higher in the Panhandle than on the Coast. I can also see that, on average, there is more capacity and investment on the Panhandle side than on the Coast. And altogether, it looks like a lot of these sites happen to be at an earlier stage of completion, so they're either areas of interest or projects that are planned. And this again is showing the
different facets of our data that we want to explore holistically. Likewise, we have a very similar page for wind. Of course, our wind variables are different, and we also want to be very intentional about what we're visualizing on a map. As I mentioned, for the wind data we had to pull weather conditions from nearby weather stations, and we wanted to see those adjacent to the actual farm sites. So far, what I have really been showing is an emphasis on the geospatial aspect of our data. Another thing that
we were interested in is the time series data at hand. Solar and wind farm planners and owners must be able to track current weather conditions. It's also important for them to be able to compare weather conditions, too. Here, what we're actually looking at is solar GHI, and we're comparing it in the year 2020 across different months. You might ask, why is this important for any type of planner? Well, it shows them, for example, that GHI tended to be more consistent in the month of April than it was in the month of May. Likewise, farm owners know
that summer is when solar production is highest, so it's important for them to be able to compare, say, a certain season's output against previous years. It looks like solar GHI was highest in 2020 compared to the previous years, by just a little bit, and we want any data science models we construct to be able to pick up on long-term changes in weather as well as seasonal differences in weather. Overall, this time series, or temporal, data just adds another layer of complexity to the existing variety of data that we're working with, as
well as the spatial layer. So far, what I've really shown is the historical data that we have, but keep in mind that for planning, we do need to make future predictions. This is really where we brought in TIBCO Data Science in order to test a few different statistical and machine learning models to predict the future outcome of the weather variable GHI from a combination of other, more readily available weather variables. We're actually testing three different types of models: a linear regression, a random forest, and a gradient boosting regression.
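As a rough illustration of this kind of three-way comparison, the Python sketch below fits a linear regression, a random forest, and a gradient boosting regressor to the same data and scores each by mean absolute error and root mean squared error. It is not the TIBCO Data Science workflow from the demo: it assumes scikit-learn, runs on synthetic data, and the feature names (`zenith`, `temperature`, `humidity`) and the formula behind the made-up GHI target are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the ERCOT/NOAA data from the talk).
rng = np.random.default_rng(0)
n = 500
zenith = rng.uniform(0, 90, n)        # hypothetical solar zenith angle, degrees
temperature = rng.uniform(-5, 40, n)  # hypothetical air temperature, deg C
humidity = rng.uniform(10, 100, n)    # hypothetical relative humidity, %
feature_names = ["zenith", "temperature", "humidity"]
X = np.column_stack([zenith, temperature, humidity])

# Made-up GHI relationship plus noise, only so the models have something to fit.
y = (1000 * np.cos(np.radians(zenith))
     + 3 * temperature
     - 0.5 * humidity
     + rng.normal(0, 30, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mae = mean_absolute_error(y_test, pred)
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
    results[name] = {"mae": mae, "rmse": rmse}
    print(f"{name:18s} MAE={mae:8.2f}  RMSE={rmse:8.2f}")

# Impurity-based variable importance from the fitted random forest.
importances = dict(
    zip(feature_names, models["random forest"].feature_importances_)
)
print(importances)
```

On real weather history, a time-ordered split (train on earlier years, test on later ones) would usually be a better fit than the random split used here, since the goal is to predict the future.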
Using TIBCO Data Science, I can easily compare these models against one another. It looks like the linear regression and the random forest actually performed best, with the lowest mean absolute errors and also the lowest root mean squared errors. If I were to dive more deeply into the random forest, I can also see the variable importance. It looks like solar zenith angle and temperature are the most important variables for the model to learn from, again for predicting GHI. Lastly, I thought it was also interesting to mention that we used TIBCO Data Science in
order to aggregate all of our data. This goes back to the volume challenge that I mentioned at the beginning. Our solar dataset alone pulled from over five different sources, and we needed to quickly scale and process millions of rows of data before we could even visualize it. So again, this is another step of preparation that goes into the whole pipeline. Now that we have suitable models, it's important for us to be able to put them into context with our insights, as well as existing data. Here, what we're looking at is both solar and wind
data. We're looking at one solar variable and one wind variable at a time; here we're fixed on solar GHI and wind power. The triangles here represent a wind site and the circles represent a solar site. The sizes of the markers indicate a higher value of the variable, and the colors indicate something about the site status. So when you start looking at this map, what we're drawn to is actually the areas where the points are larger, again what we know to be the northwest of Texas. And
if I were to scroll down to the bottom of this page, we can also see the time series of the variables that I just mentioned, and any values that we predicted or projected appear in a different color. There's an integration between TIBCO Data Science and Spotfire that allows me to continuously refresh or remodel these results so I can see them as needed. Altogether, the map and subsequent filters really allow us to manipulate the spatiotemporal aspect of our data. So keep in mind that Spotfire's map layers, as well as marker and feature layers, allow us to
cleanly and completely view many different spatial aspects of our data. Awesome. Awesome job. Thank you so much for the presentation. It was insightful and it was also really beautiful, and I love data visualization, so this is a treat for me. Thank you so much for walking through all that. We've got plenty of comments and questions to go through here. We'll try to take one or two questions. A lot of the comments are telling you that people are loving the presentation. Thank you so much for joining us. We have a question here from Kimberly, and she's asking: as power moves towards a
balanced, integrated grid, how important will communications infrastructure and cybersecurity be for creating an effective and efficient grid? What are some growing trends that you see in the space for data professionals? Yeah, I think that's just another layer of complexity that's going to be added to our data. It's not in the demo right now, but you can imagine that we need to be working with other types of organizations and partners to bring in cybersecurity data to add a layer of safety. And that's important for putting out a full project in itself. And
it also adds complexity to the data science challenge. Thank you for that. Seriously, so many comments about how great of a demo this is. Michelle is asking: are there any solar and wind farms that incorporate both? I think you were showing that with the triangles. And actually, that had both triangles and circles overlapping, so you can see very clearly, right off the bat, where there's higher solar GHI as well as higher wind power. Great. Thank you so much, and we'll take one more
question here from Atlanta. And she's asking, how much emphasis are you putting into cybersecurity, I guess, in this space? So, once again, I guess that's an additional component of data that we can always bring in, and the data right now is also protected and proprietary. So that's something that we also keep in mind when we're handling it as data scientists. Awesome, thank you so much. Okay, before we wrap up, I just wanted to make another announcement on TIBCO. We have a course on dashboards with TIBCO Spotfire that's coming out in the DATAcated Academy. Just wanted to do a
quick shout-out here because we've got Dr. Spotfire himself putting together that course, Neil Kanungo. So go ahead and check that out. If you registered for the conference, you'll get an email with more information on that. But thank you so much for the presentation, and thank you for taking our questions. Yep. Thank you so much for having me. Okay. Have a good rest of your day.