Duration 26:42

Koalas: Easy Transition from Pandas to Spark - Ben Sadeghi

Ben Sadeghi
Partner Solutions Architect at Databricks
FOSSASIA Summit 2020
March 20, 2020, Online

About speaker

Ben Sadeghi is a Partner Solutions Architect at Databricks, covering Asia Pacific and Japan, focusing on Microsoft and its partner ecosystem. Having spent several years with Microsoft as a Big Data & Advanced Analytics Technology Specialist, he has helped various companies and partners implement cloud-based, data-driven, machine learning solutions on the Azure platform.

Prior to Databricks and Microsoft, Ben was engaged as a data scientist with Hadoop/Spark distributor MapR Technologies (APAC), developed internal and external data products at Wego, a travel meta-search site, and worked in the Internet of Things domain at Jawbone, where he implemented analytics and predictive applications for the UP Band physical activity monitor. Before moving to the private sector, Ben contributed to several NASA and JAXA space missions.

Ben is an active member of the open-source Julia language community. He holds an M.Sc. in computational physics, with an astrophysics emphasis.

About the talk

Track: Python

Pandas, the de facto standard DataFrame implementation in Python, is very popular among data scientists, but it does not scale well to big data: it was designed for data sets small enough for a single machine to handle. Apache Spark, on the other hand, has emerged as the de facto standard for big data workloads. Today many data scientists use Pandas for coursework, pet projects, and small data tasks, but when they work with very large data sets, they have to either migrate to PySpark to leverage Spark or downsample their data so that they can keep using Pandas.

Now with Koalas, an open-source implementation of the Pandas API on Apache Spark, data scientists can make the transition from a single machine to a distributed environment without needing to learn a new framework. In this talk, we'll go through the basics of Koalas, along with demos.
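As a rough illustration of that idea (a minimal sketch, not taken from the talk), the snippet below writes a small aggregation against the Pandas DataFrame API and runs it with Pandas on a single machine. The helper name `mean_by_key` and the sample data are hypothetical. The commented-out lines show how the same function could be pointed at Koalas instead, assuming the `databricks.koalas` package and a working Spark runtime are installed.

```python
import pandas as pd


def mean_by_key(frame_module, records):
    """Build a DataFrame with the given module and average `value` per `key`.

    `frame_module` is any module exposing a Pandas-style `DataFrame`
    constructor (hypothetical helper, for illustration only).
    """
    df = frame_module.DataFrame(records)
    return df.groupby("key")["value"].mean()


# Hypothetical sample data.
records = {"key": ["a", "b", "a", "b"], "value": [1.0, 2.0, 3.0, 4.0]}

# Single-machine run with Pandas.
result = mean_by_key(pd, records)
print(result.sort_index().to_dict())  # {'a': 2.0, 'b': 3.0}

# Distributed run (assumption: Koalas and a Spark runtime are available):
# import databricks.koalas as ks
# result = mean_by_key(ks, records)
```

The function body uses only DataFrame construction, `groupby`, and `mean`, so the only change needed to move between the two libraries in this sketch is the module passed in.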
