- Videos: 1,730
- Views: 6,678,780
Soumil Shah
USA
Joined Nov 16, 2012
I’m Soumil Nitin Shah, a Lead Data Engineer and Apache Hudi expert with extensive experience in AWS Big Data and data lakes. I specialize in designing scalable data ingestion frameworks, managing over 2TB of data monthly, and optimizing workflows with my innovative "LakeBoost" framework, which integrates Apache Hudi with AWS Glue ETL to enhance efficiency and reduce costs. Additionally, I am a committed content creator with a YouTube channel boasting 42,000 subscribers and over 1,000 videos on big data technologies. My work reflects a strong dedication to advancing the field of data engineering through both technical innovation and educational outreach.
How to use Kafka Connect S3SinkConnector with Minio | Dump Data from kafka topics to Minio Buckets
Exercise Labs
github.com/soumilshah1995/Kafka-sink-minio/blob/main/README.md
MinIO
Views: 73
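For readers who want a feel for what the lab above sets up, here is a minimal sketch of a Kafka Connect S3 sink configuration pointed at a MinIO endpoint. The connector name, topic, bucket, and endpoint URL are placeholder assumptions, not values taken from the video or its README. The property keys are the ones used by the Confluent S3 sink connector, so check them against the connector version you actually run.

```python
import json


def build_s3_sink_config(name, topic, bucket, endpoint):
    """Build a JSON-serializable payload for Kafka Connect's POST /connectors endpoint."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
            "topics": topic,
            "s3.bucket.name": bucket,
            # MinIO ignores the region, but the connector still expects a value.
            "s3.region": "us-east-1",
            # store.url redirects the connector from AWS S3 to the MinIO endpoint.
            "store.url": endpoint,
            # Number of records written per object (assumed value).
            "flush.size": "100",
            "tasks.max": "1",
        },
    }


# All four arguments below are hypothetical placeholders.
payload = build_s3_sink_config(
    name="minio-sink",
    topic="customers",
    bucket="kafka-dump",
    endpoint="http://minio:9000",
)
print(json.dumps(payload, indent=2))
```

The printed JSON is what you would send to the Kafka Connect REST API to register the connector. Credentials are left out on purpose, since how they are supplied depends on your deployment.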
Videos
Use S3 as an External Volume in Snowflake along with X table to interoperate as Hudi|Iceberg|Delta
71 views, 12 hours ago
How to Use AWS S3 as an External Volume in Snowflake along with X table to interoperate as Hudi | Iceberg | Delta and query the data in Athena. In this video, I’m diving deep into how you can interoperate with Snowflake tables and extend their capabilities by leveraging AWS services. Watch part 1: ruclips.net/video/FZsO5qeXPfM/видео.html Lab: github.com/soumilshah1995/snowflake-xtable 🔍 Key Hig...
How to Use AWS S3 as an External Volume in Snowflake | Hands on guide
63 views, 14 hours ago
How to Use AWS S3 as an External Volume in Snowflake. Are you looking to integrate AWS S3 with Snowflake and streamline your data management process? Check out my latest video where I walk you through the entire setup: 🔹 Creating IAM Roles and Policies: Learn how to configure AWS IAM roles and policies to grant Snowflake access to your S3 buckets. 🔹 Setting Up Snowflake: Discover how to create a...
Personal Opinion: Which Table Format Do I Prefer? (Hudi, Iceberg, or Delta)
77 views, 16 hours ago
Disclaimer: All table formats are excellent, and this video reflects my personal choice. I am not paid by any company or vendor to favor any format. These are my personal views, and it's perfectly okay if your preference differs. I've used all three formats, and I genuinely appreciate each one.
Insert | Update| Delete | TimeTravel|Schema Evolution|with Iceberg and Minio Requested by Saurabh
74 views, 16 hours ago
Insert | Update | Delete | TimeTravel with Iceberg Tables and Minio, requested by Saurabh. Exercise notebook: github.com/soumilshah1995/code-snippets/blob/main/iceberg-soumil.ipynb GitHub project to clone and try: github.com/tabular-io/docker-spark-iceberg.git To seamlessly interoperate with other data formats, I recommend using X Table. Learn more about it here: xtable.apache.org Feel free ...
Using Bucket Index & Right Partitioning with Hudi for 660GB Tables & 4.4B Records on AWS Glue 4.0
80 views, 19 hours ago
Resources to read: www.linkedin.com/pulse/how-we-used-bucket-index-apache-hudi-right-660gb-tables-soumil-shah-gkgke/?trackingId=I1uqQzduSae9LBiGl3ll1g www.linkedin.com/pulse/apache-hudi-accelerating-upsert-simple-index-choosing-soumil-shah/?lipi=urn:li:page:d_flagship3_pulse_read;upOgTzuUTzS2VGegusm...
What are some of the common Interview Questions for Apache Hudi
65 views, 1 day ago
Ever wondered what kind of questions you might face in an interview where Apache Hudi is listed in the job description? Look no further! 🎯 In my latest YouTube video, "What are some of the Common Interview Questions for Apache Hudi," I dive deep into popular questions you can expect during the interview process. From basic concepts to advanced scenarios, I cover it all to help you ace your inte...
Learn How to Run the Apache X Table in Docker Environments with Rocky Linux (Hudi| IceBerg|Delta)
51 views, 1 day ago
Blog: www.linkedin.com/pulse/learn-how-run-apache-x-table-sync-command-docker-rocky-soumil-shah-ypkye/?trackingId=y2kj7CxyTYGgU+MyFANcsA Exercise files: github.com/soumilshah1995/apache-x-table-docker-tutorial/blob/main/README.md
Understanding Apache Hudi's MERGE INTO Command with Minio and HiveMetaStore
48 views, 14 days ago
Blog: www.linkedin.com/pulse/understanding-apache-hudis-merge-command-minio-soumil-shah-ejapf/?trackingId=W2uo8swbTDazhxx3hYqBHg Exercise files: github.com/soumilshah1995/hudi-mergeinto-labs/blob/main/README.md
Apache Hudi 1.0.0 leverages LSM trees to achieve faster writes and save storage.
108 views, 14 days ago
In this video, I delve into how LSM trees work within Hudi, showcasing their benefits for efficient data management. For beginners curious about LSM trees, I've included an excellent explanatory video from the YouTube channel "ByteByteGo," where the concept is very well explained. As data writes exceed a certain th...
My Journey into Apache Hudi: How and When I Started Learning and Tips for Beginners
107 views, 14 days ago
Hudi JAR Compilation: Build & Compile Hudi JARs for Specific Spark Versions Using Docker Containers
22 views, 21 days ago
Master Hudi JAR Compilation: Building and Compiling Hudi JAR Files for Specific Spark Versions Using Docker Containers. Steps: github.com/soumilshah1995/Hudi-compile-jar-docker/tree/main References: medium.com/@life-is-short-so-enjoy-it/hudi-build-your-jars-with-your-patches-rocky-linux-in-docker-9010b275321a
Apache Hudi
Unlock Insights: A Step-by-Step Guide to LakeView Free Community Edition with AWS Glue
63 views, 21 days ago
Unlock Deep Insights from Data Lakes: A Step-by-Step Guide to Using LakeView Free Community Edition with AWS Glue. Step-by-step guide: github.com/soumilshah1995/Hudi-lake-view/blob/main/README.md References: www.onehouse.ai/blog/announcing-lakeview github.com/onehouseinc/LakeView/tree/main
Milestone Achieved: 42,000 Subscribers! Thank You, Everyone!
70 views, 21 days ago
Fast GeoSearch on Data Lakes: Efficiently Build Geo Search Using Hudi for Lightning-Fast Retrieval
71 views, 21 days ago
Building Keyword Search in Hudi: Inverted Indexes, Record Level | Keyword Search in Datalakes
76 views, 28 days ago
Storing Athena Query Metrics in Hudi for Advanced Analysis and Audit using AWS Glue
65 views, 1 month ago
Using OpenAI Vector Embedding to Store Large Vectors in Hudi with MiniO for Cost-Effective AI Apps
108 views, 1 month ago
Learn How to Use Apache Hudi Streamer with DataHUB An Open Source Metadata Platform
67 views, 1 month ago
Getting Started with X-Table and Unity Catalog | Universal Datalakes | Hands on Labs
143 views, 1 month ago
Hudi Using Spark SQL on AWS S3: Insert, Update, Deletes, Stored Procedures on AWS Glue Notebooks
85 views, 1 month ago
How to Use Hudi Streamer on New EMR 7.1.0 Spark 3.5.1 and Hudi 0.14.1 | Hands-on Labs
89 views, 1 month ago
How to Use Hudi Streamer with Hudi version 0.15.0 | Hands on Guide |
64 views, 1 month ago
How to Execute Postgres Stored procedures in Spark | Hands on Guide
104 views, 1 month ago
Learn How to Ingest Data from Hudi Incrementally hudi table changes into Postgres Using Spark
101 views, 1 month ago
Universal Datalakes: Interoperability with Hudi, Iceberg, and Delta Tables with AWS Glue Notebooks
125 views, 1 month ago
4 Different Ways to fetch Apache Hudi Commit time in Python and PySpark
73 views, 1 month ago
OneTable to translate a Hudi table to Iceberg format and sync with Glue Catalog
75 views, 1 month ago
Learn How to Run Apache X Table Sync Command on AWS Cloud Shell | Interoperate Hudi Iceberg delta
39 views, 1 month ago
Learn How to Ingest XML files with AWS Glue into Hudi Datalakes | Step by Step guide
130 views, 1 month ago
Thanks Soumil, it is only because of you that I have gained some confidence with REST APIs
Glad to hear buddy
Hi bro, can you provide the same configuration for kubernetes? that'll help a ton
That’s a headache I tried with Kubernetes man it was pure headache
Can the Snowflake-managed Iceberg table be queried through Athena? I do the opposite.
Already made a video on that )
Use S3 as an External Volume in Snowflake along with X table to interoperate as Hudi|Iceberg|Delta ruclips.net/video/Hi5hQ6BVrWg/видео.html
Dude, thank you for this. I have been trying this with the S3 sink for a long time, so I have to try it. I was able to connect the Debezium connector from Postgres to Kafka to get CDC data, but I could not get the S3 sink connector working.
Those videos are coming too in pipeline
Thx! Waiting for the same sink with Hudi/Iceberg.
You're a lifesaver, man
Very concise and clear. Thanks for this but I tried inserting an array of json objects in the create item json UI but keep getting funny syntax errors. Any help?🙏
Well, that was not a surprise…
What happens when you change column type?
this may be a good way to interact with lmstudio
Nice tutorial, man. I would like to know how the "service name" generates random names.
Hey Soumil, Will it be possible for you to make a series on AWS with pyspark. An end to end project with detailed explanation of AWS services used would be much appreciated. I'm looking for a real time project on this. THANKYOU!!
Sure thing
Man, You are awesome
Thank you sir
@soumilShah, can't we directly use the Glue job to read the data from the configured S3 bucket path? What are the SQS queues and the Glue polling of SQS for?
Man you are a genius. Essentially
Thank you sir
docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3
Can you help scrape a website
Thank you for making video on our blog @soumil - Navnit Shukla
You guys did great job
Great, let me know if you need any other videos; happy to do so
Hey Soumil , definitely a good head start for someone starting out on Iceberg. Will checkout the XTable too. Thanks for the video.
Hello, I'm trying to do something similar using Nvidia jetson devices. Hosting an OpenVPN AS on AWS , so that the client can remotely connect to the server regardless of the location. The client and server are setup , although when I try to send a simple python script using sockets which just has a hello world message, the server becomes unresponsive and I'm unable to receive the message. Any help is appreciated.
Awesome job. You are using "Essentially" very frequently.
Essentially you are right
Great stuff, hopefully I could recollect it during the interviews ^_^ Thanks for sharing!
Hi can you share ideas on how we can optimize cost for aws transfer family
Vel come to duh wideo
Please tell me the prerequisite of ms in cs
Many Thanks Soumil. This is what exactly I was looking for. We have Kinesis data stream and we are exporting to S3 through firehose. The firehose exports data in yyyy/MM/dd/HH format. We needed to write batch-ETL in glue to transform the data and store this in different S3 location. The new location should have the same format, yyyy/MM/dd/HH. This code really helped in solving this issue.
Thank you sir
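The hourly layout described in the comment above (yyyy/MM/dd/HH) can be sketched in plain Python. This is a minimal illustration, not the code from the video: the function name `hourly_prefix` and the bucket path are hypothetical, and it assumes the prefix should be computed in UTC, which is how Firehose's default prefixes behave.

```python
from datetime import datetime, timezone


def hourly_prefix(ts: datetime, base: str = "") -> str:
    """Return a yyyy/MM/dd/HH/ key prefix for ts, evaluated in UTC.

    Pass a timezone-aware datetime; a naive one is interpreted as local time.
    """
    ts_utc = ts.astimezone(timezone.utc)
    prefix = ts_utc.strftime("%Y/%m/%d/%H")
    if base:
        return f"{base.rstrip('/')}/{prefix}/"
    return f"{prefix}/"


# Hypothetical output location for the transformed data.
example = hourly_prefix(
    datetime(2024, 7, 4, 9, 30, tzinfo=timezone.utc),
    "s3://my-output-bucket/transformed",
)
print(example)
```

A batch job could call a helper like this once per source hour, so the transformed objects land under the same hour folder they were read from.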
Hi Soumil, we have complex hierarchy and design requirement. Is it possible for you to connect with us to discuss the many to many dynamoDB design
where's the dataset please provide
Hi Soumil, please do these kind of videos it will be more helpful.
Yes sir
Amazing Questions with real-time scenarios. Please make such videos on other AWS services.
thank you for sharing knowledge
explained well !!!!!!!!!
Thx for video Soumil! Waiting for video about integration StarRocks + Hudi/Iceberg
Here is the video: ruclips.net/video/_DcOp2YP774/видео.htmlsi=xLrqjd_YmguS5k2Y
Congratulations