Snowflake: Simplifying Machine Learning

Introduction: Snowflake's Strategy

Snowflake's mission revolves around making data accessible, usable, and valuable to everyone. One of the pillars that has made Snowflake an industry leader is its unwavering commitment to being easy to use and turnkey. A testament to this is a quote from Snowflake's CEO, Frank Slootman, from the Q2 '23 earnings call:

Where you see huge differences is in the total cost of ownership, and that's not including the cost of compute and storage. In other words, what is the cost to run that technology? This is where Snowflake has a huge advantage, and our customers know that. It's just reduced skills, fewer people, and not having to touch the complexity of the underlying platforms. We’re more descendants of the Apple and Tesla than being the descendants of Hadoop, like some people are in the marketplace, right? So we have abstracted the complexity. That is what yields the TCO advantage. But the raw cost of computing and storage, there’s not that much opportunity to be had there.

Building on this strategy, Snowflake continues to innovate and streamline even the most complex of tasks. Looking ahead, we can expect the following items to be shaped by Snowflake's signature user-friendly approach:

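Containers: isolated environments for applications.

  • Low management: reduced operational overhead.

Unstructured data: simplified management of many kinds of data.

  • Document AI: advanced document processing and insights.

Machine learning: user-friendly ML tools and functions.

  • ML SQL functions: ML capabilities embedded in SQL.

AI

  • AI with NVDA: collaboration on cutting-edge AI tools.

  • Microsoft: partnership to bring Microsoft AI directly to the Data Cloud.

  • LLMs on company data: expanding data reach and utility.

Data applications: making data-centric applications more accessible.

  • Streamlit

  • Native App Framework: seamless integration for application development.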

ML SQL Functions in Open Preview

Snowflake's ML SQL functions, currently in open preview, are transforming the way we view SQL and ML. The first three are:

1. Forecasting: Predict future values from past data. Ideal for sales forecasting, stock trends, and more.

2. Anomaly Detection: Identify unusual patterns in data that don't conform to expected behaviors. Useful in fraud detection, system health monitoring, etc.

3. Contribution Explorer: Understand the factors contributing to a particular outcome. It's like asking a "why" for every "what".

Requirements & Limitations

As with any tool in development, there are requirements and limitations. The current constraints on these functions are:

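  • A maximum of 500,000 rows for model training.

  • A minimum of 12 rows for model training.

  • A minimum granularity of 1 second.

  • Seasonal components have a 1-minute minimum granularity.

  • Timestamps must have fixed intervals.

  • The season length of autoregressive features is tied to the input frequency.

  • Existing models cannot be updated; a new one must be trained.

  • Outliers can affect the algorithm. Users may need to remove them if they are unwanted.

  • Models cannot be cloned across accounts.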
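
Several of these constraints can be checked before any data reaches Snowflake. The following is a minimal sketch in Python, assuming the timestamps of one series have already been pulled into a list. The function name `check_training_series` and the constants are hypothetical helpers for illustration, not part of any Snowflake API, and the limits simply mirror the list above.

```python
from datetime import datetime, timedelta

MIN_ROWS = 12                    # minimum training rows, from the list above
MAX_ROWS = 500_000               # maximum training rows, from the list above
MIN_STEP = timedelta(seconds=1)  # minimum granularity, from the list above


def check_training_series(timestamps):
    """Return a list of problems found; an empty list means the series
    passes these hypothetical client-side checks."""
    problems = []
    n = len(timestamps)
    if n < MIN_ROWS:
        problems.append(f"only {n} rows; at least {MIN_ROWS} are required")
    if n > MAX_ROWS:
        problems.append(f"{n} rows; at most {MAX_ROWS} are allowed")
    if n >= 2:
        ordered = sorted(timestamps)
        # Collect the distinct gaps between consecutive timestamps.
        steps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
        if len(steps) > 1:
            problems.append("timestamps are not at a fixed interval")
        if min(steps) < MIN_STEP:
            problems.append("interval is finer than 1 second")
    return problems


# Fourteen consecutive calendar days, then the same range with weekends removed.
start = datetime(2019, 1, 1)
daily = [start + timedelta(days=i) for i in range(14)]
weekdays_only = [t for t in daily if t.weekday() < 5]

print(check_training_series(daily))
print(check_training_series(weekdays_only))
```

The weekday-only series fails twice: dropping weekends leaves fewer than 12 rows and breaks the fixed interval, which is the same problem the stock example later in this post has to work around.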

Getting Started with ML SQL Functions

Diving into these functions involves a systematic process:

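  1. Prepare the data: Organize and clean your data so that it is ready for training.

    • The most important step

  2. Create the model: Set up the foundation for your machine learning model.

  3. Train the model: Use your data to train and refine the model.

  4. Get the data: Extract insights and results.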

Example:

I have a dataset with the closing price data for all the stocks in the Nasdaq and the Dow. I want to run predictive analysis over the dataset for the next 2 months. I want to train the model on data beginning on 1/1/2019.

Prepare the Data

In this step, views are your friend. This is where you further prepare the data for ML and shape it to meet the requirements. For this stock dataset, there are a few things to handle:

  1. Some tickers have fewer than 12 rows (a new IPO, or a stock that came off the market within 12 days of the beginning of the training window).

    • Exclude these records through a view

  2. There is a date column, but it needs to be a timestamp data type.

    • Change the data type to a timestamp in the view

  3. Weekend and holiday data does not exist. To meet the fixed-interval requirement, data has to be mocked up for those dates.

    • Have missing dates show the previous close price through a view

  4. When training on larger sets, it's important that the final view be ordered by the TIMESTAMP column.
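
The post does this preparation with views in Snowflake. To make the logic of the four steps concrete, here is a rough equivalent in plain Python. It is only an illustrative sketch: the `prepare` function, the `(ticker, date, close)` row shape, and the sample tickers `AAA` and `NEW` are all assumptions made for the example, not the author's actual view definitions.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta

MIN_ROWS = 12  # minimum training rows per series, from the constraints above


def prepare(rows):
    """rows: iterable of (ticker, date, close) tuples.

    Returns (ticker, timestamp, close) tuples with short series dropped,
    calendar gaps filled with the previous close, ordered by timestamp.
    """
    by_ticker = defaultdict(dict)
    for ticker, day, close in rows:
        by_ticker[ticker][day] = close

    prepared = []
    for ticker, closes in by_ticker.items():
        # Step 1: exclude tickers with fewer than 12 real rows.
        if len(closes) < MIN_ROWS:
            continue
        day, last_day = min(closes), max(closes)
        previous_close = closes[day]
        while day <= last_day:
            # Step 3: weekends and holidays have no row, so carry the
            # previous close forward to keep a fixed daily interval.
            previous_close = closes.get(day, previous_close)
            # Step 2: promote the date to a timestamp (midnight).
            prepared.append((ticker, datetime.combine(day, time.min), previous_close))
            day += timedelta(days=1)

    # Step 4: order the final result by the timestamp column.
    prepared.sort(key=lambda row: (row[1], row[0]))
    return prepared


# Made-up sample data: "AAA" trades on 14 weekdays, "NEW" has only 3 rows.
calendar = [date(2019, 1, 1) + timedelta(days=i) for i in range(18)]
trading_days = [d for d in calendar if d.weekday() < 5]
rows = [("AAA", d, 100.0 + i) for i, d in enumerate(trading_days)]
rows += [("NEW", d, 10.0) for d in trading_days[:3]]

result = prepare(rows)
print(len(result))  # 18 daily rows for AAA; NEW is excluded
print(result[4])    # Saturday 2019-01-05 carries Friday's close of 103.0
```

Counting rows before filling the gaps matters here: a ticker with only a handful of real prices should be excluded even if forward-filling would pad it past 12 rows.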

Create the Model

Now the hard work is done. We create the model.

Train the Model

Train the model for 60 forecast periods. This step can take a long time, but increasing the warehouse size can reduce that time.
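
A forecast period is one step of the input interval, so with one row per calendar day, 60 periods cover roughly the two months asked for in the example. The short sketch below only illustrates that arithmetic. The daily step, the last training date, and the `forecast_timestamps` helper are assumptions for illustration and are not taken from the original dataset or from Snowflake.

```python
from datetime import datetime, timedelta

FORECAST_PERIODS = 60     # from the text above
STEP = timedelta(days=1)  # assumption: one row per calendar day


def forecast_timestamps(last_training_timestamp, periods=FORECAST_PERIODS, step=STEP):
    """Return the timestamps that the forecast periods would cover."""
    return [last_training_timestamp + step * (i + 1) for i in range(periods)]


# Hypothetical last day of training data.
horizon = forecast_timestamps(datetime(2023, 8, 31))
print(horizon[0].date(), horizon[-1].date())  # 2023-09-01 2023-10-30
```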

Get the Data

If you are using straight SQL, use the RESULT_SCAN function to put the results from the previous step into a table for further analysis.

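Conclusion

Snowflake continues to shape the future of data analysis and machine learning by introducing powerful yet user-friendly tools. As we look forward to further innovation and improvements, it's clear that with Snowflake, machine learning really is for everyone.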

Dive in, explore, and harness the power of data like never before!

Andy Wickman

A seasoned technology leader with over 20 years of experience in the IT industry, Andy has consistently demonstrated success in various leadership roles. With a strong background in databases and a proven track record of delivering projects on time, Andy has a keen ability to identify and execute corporate strategic goals.

A forward-thinking innovator known for strong problem-solving skills and an unwavering work ethic, he effectively manages multiple complex projects for a diverse range of customers. His extensive experience in the IT domain allows him to provide valuable insights and share his technical expertise with clients and senior management alike.

Andy's ability to "train" employees, coupled with his strong business acumen and technical vision, has contributed to his success in improving processes across the board. His effective people skills have also made him a sought-after leader and collaborator in the industry.

In his technical blog posts, Andy shares his wealth of knowledge and experience, providing readers with valuable insights into the rapidly evolving world of technology and database management. As a self-motivated professional who requires minimal supervision, Andy continues to pave the way for innovation and progress in the IT sector.
